|
The independence assumption in Naive Bayes makes its predictions very poorly calibrated. On the other hand, that assumption is what makes NB so simple. If you are willing to give up some of the simplicity, how can you get better calibration from it?
|
You can view Naive Bayes as a particular graphical structure (class variable connecting to attribute variables) where parameters are trained generatively. You could instead
Edit: another option I just remembered: you can merge attributes together into larger attributes. As a limiting example, suppose you have n binary attributes. You could instead view them as a single super-attribute with 2^n different values. There is no longer any independence assumption, and the probabilities will be well calibrated given enough data. However, the amount of data needed to get reliable estimates will be much larger. As an intermediate step, you could try to find subsets of attributes to merge, so that the resulting super-attributes are less correlated with each other.
You can always use smoothing to account for unseen feature values, but perhaps that's too simple a solution.
(Sep 27 '11 at 01:17)
Leon Palafox
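To make the attribute-merging idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the helper names (`merge_attributes`, `train_nb`, `predict_proba`), the tiny count-based NB, and the toy data, in which the second attribute is an exact copy of the first so that plain NB counts the same evidence twice.

```python
from collections import Counter, defaultdict

def merge_attributes(row, groups):
    """Turn a row into super-attributes, one per group of column indices."""
    return tuple(tuple(row[i] for i in group) for group in groups)

def train_nb(rows, labels, alpha=1.0):
    """Count-based categorical Naive Bayes with additive smoothing alpha."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, column) -> value counts
    values_seen = defaultdict(set)        # column -> distinct values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(y, j)][v] += 1
            values_seen[j].add(v)
    return {"classes": class_counts, "values": value_counts,
            "seen": values_seen, "alpha": alpha}

def predict_proba(model, row):
    """Posterior over classes for one row."""
    total = sum(model["classes"].values())
    alpha = model["alpha"]
    joint = {}
    for c, n_c in model["classes"].items():
        p = n_c / total
        for j, v in enumerate(row):
            k = len(model["seen"][j])
            p *= (model["values"][(c, j)][v] + alpha) / (n_c + alpha * k)
        joint[c] = p
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

# Toy data: attribute 1 duplicates attribute 0.
# Empirically, P(y=1 | x=(1,1)) = 8/10 = 0.8.
rows = [(1, 1)] * 10 + [(0, 0)] * 10
labels = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8

# alpha=0.0 (no smoothing) so the numbers match the raw frequencies.
raw = train_nb(rows, labels, alpha=0.0)
p_raw = predict_proba(raw, (1, 1))[1]          # about 0.94: overconfident

groups = [(0, 1)]                              # merge both columns into one
merged_rows = [merge_attributes(r, groups) for r in rows]
merged = train_nb(merged_rows, labels, alpha=0.0)
p_merged = predict_proba(merged, merge_attributes((1, 1), groups))[1]  # 0.8
```

With the two correlated columns merged, the predicted probability matches the empirical frequency. The price is the one mentioned above: a merged group of n binary attributes has up to 2^n values to estimate, so smoothing (`alpha > 0`) matters more as groups grow.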
|
If you really only want calibration, and you think Naive Bayes is already performing well enough on your data, you can try fitting a logistic regression model that uses only a bias feature and a feature for the Naive Bayes predicted log-score. This should distribute the probabilities more accurately, I think, although the ranking of examples by score will not change as long as the fitted slope is positive (only the decision threshold can move). More probably, though, you should follow Yaroslav's advice.
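A minimal sketch of this recalibration step, in plain Python. It assumes a binary problem, that the NB log-score is the log-odds log P(y=1|x) - log P(y=0|x), and that the scores come from held-out data; the function names, learning rate, and toy numbers are all made up for illustration. The fit is ordinary logistic regression on one feature plus a bias, trained by batch gradient descent.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_score_calibrator(scores, labels, lr=0.05, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by minimizing mean log loss."""
    a, b = 1.0, 0.0          # start from the uncalibrated NB posterior
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrated_probability(score, a, b):
    return sigmoid(a * score + b)

# Hypothetical held-out NB log-odds: large magnitudes, yet two of the
# ten confident predictions are wrong (score -7 with y=1, score 3 with y=0).
scores = [-12.0, -9.0, -7.0, -4.0, -3.0, 2.0, 3.0, 6.0, 8.0, 11.0]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_score_calibrator(scores, labels)
p_before = sigmoid(11.0)                       # raw NB posterior
p_after = calibrated_probability(11.0, a, b)   # recalibrated
```

On data like this the fitted slope `a` comes out between 0 and 1, which shrinks the overconfident log-odds toward zero. Because the map is monotone in the score, it reorders nothing; it only changes the probabilities attached to the scores.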