I'm trying to evaluate the performance of the Naive Bayes classifier in order to compare it with other supervised learning algorithms. I used some of the datasets available from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). Naive Bayes did not perform well on most of the datasets I tested, and I assumed the reason is that the underlying features are not independent. I want to verify this assumption by testing the Naive Bayes classifier on data drawn from an independent feature model. My problem is that I don't know where to find such a dataset or how to generate one. Can anyone point me to a tool or method for generating a synthetic classification dataset whose underlying probability model is an independent feature model?

asked Nov 29 '13 at 22:58

Sam MS
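One possible way to build such a dataset by hand, sketched under stated assumptions: pick a class prior, then draw each feature from its own per-class Gaussian so that the features are independent given the label. The function names (`make_independent_dataset`, `nb_predict`), the Gaussian choice, and all parameter ranges below are illustrative assumptions, not something taken from the question or from any particular tool.

```python
import math
import random


def make_independent_dataset(n_samples=1000, n_features=5, n_classes=2, seed=0):
    """Sample (X, y) where the features are independent given the class label."""
    rng = random.Random(seed)
    # Uniform class prior (an arbitrary choice for this sketch).
    priors = [1.0 / n_classes] * n_classes
    # Hypothetical per-class, per-feature Gaussian parameters.
    means = [[rng.gauss(0.0, 2.0) for _ in range(n_features)]
             for _ in range(n_classes)]
    stds = [[rng.uniform(0.5, 1.5) for _ in range(n_features)]
            for _ in range(n_classes)]
    X, y = [], []
    for _ in range(n_samples):
        c = rng.choices(range(n_classes), weights=priors)[0]
        # Each feature is drawn separately, so given c the joint density
        # factorizes into a product of one-dimensional Gaussians.
        X.append([rng.gauss(means[c][j], stds[c][j]) for j in range(n_features)])
        y.append(c)
    return X, y, priors, means, stds


def log_gauss(x, mu, sigma):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def nb_predict(x, priors, means, stds):
    """Gaussian Naive Bayes prediction using the true generating parameters."""
    scores = [
        math.log(priors[c])
        + sum(log_gauss(x[j], means[c][j], stds[c][j]) for j in range(len(x)))
        for c in range(len(priors))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


X, y, priors, means, stds = make_independent_dataset()
accuracy = sum(nb_predict(x, priors, means, stds) == t for x, t in zip(X, y)) / len(y)
print(f"Naive Bayes accuracy with the true parameters: {accuracy:.3f}")
```

Because the data really does come from an independent feature model, Naive Bayes with the true parameters is the Bayes-optimal classifier here, so its accuracy gives a ceiling against which a fitted Naive Bayes model and the other algorithms can be compared.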

In practice, the underlying features are almost never independent given the class, but most of the time NB still works well. The reason is that you care about predicting the correct class label, not about getting the correct posterior probability over class labels. As long as the dependency structure of the features given the label doesn't change dramatically from class to class, you will over- or under-estimate the posterior for all classes in a similar way, which won't hurt your class label prediction too badly.

(Nov 30 '13 at 17:07) yanshuaicao
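A small illustration of the point in the comment above, as a sketch under assumptions rather than a general proof: take two classes with equal priors and one Gaussian feature (means -1 and +1, unit variance, all hypothetical values), then feed Naive Bayes the same feature copied `k` times. The copies are perfectly dependent, so NB counts the same evidence `k` times and its posterior becomes overconfident, yet the predicted label does not change.

```python
import math


def log_gauss(x, mu, sigma):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def posterior_class1(x, k):
    """P(y=1 | x) when the single feature x is counted k times.

    k=1 gives the true posterior; k>1 is what Naive Bayes computes when it
    treats k identical copies of the feature as independent. Equal class
    priors are assumed, so the prior terms cancel in the log-odds.
    """
    log_odds = k * (log_gauss(x, 1.0, 1.0) - log_gauss(x, -1.0, 1.0))
    return 1.0 / (1.0 + math.exp(-log_odds))


results = []
for x in (-0.5, 0.2, 1.0):
    true_p = posterior_class1(x, k=1)
    nb_p = posterior_class1(x, k=3)
    results.append((x, true_p, nb_p))
    print(f"x={x:+.1f}  true P(y=1|x)={true_p:.3f}  NB with 3 copies={nb_p:.3f}")
```

In this setup the NB log-odds are just the true log-odds multiplied by `k`, so the probabilities are pushed toward 0 or 1 while the side of 0.5 they fall on stays the same. With unequal priors, or with a dependency structure that differs between classes, the predicted label can change.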

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.