Sorry for what is an incredibly simple question, but I haven't been able to find a straightforward answer via searching. I want to be sure I'm not assuming the wrong thing. If I have a naive Bayes classifier for documents, how do I compute p(wi|C)? Is it simply the frequency of word wi in documents of class C over the frequency of all words in class C?
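In symbols, the estimate described in the question is the relative-frequency (maximum-likelihood) estimate. Here count(w_i, C) is the number of times w_i occurs across all training documents of class C and V is the vocabulary; this notation is assumed for illustration, not taken from the question:

```latex
\hat{p}(w_i \mid C) = \frac{\operatorname{count}(w_i, C)}{\sum_{w \in V} \operatorname{count}(w, C)}
```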

Using the frequency of the word wi in documents of class C over the frequency of all words in class C is one way of estimating that probability (it is the maximum-likelihood estimate). It has a problem, however: words that never occurred in a class during training will have probability 0, so any document containing such a word will never be classified into that class, regardless of its other words (multiplying the other words' probabilities by 0 gives 0). For this reason most people use other estimators. The simplest is to add a small constant to the counts of all words in all classes before normalizing to compute probabilities, but you can also use fairly complex techniques such as Good-Turing estimation.
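A minimal Python sketch of both estimators, assuming tokenized documents and a vocabulary built from all training documents. The function names `word_probs` and `score`, the `alpha` parameter, and the toy spam/ham data are hypothetical, chosen only for illustration. It shows the zero-probability problem with the plain relative-frequency estimate and how adding a constant to every count avoids it:

```python
import math
from collections import Counter

def word_probs(class_docs, vocab, alpha=0.0):
    """Estimate p(w | C) for every w in vocab from one class's tokenized docs.

    alpha=0.0 is the plain relative-frequency (maximum-likelihood) estimate;
    alpha>0.0 adds alpha to every word's count before normalizing.
    """
    counts = Counter(w for doc in class_docs for w in doc)
    # Total count of vocabulary words in the class, plus alpha per vocab word.
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def score(doc, probs, prior):
    # Unnormalized p(C) * product of p(w | C); real code would sum logs
    # to avoid floating-point underflow on long documents.
    return prior * math.prod(probs[w] for w in doc)

# Toy training data (made up for illustration).
spam_docs = [["buy", "cheap", "pills"], ["buy", "now"]]
ham_docs = [["meeting", "now"], ["lunch", "meeting"]]
vocab = sorted({w for doc in spam_docs + ham_docs for w in doc})

new_doc = ["buy", "meeting"]  # each word was seen in only one class

for alpha in (0.0, 1.0):
    spam = score(new_doc, word_probs(spam_docs, vocab, alpha), prior=0.5)
    ham = score(new_doc, word_probs(ham_docs, vocab, alpha), prior=0.5)
    print(f"alpha={alpha}: spam={spam:.4f} ham={ham:.4f}")
# With alpha=0.0 both scores are 0.0, so the classifier cannot decide;
# with alpha=1.0 both are positive and can be compared.
```

The value of `alpha` is a tuning choice; 1.0 is used here only because it is the most common default.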

What Alexandre says is correct. You may also use Laplace smoothing, which addresses exactly the problem Alexandre describes. It is explained pretty well, with a nice example, in Andrew Ng's lecture on naive Bayes. The notes are those for lecture 2, and you can find the video lecture online; it should be lecture 4, though I'm not really sure.
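For reference, Laplace (add-one) smoothing is usually written as follows; it is the add-a-constant approach from the previous answer with the constant set to 1, so the denominator grows by the vocabulary size |V|. The notation count(w_i, C) and V is assumed here for illustration and may differ from the lecture notes:

```latex
\hat{p}(w_i \mid C) = \frac{\operatorname{count}(w_i, C) + 1}{\sum_{w \in V} \operatorname{count}(w, C) + |V|}
```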