Sorry for what is an incredibly simple question, but I haven't been able to find a straightforward answer by searching. I want to be sure I haven't been assuming the wrong thing.

If I have a naive Bayes classifier for documents, how do I compute p(w_i|C)? Is it simply the frequency of word w_i in documents of class C divided by the frequency of all words in class C?
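A minimal sketch of the estimate described in the question. It assumes the multinomial (word-count) form of naive Bayes and documents that are already tokenized into lists of words. The function name `mle_word_probs` and the toy documents are hypothetical, for illustration only.

```python
from collections import Counter

def mle_word_probs(docs_in_class):
    """Unsmoothed estimate of p(w | C) for a single class C.

    docs_in_class: list of documents (each a list of tokens), all labelled C.
    Returns {word: count(word, C) / total number of word tokens in C}
    for every word that actually occurs in C.
    """
    counts = Counter()
    for doc in docs_in_class:
        counts.update(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Hypothetical toy class: 5 tokens in total, "ball" occurs twice.
sports_docs = [["ball", "goal", "ball"], ["goal", "team"]]
probs = mle_word_probs(sports_docs)
print(probs["ball"])                # 2 / 5 = 0.4
print(probs.get("election", 0.0))   # word never seen in this class -> 0.0
```

The last line shows the weakness discussed in the first answer: a word that never appeared in the class gets probability exactly 0.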

asked Mar 12 '11 at 13:59

alto

2 Answers:

Using the frequency of the word w_i in documents of class C over the frequency of all words in class C is one way of estimating that probability (it is the maximum-likelihood estimate). It has a problem, however: a word that never occurred in a class in the training data gets probability 0, so any document containing that word can never be classified into that class, regardless of its other words (multiplying the other words' probabilities by 0 still gives 0). For this reason most people use other estimators, the simplest being to add a small constant to the counts of all words in all classes before normalizing to compute probabilities, but you can also use fairly complex techniques such as Good-Turing estimation.
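A rough sketch of the "add a small constant" estimator together with a classification step, assuming a multinomial naive Bayes model, a fixed vocabulary built from the training data, and a hypothetical toy training set. The names `smoothed_word_probs` and `log_score` are made up for this example, and skipping words that are outside the vocabulary is just one common choice, not the only one.

```python
import math
from collections import Counter

def smoothed_word_probs(docs_in_class, vocab, alpha=1.0):
    """Additive (add-alpha) estimate of p(w | C) over a fixed vocabulary.

    p(w | C) = (count(w, C) + alpha) / (tokens in C + alpha * |vocab|)
    With alpha > 0 every vocabulary word gets a non-zero probability.
    """
    counts = Counter()
    for doc in docs_in_class:
        counts.update(doc)
    denom = sum(counts.values()) + alpha * len(vocab)
    # Counter returns 0 for words never seen in this class.
    return {w: (counts[w] + alpha) / denom for w in vocab}

def log_score(doc, word_probs, prior):
    """log p(C) + sum over tokens of log p(w | C).

    Summing logs instead of multiplying probabilities avoids underflow.
    Words outside the training vocabulary are skipped (an assumption).
    """
    return math.log(prior) + sum(
        math.log(word_probs[w]) for w in doc if w in word_probs
    )

# Hypothetical toy training set: class label -> list of tokenized documents.
training = {
    "sports": [["ball", "goal", "ball"], ["goal", "team"]],
    "politics": [["vote", "election"], ["vote", "team"]],
}
vocab = {w for docs in training.values() for doc in docs for w in doc}
n_docs = sum(len(docs) for docs in training.values())
priors = {c: len(docs) / n_docs for c, docs in training.items()}
models = {c: smoothed_word_probs(docs, vocab) for c, docs in training.items()}

# "election" never appears in a sports document. Without smoothing this
# document would score 0 under "sports"; with smoothing it still wins there.
doc = ["ball", "goal", "election"]
scores = {c: log_score(doc, models[c], priors[c]) for c in training}
print(max(scores, key=scores.get))  # sports
```

Here the sports model gives 0.3 × 0.3 × 0.1 = 0.009 for the three words versus (1/9)(1/9)(2/9) ≈ 0.0027 for politics, with equal priors, so the single unseen word no longer vetoes the class.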

answered Mar 12 '11 at 16:14

Alexandre Passos ♦

What Alexandre says is correct.

You may also use Laplace smoothing, which addresses the problem Alexandre describes.
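Laplace smoothing is the add-one case of the small constant Alexandre mentions. For the multinomial model over a vocabulary V, the estimate can be written as follows (notation assumed here: count(w, C) is the number of occurrences of w in the training documents of class C):

```latex
\hat{p}(w_i \mid C) = \frac{\operatorname{count}(w_i, C) + 1}{\sum_{w \in V} \operatorname{count}(w, C) + |V|}
```

A word never seen in class C then gets 1 divided by (the total number of word tokens in C plus |V|) instead of 0, and the estimates for class C still sum to 1 over V.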

Laplace smoothing is explained pretty well in Andrew Ng's lecture on naive Bayes, with a cool example.

Here are the notes (the notes for lecture 2). You can find the lecture itself online; it should be lecture 4, though I'm not really sure.

answered Mar 12 '11 at 21:12

Leon Palafox

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.