|
I'm trying to implement information gain ratio to find how much a variable contributes to class membership in a naive Bayesian classifier. I hope to use this both for weighting and for finding which variables I can (safely) ignore. However, I have found two different definitions of information gain: one on the information gain ratio Wikipedia page, and another on the Kullback-Leibler divergence (a.k.a. information gain) page. Assuming I have my distributions for each class (in my case, needle and haystack), what's the correct way to implement IGR? Is there better material than Wikipedia available? Google could not find it for me.
|
Information gain (IG) and information gain ratio (GR) are two different, but related, functions. Here is the paper that introduced GR; if you read through it you'll understand the motivation and be able to decide which of IG or GR suits your setting:

Quinlan. Induction of Decision Trees. Machine Learning 1(1):81-106, 1986.

IG and GR are also described in standard machine learning textbooks that cover decision tree learning (e.g., Mitchell's "Machine Learning"). There is an alternative way to compute split information that I suggest you also check out, although it is slightly harder to compute. It should be better behaved in situations where classes are extremely imbalanced:

Art, thanks for the answer and the papers. Indeed, my data is very imbalanced. My target class is in the single-digit percentages, with quite noisy features. (It is, however, very amenable to enrichment via external data.)
(Feb 08 '13 at 19:36)
Steven
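
For what it's worth, here is a minimal sketch of IG and GR for a categorical attribute, following Quinlan's 1986 definitions. The function names and the dictionary-based partitioning are illustrative choices, not from any particular library:

```python
# Illustrative sketch of information gain (IG) and gain ratio (GR)
# as defined in Quinlan (1986); names here are my own, not a library API.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(attr_values, labels):
    """IG(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(labels)
    # Partition the labels by each example's attribute value.
    partitions = {}
    for v, y in zip(attr_values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def gain_ratio(attr_values, labels):
    """GR = IG / split information, where split information is the
    entropy of the attribute's own value distribution."""
    split_info = entropy(attr_values)
    if split_info == 0:
        return 0.0  # attribute takes a single value; it cannot split
    return information_gain(attr_values, labels) / split_info
```

For example, an attribute that perfectly predicts a balanced binary class gives IG = 1 bit and GR = 1, while one that is independent of the class gives IG = GR = 0. Note that with a heavily imbalanced target class like yours, the base entropy H(S) is already small, which is one reason the answer above points to alternatives to plain split information.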
|