Hello,

I am trying to do text classification using Naive Bayes. Before training, I would like to perform feature selection to reduce the dimensionality of the feature space. To do so, I am considering a method that combines two filters into a weighted score for each feature and then selects the top K features.

For example, suppose Information Gain is the first filter and "X" is the second filter. I would like to find the best weights a, b in:

score(feature) = a * Infogain(feature) + b * X(feature)
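To make the setup concrete, here is a minimal sketch of what I mean, assuming a scikit-learn-style term-count matrix, mutual information as a stand-in for Information Gain, and chi-squared as the hypothetical second filter "X":

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif, chi2

    # X: (n_documents, n_features) non-negative term-count matrix, y: class labels.
    # mutual_info_classif stands in for Information Gain; chi-squared plays the
    # role of the second filter "X" from the question.
    def combined_top_k(X, y, a=0.5, b=0.5, k=1000):
        ig = mutual_info_classif(X, y, discrete_features=True)
        x2, _ = chi2(X, y)

        # Put both filter scores on a comparable 0-1 scale before weighting.
        ig = (ig - ig.min()) / (ig.max() - ig.min() + 1e-12)
        x2 = (x2 - x2.min()) / (x2.max() - x2.min() + 1e-12)

        score = a * ig + b * x2
        return np.argsort(score)[::-1][:k]   # indices of the top-k features
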

I guess one possible option would be to try several values of a and b and see whether performance improves, but are there any less costly methods?

(For example, I thought of classifying a feature as good or bad using an SVM: a manual annotator labels some features as good or bad (the "label of the feature"), and Infogain(feature) and X(feature) are used as the "features of the feature". A sketch of this idea follows below.)
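A rough sketch of that idea, assuming a linear SVM from scikit-learn and a small hand-labelled subset of features; the function name and arguments are hypothetical:

    import numpy as np
    from sklearn.svm import SVC

    # ig_scores, x2_scores: 1-D arrays of filter scores for all features.
    # labelled_idx, labels: indices and 0/1 (bad/good) annotations for the
    # hand-labelled subset of features.
    def rank_features_with_svm(ig_scores, x2_scores, labelled_idx, labels, k=1000):
        meta = np.column_stack([ig_scores, x2_scores])   # "features of the feature"

        clf = SVC(kernel="linear")
        clf.fit(meta[labelled_idx], labels)

        # Use the signed distance to the separating hyperplane as a ranking
        # score for every feature, and keep the top k.
        margin = clf.decision_function(meta)
        return np.argsort(margin)[::-1][:k]
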

Thanks in advance...

asked Apr 01 '12 at 09:37

dgqw01