
I am doing bag-of-words text classification (text categorization) with very few labeled examples. As you would expect, the feature vectors are high-dimensional and sparse. By very few labeled examples I mean under 100, sometimes only 10 or 20.
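
For reference, the features are plain word counts, along the lines of this sketch (the toy texts and labels are just stand-ins for my data):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for my real documents; even with 10-100 documents the
# vocabulary easily reaches thousands of words, so X is wide and sparse.
texts = [
    "cheap meds online now",
    "meeting moved to friday",
    "free offer click here",
    "notes from the friday meeting",
]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()        # plain bag-of-words counts
X = vectorizer.fit_transform(texts)   # scipy.sparse matrix, shape (n_docs, vocab_size)
print(X.shape, X.nnz)                 # far more columns than rows, mostly zeros
```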

As a baseline, I ran L2-regularized logistic regression. I choose the hyperparameters (L2 regularization strength, learning rate, number of training passes) using leave-one-out cross-validation: for each leave-one-out split, I compute the logistic loss of the trained classifier on the left-out example, and I pick the hyperparameters that minimize the total logistic loss across all splits.
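
For concreteness, the baseline looks roughly like this sketch with scikit-learn (the C grid and the helper name loocv_select_C are just illustrative; in my actual setup I also grid over the learning rate and number of passes, which this sketch folds into the solver's defaults):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import LeaveOneOut

def loocv_select_C(X, y, C_grid=(0.01, 0.1, 1.0, 10.0)):
    """Pick scikit-learn's C (inverse L2 strength) minimizing the total
    logistic loss over all leave-one-out splits. X: sparse count matrix,
    y: numpy array of 0/1 labels."""
    best_C, best_loss = None, np.inf
    for C in C_grid:
        total_loss = 0.0
        for train_idx, test_idx in LeaveOneOut().split(X):
            clf = LogisticRegression(penalty="l2", C=C, max_iter=1000)
            clf.fit(X[train_idx], y[train_idx])
            proba = clf.predict_proba(X[test_idx])
            # logistic loss on the single left-out example
            total_loss += log_loss(y[test_idx], proba, labels=clf.classes_)
        if total_loss < best_loss:
            best_C, best_loss = C, total_loss
    return best_C, best_loss
```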

The problem is that I am overfitting the rare features: cross-validation ends up selecting a low regularization parameter, and the rare features (words that appear in only one or two examples) get high weights.
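
A rough diagnostic of what I am seeing (assuming clf is a classifier fit on all the data with the selected C, and X is the sparse count matrix from above; the variable names are mine):

```python
import numpy as np

# Document frequency: in how many documents each word appears.
doc_freq = np.asarray((X > 0).sum(axis=0)).ravel()
weights = np.abs(clf.coef_.ravel())

rare = doc_freq <= 2  # words seen in only one or two examples
print("mean |w| over rare words:  ", weights[rare].mean())
print("mean |w| over common words:", weights[~rare].mean())
```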

How do I avoid overfitting the rare features when learning a classifier from so few labeled examples? Should I use a different model? A different cross-validation technique? What approach will give the best generalization for supervised classification with few labeled examples and high-dimensional features?

asked Apr 20 '11 at 22:58

Joseph Turian ♦♦