Is there a way to train a text classifier with online learning when there are no features at the beginning, and features are added to the vocabulary (feature space) as new examples arrive that contain words not yet in the vocabulary?

asked Mar 07 '13 at 00:20

Eddie23r

One Answer:

Yes:

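  1. In linear models (linear SVM, logistic regression, naive Bayes in log-space), a feature with a zero coefficient does not affect the decision. So, when you see a batch of documents with new words, you can expand the vocabulary with the n "new" words in the batch, append n zeros to the coefficients (one zero per new word, for each class), and train. Similar tricks may apply to other models; in neural nets, append randomly initialized input-to-hidden weights for the n new inputs.
  2. Use the hashing trick: hash each word to one of a fixed number of feature indices, so the feature space has a fixed size from the start and no vocabulary needs to be stored.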
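
To make both options concrete, here is a minimal sketch in plain Python. It is not taken from any library: the class names `GrowingVocabClassifier` and `HashingClassifier`, the toy `stream`, the learning rate, and the bucket count are all made up for illustration, and the model is a bare-bones binary logistic regression trained by SGD on bag-of-words presence features.

```python
import math
import zlib


class GrowingVocabClassifier:
    """Option 1: binary logistic regression whose vocabulary grows online."""

    def __init__(self, learning_rate=0.5):
        self.vocab = {}    # word -> index into self.weights
        self.weights = []  # one coefficient per known word
        self.bias = 0.0
        self.learning_rate = learning_rate

    def _indices(self, words, grow):
        indices = []
        for word in words:
            if word not in self.vocab:
                if not grow:
                    continue  # unseen word at prediction time contributes nothing
                self.vocab[word] = len(self.weights)
                self.weights.append(0.0)  # zero weight: old decisions are unchanged
            indices.append(self.vocab[word])
        return indices

    def _probability(self, indices):
        score = self.bias + sum(self.weights[i] for i in indices)
        return 1.0 / (1.0 + math.exp(-score))

    def predict_proba(self, words):
        return self._probability(self._indices(words, grow=False))

    def partial_fit(self, words, label):
        indices = self._indices(words, grow=True)
        error = self._probability(indices) - label  # log-loss gradient w.r.t. the score
        for i in indices:
            self.weights[i] -= self.learning_rate * error
        self.bias -= self.learning_rate * error


class HashingClassifier(GrowingVocabClassifier):
    """Option 2: same model, but words are hashed into a fixed number of buckets."""

    def __init__(self, n_buckets=2 ** 18, learning_rate=0.5):
        super().__init__(learning_rate)
        self.weights = [0.0] * n_buckets  # size fixed up front, no vocabulary kept

    def _indices(self, words, grow):
        # crc32 is stable across runs, unlike Python's salted built-in hash() for str
        return [zlib.crc32(w.encode("utf-8")) % len(self.weights) for w in words]


# Toy stream of (tokens, label) pairs; label 1 = spam, 0 = not spam.
stream = [
    ("cheap pills buy now".split(), 1),
    ("meeting agenda for monday".split(), 0),
    ("buy cheap watches".split(), 1),
    ("monday project meeting notes".split(), 0),
]

clf = GrowingVocabClassifier()
hashed_clf = HashingClassifier()
for _ in range(20):
    for words, label in stream:
        clf.partial_fit(words, label)
        hashed_clf.partial_fit(words, label)

print(len(clf.vocab), "words learned;", len(hashed_clf.weights), "hash buckets")
print(clf.predict_proba(["buy", "cheap"]), hashed_clf.predict_proba(["buy", "cheap"]))
```

The trade-off, roughly: the growing vocabulary keeps features interpretable but needs bookkeeping every time a new word shows up, while hashing needs no bookkeeping at the cost of occasional collisions between unrelated words.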

answered Mar 07 '13 at 05:59

larsmans
edited Mar 07 '13 at 10:21

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.