I have developed a text classification (TC) algorithm that does not require any labeled data, and I want to compare it with other algorithms. My question is: which TC algorithms should I select for comparison? Currently I am comparing my algorithm with Kamal Nigam et al., “Text Classification from Labeled and Unlabeled Documents using EM”, Machine Learning 39(2-3), special issue on information retrieval (2000).

Please let me know your thoughts.

asked Nov 19 '12 at 03:40

swapnil hingmire

One Answer:

You should probably also compare it with completely unsupervised approaches, e.g. clustering.
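One practical wrinkle when using clustering as a baseline is that cluster ids are arbitrary, so they have to be mapped onto class labels before accuracy can be compared with a classifier's. Here is a minimal sketch of one common way to do that (best one-to-one mapping, found by brute force); the function name and the toy data are made up for illustration and are not from the question or from Nigam et al.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping from cluster ids to classes.

    Brute force over permutations, so only practical for a handful of classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; merge clusters first")
    best = 0
    # Try every assignment of distinct class labels to the clusters.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        correct = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best = max(best, correct)
    return best / len(true_labels)


# Toy example (hypothetical data): six documents, two classes, two clusters.
true_labels = ["sports", "sports", "politics", "politics", "sports", "politics"]
cluster_ids = [1, 1, 0, 0, 0, 0]
accuracy = clustering_accuracy(true_labels, cluster_ids)
print(accuracy)
```

With more than a handful of classes the brute-force search gets too slow, and an assignment solver (the Hungarian algorithm) would be the usual replacement.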

My recommendation is to look at the RCV1 corpus, which is a standard benchmark data set for text classification. To get the state of the art, see which papers on Google Scholar most recently cite the JMLR paper about RCV1 (Lewis et al., 2004).

The Nigam et al. work is a good historical benchmark.

answered Nov 19 '12 at 18:17

Joseph Turian ♦♦


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.