Could anyone point me to, or list, the general classes of feature representations for text? I'm having a hard time finding a good listing anywhere (and almost all tutorials stick to bag-of-words).

asked Dec 12 '11 at 17:35

sbirch


2 Answers:

The wordreps on the MetaOptimize page (link) might be a place to start. There are also links to papers providing details.

answered Dec 12 '11 at 19:25

alto


It is important to recognise why you are working on text. Is it topic clustering, sentiment analysis, authorship attribution, language analysis or some other task?

Once you have worked that out, search for those phrases rather than "feature representations".

To give a specific answer, feature types generally fall into four categories: syntactic features (including POS tags), structural features (e.g. sentence length, number of paragraphs), lexical features (including character n-grams) and content-specific features (e.g. email character encoding, which doesn't make sense in most other contexts). Which ones you use, and how you use them, depends on your application.
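As a rough illustration of two of those categories, here is a minimal Python sketch using only the standard library. It covers structural features (paragraph and sentence statistics) and lexical features (character trigrams) only; syntactic features would need a POS tagger from an external library, and content-specific features depend on your data, so both are left out. The function names, the feature keys and the naive sentence splitter are my own assumptions for illustration, not a standard API.

```python
import re
from collections import Counter


def char_ngrams(text, n=3):
    """Lexical feature: counts of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def structural_features(text):
    """Structural features: paragraph count, sentence count, mean sentence length."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # This is deliberately naive and will mis-split abbreviations.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words_per_sentence = [len(s.split()) for s in sentences]
    mean_len = sum(words_per_sentence) / len(sentences) if sentences else 0.0
    return {
        "num_paragraphs": len(paragraphs),
        "num_sentences": len(sentences),
        "mean_sentence_length": mean_len,
    }


def extract_features(text):
    """Merge structural and lexical features into one flat dict."""
    features = dict(structural_features(text))
    # Hypothetical key scheme: prefix each trigram so it cannot
    # collide with the structural feature names.
    for gram, count in char_ngrams(text.lower(), n=3).items():
        features["char3=" + gram] = count
    return features


if __name__ == "__main__":
    sample = "The cat sat. The dog ran!\n\nBirds fly."
    for name, value in sorted(extract_features(sample).items()):
        print(name, value)
```

A flat dict like this can be fed to most vectorisers or turned into a sparse matrix; which of the features are actually useful still depends on the task, as noted above.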

answered Dec 19 '11 at 19:52


Robert Layton


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.