Does anyone have good examples of where part-of-speech features have significantly helped in text classification? Also, what's the best way to use them as features? So far I've been combining each unigram with its part-of-speech tag (e.g., "sick-JJ").
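For concreteness, here is a minimal sketch of the word-plus-tag feature mentioned above. It assumes the text has already been tokenized and tagged with Penn Treebank-style tags by some tagger; the function name `word_tag_features` and the sample sentence are illustrative only and do not come from any library.

```python
def word_tag_features(tagged_tokens):
    """Join each lowercased word with its POS tag, e.g. ("sick", "JJ") -> "sick-JJ"."""
    return [f"{word.lower()}-{tag}" for word, tag in tagged_tokens]


# Hypothetical pre-tagged sentence (the tags are assumed, not produced here).
tagged = [("I", "PRP"), ("feel", "VBP"), ("sick", "JJ")]
features = word_tag_features(tagged)
print(features)  # ['i-PRP', 'feel-VBP', 'sick-JJ']
```

The resulting strings can be counted like ordinary unigrams in a bag-of-words model.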

asked Dec 23 '11 at 22:22

Alec
2 Answers:

Complex Linguistic Features for Text Classification: A Comprehensive Study. URL: http://dx.doi.org/10.1007/978-3-540-24752-4_14

answered Dec 24 '11 at 13:33

Arash Joorabchi
The easiest thing to do is, as you said, to add the POS tag to the bag-of-words features. This helps disambiguate between walk-verb and walk-noun, which might come in handy. Another easy thing to do (which I heard from Khalid El-Arini) is to take sequences of nouns (NN, NNP, NNPS, etc.) and treat each sequence as a single word. This would capture things like "new york city", "alexandre passos", etc. Finally, in applications that need more syntactic features, it might help to use n-grams or skip-n-grams of POS tags as features (skip n-grams are like n-grams, except that you can skip over one word while making them; that clause, for example, contains the skip bigrams "skip n-grams", "skip are", "n-grams are", "n-grams like", etc.). You can also replace some word classes (like adjectives or closed-class words) with their POS tags and build n-gram features out of the result.
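The other three ideas can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is already tagged with Penn Treebank-style tags (so every noun tag starts with "NN"), and the helper names `merge_noun_runs`, `skip_bigrams`, and `abstract_classes`, as well as the choice to join merged nouns with an underscore and keep the last tag of the run, are my own and not from any library.

```python
from itertools import groupby


def merge_noun_runs(tagged_tokens):
    """Collapse each run of consecutive noun-tagged tokens into one token.

    Assumption: noun tags start with "NN" (NN, NNS, NNP, NNPS).
    The merged token keeps the tag of the last noun in the run.
    """
    merged = []
    for is_noun, run in groupby(tagged_tokens, key=lambda wt: wt[1].startswith("NN")):
        run = list(run)
        if is_noun:
            merged.append(("_".join(w.lower() for w, _ in run), run[-1][1]))
        else:
            merged.extend(run)
    return merged


def skip_bigrams(items, max_skip=1):
    """Return ordered pairs with at most max_skip items skipped between them."""
    pairs = []
    for i in range(len(items)):
        for j in range(i + 1, min(i + 2 + max_skip, len(items))):
            pairs.append((items[i], items[j]))
    return pairs


def abstract_classes(tagged_tokens, prefixes=("JJ",)):
    """Replace words whose tag starts with one of the prefixes by the tag itself."""
    return [tag if tag.startswith(prefixes) else word.lower()
            for word, tag in tagged_tokens]


# Hypothetical pre-tagged sentence (tags assumed, not produced by a tagger here).
tagged = [("I", "PRP"), ("love", "VBP"), ("New", "NNP"), ("York", "NNP"), ("City", "NNP")]
print(merge_noun_runs(tagged))
print(skip_bigrams([tag for _, tag in tagged]))  # skip bigrams over POS tags
print(abstract_classes([("a", "DT"), ("sick", "JJ"), ("dog", "NN")]))
```

Each function returns plain tokens or tuples, so the outputs can be joined into strings and fed to whatever bag-of-features representation the classifier already uses.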

answered Dec 24 '11 at 04:18

Alexandre Passos ♦