In many naive attempts at attacking NLP problems, lexical (i.e., word identity) features are a common first choice. After recently reading this paper by yoavg, I became interested in the question of which problems are amenable to nonlexicalized features.

So which problems seem to need lexical information?

To start the conversation, apparently:

  1. according to the paper cited above, NER and chunking don't seem to need lexical features
  2. coreference doesn't need lexical information (at least not in the usual way; each entity will have its own lexicon, but which words are in it matters very little apart from closed-class words), as per the Haghighi and Klein papers

How about parsing? And POS tagging (of course some sort of distributional feature would be necessary here, but maybe morphological + co-occurrence features can stand in for word identity)? In the domains where some sort of lexical information is needed, how far do morphological features take you? Also, how well can word embeddings replace word features in these applications? How well does this carry across different languages?
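To make the distinction concrete, here is a minimal sketch of what "morphological instead of lexical" features could look like for a single token. The function names (`word_shape`, `token_features`) and the specific feature set are my own illustrative assumptions, not taken from any of the papers mentioned; co-occurrence and embedding features are left out.

```python
def word_shape(token: str) -> str:
    """Map characters to classes (X upper, x lower, d digit) and collapse runs."""
    shape = []
    for ch in token:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch  # keep punctuation such as '-' as its own class
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


def token_features(token: str, lexicalized: bool = False, max_affix: int = 3) -> dict:
    """Return a sparse feature dict for one token.

    With lexicalized=False, no feature exposes the full word identity;
    only shape, simple orthographic flags, and short affixes are used.
    """
    feats = {
        "shape=" + word_shape(token): 1.0,
        "is_capitalized": float(token[:1].isupper()),
        "has_digit": float(any(c.isdigit() for c in token)),
        "has_hyphen": float("-" in token),
    }
    lower = token.lower()
    for n in range(1, max_affix + 1):
        if len(lower) > n:  # skip affixes that would be the whole word
            feats[f"suffix{n}={lower[-n:]}"] = 1.0
            feats[f"prefix{n}={lower[:n]}"] = 1.0
    if lexicalized:
        # The one feature that encodes word identity directly.
        feats["word=" + lower] = 1.0
    return feats
```

Comparing a model trained with `lexicalized=True` against one trained with `lexicalized=False` on the same data is one simple way to measure how much word identity itself contributes for a given task.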

Is there more research on capturing the sort of information that is used by the H&K coreference papers cited above (that is, word identity is important but not in a trivial way, and its importance only appears when the domain is restricted enough)?

asked Sep 02 '10 at 21:27


Alexandre Passos ♦


One Answer:

To clarify, the main claim is not that "lexical features are not needed" but that "lexical features derived from annotated corpora do not really help". That is, we can do as well without lexical features derived from corpora as we do with them.

This is also true for constituency parsing (cf. the Berkeley parser), as well as for dependency parsing (cf. Kawahara and Uchimoto 2007).

My intuition is that this will prove to be true of all supervised tasks that are not some kind of topic classification.

(For POS tagging, the identity of the word being tagged definitely helps, as it restricts the possible POS assignments for the word. But this kind of information can be found in a tagging dictionary, which is not necessarily derived from annotated corpora.)
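As a rough sketch of how a tagging dictionary restricts the possible assignments: the toy tag set, the dictionary entries, and the function names below are hypothetical and only meant to illustrate the idea, not to describe any particular tagger.

```python
# Hypothetical toy tag set and tagging dictionary. In practice the dictionary
# could come from a hand-built lexicon rather than from an annotated corpus.
ALL_TAGS = ("DT", "JJ", "MD", "NN", "VB")

TAG_DICT = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "run": {"NN", "VB"},
}


def candidate_tags(word, tag_dict=TAG_DICT, all_tags=ALL_TAGS):
    """Tags the dictionary allows for `word`; unknown words may take any tag."""
    return tag_dict.get(word.lower(), set(all_tags))


def best_tag(word, scores, tag_dict=TAG_DICT, all_tags=ALL_TAGS):
    """Pick the highest-scoring tag among those the dictionary allows.

    `scores` maps tag -> model score and may come from a model that
    never looks at word identity.
    """
    allowed = candidate_tags(word, tag_dict, all_tags)
    # sorted() makes tie-breaking deterministic.
    return max(sorted(allowed), key=lambda t: scores.get(t, float("-inf")))
```

Here the word identity is used only to look up the allowed tags; the scores themselves can still come from nonlexicalized features.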

Annotated corpora are just too small to learn any meaningful semantic (i.e., non-structural) information from them and expect it to generalize beyond the annotated corpora.

answered Sep 03 '10 at 12:27


yoavg


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.