I've heard the claim before that an unsupervised POS tagger is sometimes more useful (as a component in some end-to-end task) than a supervised one, even if the supervised one has state-of-the-art performance.

I think that may very well be true (even more so, probably, when the unsupervised tags are modeled as latent variables in the objective of the end-to-end task, so everything is trained jointly).

In any case, is there some paper that shows empirically that the claim holds?

asked Feb 16 '12 at 16:28

Frank

edited Feb 18 '12 at 06:23 by ogrisel


2 Answers:

In general, many of the papers using word classes as features for an end task could be taken to support this (e.g. Miller et al. 2004, Ratinov and Roth 2009, Turian et al. 2010, Koo et al. 2008), although typically they use unsupervised word classes in addition to supervised POS tags, rather than instead of them.
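For a concrete sense of what that looks like, here is a minimal sketch (my own illustration, not taken from any of those papers) of a token feature extractor that can use Brown-style word-class features alongside, or instead of, a supervised POS tag. The `brown_clusters` mapping and the particular prefix lengths are assumptions for illustration:

    # Minimal sketch: unsupervised word-class features next to (or instead of)
    # a supervised POS tag. `brown_clusters` is a hypothetical dict mapping
    # lowercased words to bit-string cluster ids.

    def token_features(words, i, pos_tags=None, brown_clusters=None):
        word = words[i]
        feats = {
            "word": word.lower(),
            "suffix3": word[-3:],
            "is_capitalized": word[0].isupper(),
        }
        if pos_tags is not None:            # supervised POS tag feature
            feats["pos"] = pos_tags[i]
        if brown_clusters is not None:      # unsupervised word-class features
            cluster = brown_clusters.get(word.lower(), "UNK")
            # Prefixes of the bit string give clusterings at several
            # granularities, in the spirit of the feature sets used in the
            # papers above.
            for length in (4, 6, 10):
                feats[f"cluster_{length}"] = cluster[:length]
        return feats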

We tried to explicitly compare unsupervised word classes to gold POS tags on several tasks in the context of cognitive modeling in Chrupala and Alishahi (2010); perhaps the most straightforward comparison was the word prediction task: guess a missing word based on its class or POS tag.
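As a rough, hypothetical sketch of that kind of comparison (not the exact setup of the paper): estimate P(word | label) from training data, where the label is either a gold POS tag or an unsupervised class id, and measure how often the most probable word under the label matches the held-out token:

    from collections import Counter, defaultdict

    def train_label_to_word(pairs):
        """pairs: iterable of (word, label) tuples from a training corpus."""
        counts = defaultdict(Counter)
        for word, label in pairs:
            counts[label][word] += 1
        # For each label keep its single most frequent word (1-best prediction).
        return {label: c.most_common(1)[0][0] for label, c in counts.items()}

    def prediction_accuracy(model, test_pairs):
        correct = total = 0
        for word, label in test_pairs:
            total += 1
            if model.get(label) == word:
                correct += 1
        return correct / total if total else 0.0

    # Usage with hypothetical (word, pos, class) triples:
    # pos_model   = train_label_to_word((w, t) for w, t, c in train_tokens)
    # class_model = train_label_to_word((w, c) for w, t, c in train_tokens)
    # prediction_accuracy(pos_model,   [(w, t) for w, t, c in test_tokens])
    # prediction_accuracy(class_model, [(w, c) for w, t, c in test_tokens])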

answered Feb 17 '12 at 16:22

Grzegorz

Petrov 2009 presents a way of introducing latent variables to the parsing/grammar induction problem that you could see as semi-supervised. The basic idea is that a single observed context-free rule like S -> NP VP can correspond to many different latent rules over split subcategories of the symbols. Your question is a bit vague, but I think this might be of interest to you. I don't know of the equivalent work in POS tagging, but I assume something like it exists.
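To make the idea concrete, here is a toy sketch (assumptions mine, not Petrov's actual implementation): each observed nonterminal is split into a small number of latent subsymbols, so one observed rule expands into many latent rules whose probabilities would then be fit with EM on treebank trees:

    from itertools import product

    def split_rule(lhs, rhs, n_splits=2):
        """Enumerate latent refinements of an observed CFG rule."""
        latent_rules = []
        for lhs_i, *rhs_idx in product(range(n_splits), repeat=1 + len(rhs)):
            latent_lhs = f"{lhs}-{lhs_i}"
            latent_rhs = tuple(f"{sym}-{j}" for sym, j in zip(rhs, rhs_idx))
            latent_rules.append((latent_lhs, latent_rhs))
        return latent_rules

    # split_rule("S", ("NP", "VP")) yields 8 latent rules:
    #   ("S-0", ("NP-0", "VP-0")), ("S-0", ("NP-0", "VP-1")), ...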

answered Feb 18 '12 at 21:04

Travis Wolfe
