I am a little confused about these two algorithms, both of which are applicable to structured prediction problems. My confusion is whether the two algorithms are separate, or whether the structured perceptron is used to learn the weights of a CRF. Thanks very much.

asked Jun 29 '12 at 23:44 by Jun

edited Jun 30 '12 at 04:24 by zaxtax ♦


CRF is the model. Structured perceptron is a way of learning weights for models like CRFs.

(Jun 30 '12 at 04:27) zaxtax ♦
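
To make that comment concrete, here is a minimal, hypothetical sketch (not from the thread) of the structured perceptron as a rule for learning the weights w of a linear, CRF-style model with score(x, y) = w . f(x, y). The feature map features() and the decoder argmax_decode() (e.g. Viterbi) are assumed stand-ins supplied by the caller, not anything defined in the original post:

    def perceptron_epoch(data, features, argmax_decode, w, lr=1.0):
        """One pass of structured perceptron updates: w += lr * (f(x, y_true) - f(x, y_hat)).

        `data` is an iterable of (x, y_true) pairs; `features` maps (x, y) to a weight-shaped
        vector (e.g. a numpy array); `argmax_decode` returns the best labeling under w.
        """
        for x, y_true in data:
            y_hat = argmax_decode(x, w)              # best labeling under current weights
            if tuple(y_hat) != tuple(y_true):        # update only on mistakes
                w = w + lr * (features(x, y_true) - features(x, y_hat))
        return w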

One Answer:

I would say the CRF and the structured perceptron are two closely related energy-based models. Given an energy function -- for instance, E(x, y) = sum_t [x_t W y_t + y_{t-1} A y_t] for linear-chain models -- CRFs and structured perceptrons minimize different loss functions to learn the parameters W and A. In particular, CRFs define a Gibbs distribution based on the energy and minimize the negative conditional log-likelihood, whereas structured perceptrons minimize the so-called perceptron loss, which roughly has the form E(x, y) - min_{y'} E(x, y').
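
As a rough illustration of the two losses (a sketch added here, not part of the original answer): the toy Python below scores a label sequence with sum_t [x_t W y_t + y_{t-1} A y_t], treated as a score (i.e. a negative energy), and computes both losses by brute-force enumeration over labelings. All names, sizes, and data are made up for illustration:

    import itertools
    import numpy as np

    def sequence_score(x, y, W, A):
        """Score (negative energy) of label sequence y for inputs x."""
        score = x[0] @ W[:, y[0]]
        for t in range(1, len(y)):
            score += x[t] @ W[:, y[t]] + A[y[t - 1], y[t]]
        return score

    def all_labelings(T, n_labels):
        return itertools.product(range(n_labels), repeat=T)

    def perceptron_loss(x, y_true, W, A, n_labels):
        """Structured perceptron loss: best competitor's score minus the true score."""
        best = max(sequence_score(x, y, W, A) for y in all_labelings(len(x), n_labels))
        return best - sequence_score(x, y_true, W, A)

    def crf_neg_log_likelihood(x, y_true, W, A, n_labels):
        """CRF loss: log-partition function minus the true score."""
        scores = [sequence_score(x, y, W, A) for y in all_labelings(len(x), n_labels)]
        return np.logaddexp.reduce(scores) - sequence_score(x, y_true, W, A)

    rng = np.random.default_rng(0)
    T, d, n_labels = 4, 3, 2              # sequence length, feature dim, label count
    x = rng.normal(size=(T, d))           # toy input features
    y_true = (0, 1, 1, 0)                 # toy gold label sequence
    W = rng.normal(size=(d, n_labels))
    A = rng.normal(size=(n_labels, n_labels))
    print("perceptron loss:", perceptron_loss(x, y_true, W, A, n_labels))
    print("CRF neg. log-likelihood:", crf_neg_log_likelihood(x, y_true, W, A, n_labels))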

For fully observed models, it is possible to train the parameters using the structured perceptron and plug them into a CRF (or vice versa), as MAP prediction amounts to exactly the same minimization, viz. minimization of the energy function over y. When the model contains latent variables (such as in hidden, hidden-unit, or latent-dynamic CRFs), the two models become different: latent-variable CRFs marginalize out the latent variables z and then optimize over y to obtain the MAP prediction, whereas structured perceptrons do a joint optimization over z and y. Empirically, my experience is that for latent-variable models you cannot plug structured-perceptron parameters into the corresponding CRF model, or vice versa.
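
As a hypothetical illustration of that prediction-time difference (again, not from the original answer): with a tiny hand-picked score table over two labels y and two latent states z, marginalizing z before choosing y (the latent-variable CRF rule) and jointly maximizing over (y, z) (the structured perceptron rule) give different answers:

    import numpy as np

    # Hand-picked scores chosen so the two rules disagree: y = 1 has the single best
    # (y, z) pair, but y = 0 is better once the latent z is summed out.
    SCORES = {(0, 0): 2.0, (0, 1): 2.0,
              (1, 0): 2.5, (1, 1): -5.0}
    YS, ZS = [0, 1], [0, 1]

    def predict_latent_crf():
        """argmax_y log sum_z exp(score(y, z)): marginalize the latent z, then pick y."""
        marginals = {y: np.logaddexp.reduce([SCORES[y, z] for z in ZS]) for y in YS}
        return max(marginals, key=marginals.get)

    def predict_structured_perceptron():
        """argmax_{y, z} score(y, z): joint maximization over y and z, keep only y."""
        best_y, _ = max(((y, z) for y in YS for z in ZS), key=lambda yz: SCORES[yz])
        return best_y

    print("latent-variable CRF predicts y =", predict_latent_crf())               # -> 0
    print("structured perceptron predicts y =", predict_structured_perceptron())  # -> 1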

Yann LeCun has a nice tutorial on energy-based models that you could use to read up on this stuff. Some of the papers on CRFs with latent variables may also be of interest to you.

answered Jul 04 '12 at 03:31 by Laurens van der Maaten

Thanks very much. I will have a look at the tutorial paper. Your explanation is great. I did a small experiment using the structured perceptron for named entity recognition. The result was: iteration 6301, tp: 1832, guessEntities: 1832, trueEntities: 1832, precision = 1.0, recall = 1.0, F1 = 1.0.

The result is amazing. Previously, I used an MEMM for this problem, and the result was just 0.87 with the same features.

(Jul 05 '12 at 13:34) Jun