I want to classify descriptions (for example, product descriptions) into categories, but the problem is that they contain many unrelated sentences. So I want to build a two-stage classifier: the first stage selects the relevant sentences, and the second classifies those sentences into a category. I am thinking about some form of expectation maximization, as the most common approach to handling latent variables. Can someone point me to articles, or are there implementations in Weka or other libraries that I don't know about?

asked Feb 25 '11 at 06:06

yura


3 Answers:

For the paper that Alexandre refers to, we wrote our own implementation of an HCRF, which I am unfortunately not allowed to share. Recently I have tried implementing the same model in Factorie, but I haven't reached the same level of performance yet. Yessenalina et al. independently developed an SVM-based model similar to ours, in which they extract sentences that should be used as evidence for the document label. I think one difference is that our model actually learns to predict each sentence with an appropriate label, while their model picks sentences that should be used as evidence. Their approach is probably more suitable if you want to predict document labels well, rather than sentence labels.

A cool thing with these models is that if you have some labels available at the sentence level, you can readily incorporate these as well. When using stochastic gradient ascent for estimation this is straightforward to implement.

If the implementation of the SVM-based model doesn't fit your needs, I would start by trying out Factorie, and if that doesn't fit your needs either, I would consider implementing a tree-structured CRF model, using hard estimation as described in our paper. That only requires you to compute a very simple gradient, and inference is a trivial loop over document-sentence label pairs.
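To make the "trivial loop" concrete, here is a rough Python sketch of inference in a model where sentences interact only through the document label. Everything in it is an illustrative assumption: the feature keys, the function names `sentence_score`, `infer` and `hard_update`, and in particular the update rule, which is a perceptron-style stand-in rather than the gradient from the paper. The optional `observed` argument shows one way to clamp sentence labels when some happen to be available.

```python
from collections import defaultdict

def sentence_score(w, feats, s, y):
    """Score of sentence label s for one sentence, given document label y."""
    emission = sum(w[("emit", f, s)] * v for f, v in feats.items())
    return emission + w[("pair", y, s)]

def infer(w, doc, doc_labels, sent_labels, fixed_y=None, observed=None):
    """Return the best (document label, sentence labels).

    Sentences only interact through the document label, so for each
    candidate y every sentence picks its best label independently.
    """
    observed = observed or {}
    best = None
    for y in ([fixed_y] if fixed_y is not None else doc_labels):
        total, labels = 0.0, []
        for i, feats in enumerate(doc):
            # A known sentence label is clamped instead of searched over.
            candidates = [observed[i]] if i in observed else sent_labels
            s = max(candidates, key=lambda c: sentence_score(w, feats, c, y))
            total += sentence_score(w, feats, s, y)
            labels.append(s)
        if best is None or total > best[0]:
            best = (total, y, labels)
    return best[1], best[2]

def hard_update(w, doc, gold_y, doc_labels, sent_labels, lr=1.0, observed=None):
    """Perceptron-style stand-in for a hard-estimation step (an assumption)."""
    pred_y, pred_s = infer(w, doc, doc_labels, sent_labels)
    # Best latent sentence labels that are consistent with the gold document label.
    _, gold_s = infer(w, doc, doc_labels, sent_labels,
                      fixed_y=gold_y, observed=observed)
    if (pred_y, pred_s) == (gold_y, gold_s):
        return
    for feats, sp, sg in zip(doc, pred_s, gold_s):
        for f, v in feats.items():
            w[("emit", f, sg)] += lr * v
            w[("emit", f, sp)] -= lr * v
        w[("pair", gold_y, sg)] += lr
        w[("pair", pred_y, sp)] -= lr
```

Here `w` is expected to be a `defaultdict(float)` and each document a list of feature dictionaries, one per sentence. Because only the single best assignment is used, no marginals are needed at any point.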

Another thing you could consider is a hierarchical Naïve Bayes model. That should be pretty straightforward to set up in Factorie, I think.
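As a rough illustration of what such a model could look like (this is one possible reading, not a specific published formulation and not Factorie code), assume each document label generates a latent label for every sentence, and each sentence label generates the words of that sentence. An "irrelevant" sentence label then absorbs the unrelated sentences. The parameter tables `prior`, `sent_given_doc` and `word_given_sent`, and the `unk` floor for unseen words, are hypothetical names chosen for this sketch; in practice the tables would be estimated, for example with EM.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def doc_log_joint(doc, y, prior, sent_given_doc, word_given_sent, unk=1e-6):
    """log p(y, doc), summing out the latent label of every sentence."""
    total = math.log(prior[y])
    for sentence in doc:
        per_label = []
        for s, p_s in sent_given_doc[y].items():
            lp = math.log(p_s)
            for word in sentence:
                # Unseen words get a small floor probability (an assumption).
                lp += math.log(word_given_sent[s].get(word, unk))
            per_label.append(lp)
        total += log_sum_exp(per_label)
    return total

def classify(doc, prior, sent_given_doc, word_given_sent):
    """Pick the document label with the highest joint probability."""
    return max(prior, key=lambda y: doc_log_joint(
        doc, y, prior, sent_given_doc, word_given_sent))

# Hand-set toy parameters, purely for illustration.
prior = {"phone": 0.5, "laptop": 0.5}
sent_given_doc = {
    "phone": {"phone": 0.5, "irrelevant": 0.5},
    "laptop": {"laptop": 0.5, "irrelevant": 0.5},
}
word_given_sent = {
    "phone": {"sim": 0.5, "screen": 0.5},
    "laptop": {"keyboard": 0.5, "screen": 0.5},
    "irrelevant": {"free": 0.5, "shipping": 0.5},
}
doc = [["free", "shipping"], ["sim", "screen"]]
label = classify(doc, prior, sent_given_doc, word_given_sent)
```

With these toy tables the first sentence is explained almost entirely by the "irrelevant" label under either category, so only the second sentence decides the outcome.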

answered Feb 25 '11 at 07:19

Oscar Täckström

edited Feb 25 '11 at 07:41

Our SVM-sle approach does not provide any formal guarantees on the quality of the sentences extracted as evidence since our optimization objective directly optimizes for document-level accuracy. However, we find that the extracted sentences tend to be informative in practice. But our approach is not guaranteed to find all the informative sentences (and often does not in practice), so it should not be used as a sentence-level classifier.

In most latent variable models (including ours), the non-convexity of the learning objective implies that you'll also need a good initialization of the latent variables. This is especially true in the case of SVM-sle. Since our application was sentiment analysis, we found that it was quite effective to use off-the-shelf sentiment classifiers for initializing our latent variables.

(Mar 05 '11 at 23:19) Yisong Yue

Oscar Täckström (a user on this website) recently published a paper where he does essentially that using a conditional random field: Discovering fine-grained sentiment with latent variable structured prediction models.

answered Feb 25 '11 at 06:22

Alexandre Passos ♦

edited Feb 25 '11 at 07:01

Oscar Täckström

You might also be interested in this paper about hidden conditional random fields for phone classification. As far as I know it is common to approximate the values of the latent variables for these types of models by using the M best MAP states.

answered Feb 25 '11 at 06:55

Philemon Brakel

We found that using only the top-1 label assignment for estimation worked best. The learned distributions over latent variables still make sense, even when you don't properly marginalize over them. Training is also faster and it is much easier to implement, since you don't need to do forward-backward.
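The difference between the two estimation styles can be sketched for a single latent variable as follows. This is only an illustration under simplifying assumptions: the scores are made up, the function names are hypothetical, and a chain-structured model would need forward-backward to get the soft version.

```python
import math

def soft_responsibilities(scores):
    """Posterior over latent labels (softmax of the scores), as used when marginalizing."""
    m = max(scores.values())
    exp = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def hard_responsibilities(scores):
    """Top-1 assignment: all of the weight goes to the best-scoring label."""
    best = max(scores, key=scores.get)
    return {k: (1.0 if k == best else 0.0) for k in scores}

# Made-up scores for one sentence's latent label.
scores = {"pos": 2.0, "neg": 0.0, "none": 1.0}
soft = soft_responsibilities(scores)
hard = hard_responsibilities(scores)
```

In either case the resulting weights are what multiply the feature counts during estimation; the hard version simply needs an argmax instead of a normalized distribution.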

(Feb 25 '11 at 07:22) Oscar Täckström

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.