I'm interested in using MALLET for part-of-speech tagging or sentiment analysis. After reviewing the MALLET API documentation, it's pretty clear that the tagger accepts binary values when describing features.

However, all of the examples that I've been able to dig up for MALLET use only binary labels along with binary features.

Before mucking around with MALLET, I wanted to get some clarification on whether the tagger also supports multinomial (multi-class) labels.

MALLET SimpleTagger API documentation.

MALLET quick start on the SimpleTagger tool.

NLTK has a wrapper built on top of MALLET, but a quick source code review didn't make it clear whether MALLET supports multinomial labels.

NLTK-MALLET API documentation.

Thanks in advance! ct

asked Sep 04 '12 at 22:16

ctaylor

edited Sep 04 '12 at 22:26

Keith Stevens


One Answer:

The SimpleTagger handles multiple classes. Here's a little test input and output you can use:

For training: train.tag.txt

BILL CAP noun
slept LOWER verb
on LOWER STOPWORD prep
the LOWER STOPWORD det
couch LOWER noun
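
To make the format of that file explicit: each line describes one token as whitespace-separated binary features, with the label as the last item on the line. Here is a rough Python sketch of that reading. `parse_training_line` is a hypothetical helper written for illustration, not part of MALLET, and the sketch only mirrors the file layout rather than MALLET's actual parsing code.

```python
# Training data copied from train.tag.txt above.
TRAIN = """\
BILL CAP noun
slept LOWER verb
on LOWER STOPWORD prep
the LOWER STOPWORD det
couch LOWER noun
"""


def parse_training_line(line):
    """Split one line into (features, label).

    Hypothetical helper: assumes the last whitespace-separated
    token is the label and everything before it is a binary feature.
    """
    tokens = line.split()
    return tokens[:-1], tokens[-1]


rows = [parse_training_line(line) for line in TRAIN.splitlines() if line.strip()]

# The label set has four distinct values, so it is not a binary labelling.
labels = sorted({label for _, label in rows})
print(labels)
```

The point is that only the features are binary (present or absent); the label column can take as many distinct values as you like.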

For testing: test.tag.txt

CAP BILL
slept
on
the
couch

With these two files, you can then train and tag your text:

$ java -cp lib/mallet-deps.jar:class/ cc.mallet.fst.SimpleTagger --train true --model-file tag.model train.tag.txt
$ java -cp lib/mallet-deps.jar:class/ cc.mallet.fst.SimpleTagger --model-file tag.model test.tag.txt

And get the output:

    Number of predicates: 9
    noun 
    verb 
    prep 
    det 
    noun
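
If you want to line the predicted tags back up with the test tokens, something like the following Python sketch might do. It assumes the tagger prints one label per input line, in the same order as test.tag.txt, and that the "Number of predicates" line is a log message that can be filtered out by its prefix. `pair_tags` is a hypothetical helper, not something MALLET provides.

```python
# Lines of test.tag.txt and the tagger output shown above, pasted in as strings.
TEST_LINES = ["CAP BILL", "slept", "on", "the", "couch"]
RAW_OUTPUT = """\
Number of predicates: 9
noun
verb
prep
det
noun
"""


def pair_tags(test_lines, raw_output):
    """Pair each test line with its predicted tag (hypothetical helper).

    Assumes one tag per non-empty output line, in input order, and that
    the 'Number of predicates' line is log noise rather than a tag.
    """
    tags = [
        line.strip()
        for line in raw_output.splitlines()
        if line.strip() and not line.strip().startswith("Number of predicates")
    ]
    if len(tags) != len(test_lines):
        raise ValueError("tag count does not match number of test lines")
    return list(zip(test_lines, tags))


pairs = pair_tags(TEST_LINES, RAW_OUTPUT)
for features, tag in pairs:
    print(f"{features}\t{tag}")
```

For longer files with several sequences you would also need to handle the blank lines that separate sequences, which this sketch leaves out.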

answered Sep 04 '12 at 22:36

Keith Stevens


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.