I've been somewhat successfully using OpenNLP for NER. Basically, I can get it to identify a little more than half of the articles I throw at it, but it's far from perfect.

The training sets from OpenNLP are not available AFAIK, and even if they were, I wouldn't have a clue what to do with them.

Is there a better tagger, or is there a dataset I could plug into OpenNLP to make it better? GATE is always mentioned, but I wasn't able to break into it. LingPipe isn't free as far as I can tell, and NLTK is written in Python, which is too slow and wouldn't fit into my code base.

Any suggestions? Thanks!

asked Oct 27 '10 at 15:59

Max Lynch


2 Answers:

If you're considering implementing a tagger, then I think this is a good paper to read for approaches that will improve your results without costing much developer time:

Lev Ratinov and Dan Roth. Design Challenges and Misconceptions in Named Entity Recognition. CoNLL 2009.

http://l2r.cs.uiuc.edu/~danr/Papers/RatinovRo09.pdf

answered Oct 27 '10 at 17:04

syllogism

edited Oct 27 '10 at 17:09

Thanks for the link!

(Oct 28 '10 at 16:49) Max Lynch

An implementation of this algorithm is freely available at http://cogcomp.cs.illinois.edu/page/software_view/4 (as linked in @zaxtax's answer). Ratinov et al.'s NER tagger is one of the best for English.

(Nov 04 '10 at 03:22) Joseph Turian ♦♦

There are several Named Entity Recognition systems out there. Just off the top of my head:

Stanford NER

Illinois Named Entity Extractor

Balie Information Extractor

In addition, although LingPipe may or may not be free, they have a great outline of what's out there on their website.

If you want better performance, you'll probably want to grab a dataset that is closer to your own domain and train on that. Good luck!
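On the "what would I do with a dataset" part of the question: as far as I recall from the OpenNLP manual, the name finder is trained on plain text with one tokenized sentence per line and entities wrapped in `<START:type> ... <END>` markers. Below is a rough sketch of turning your own annotations into that format. `NameSampleFormatter` and its `format` method are hypothetical names made up for illustration, not part of OpenNLP, and the sketch assumes the spans are sorted and don't overlap.

```java
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of OpenNLP): renders one tokenized sentence
 * in the inline-markup format used for name finder training data.
 */
final class NameSampleFormatter {

    /**
     * @param tokens the sentence, already tokenized
     * @param spans  entity spans as {start, end} token indices, end exclusive;
     *               assumed sorted by start and non-overlapping
     * @param types  entity type for each span, e.g. "person"
     * @return the sentence as a single training line
     */
    static String format(String[] tokens, int[][] spans, String[] types) {
        StringJoiner out = new StringJoiner(" ");
        int next = 0; // index of the next span that has not been closed yet
        for (int i = 0; i < tokens.length; i++) {
            // Open the marker when the current token starts the next span.
            if (next < spans.length && spans[next][0] == i) {
                out.add("<START:" + types[next] + ">");
            }
            out.add(tokens[i]);
            // Close the marker after the last token of the span.
            if (next < spans.length && spans[next][1] == i + 1) {
                out.add("<END>");
                next++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Max", "Lynch", "uses", "OpenNLP", "."};
        int[][] spans = {{0, 2}};
        String[] types = {"person"};
        System.out.println(format(tokens, spans, types));
    }
}
```

Write one such line per sentence into a text file and hand that file to OpenNLP's name finder trainer. Check the manual for your OpenNLP version for the exact trainer invocation and for how much annotated text it recommends.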

answered Oct 27 '10 at 16:35

zaxtax ♦

edited Oct 27 '10 at 16:37

Thanks for the other links. Do you have any feedback on OpenNLP? I find it isn't used as often as the others, so I'm not sure why I'm using it.

(Oct 28 '10 at 16:50) Max Lynch

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.