|
I've been using OpenNLP for NER with some success. It works on a little more than half of the articles I throw at it, but it's far from perfect. As far as I know, the training sets behind OpenNLP's models are not available, and I wouldn't know what to do with them anyway. Is there a better tagger, or is there a dataset I could plug into OpenNLP to make it better? GATE is always mentioned, but I wasn't able to break into it. LingPipe isn't free as far as I can tell, and NLTK is written in Python, which is too slow and wouldn't fit into my code base. Any suggestions? Thanks! |
|
If you're considering implementing a tagger, then I think this is a good paper to read for approaches that will improve your results in a developer-time-efficient way: Lev Ratinov and Dan Roth. Design Challenges and Misconceptions in Named Entity Recognition. CoNLL 2009.
Thanks for the link!
(Oct 28 '10 at 16:49)
Max Lynch
An implementation of this algorithm is freely available at http://cogcomp.cs.illinois.edu/page/software_view/4 (as linked in @zaxtax's answer). Ratinov et al.'s NER tagger is one of the best for English.
(Nov 04 '10 at 03:22)
Joseph Turian ♦♦
|
|
There are several named entity recognition systems out there. Just off the top of my head: Illinois Named Entity Extractor. In addition, although LingPipe may or may not be free, they have a great outline of what's out there on their website. If you want better performance, you'll probably want to grab a dataset that is more domain-specific. Good luck!
Thanks for the other links. Do you have any feedback on OpenNLP? I find it's not used as often as the other ones, so I'm not sure why I'm using it.
(Oct 28 '10 at 16:50)
Max Lynch
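On the question of plugging a domain-specific dataset into OpenNLP: as far as I know, the OpenNLP name finder trainer expects one tokenized sentence per line, with entities wrapped as `<START:type> ... <END>`. Many NER datasets instead ship in a CoNLL-style column layout. Here is a rough sketch of a converter, assuming the input has one `token TAG` pair per line, BIO tags such as `B-PER` / `I-PER` / `O` in the last column, and a blank line between sentences. The class name `ConllToOpenNlp` and the lowercased type labels are my own choices for illustration, not anything OpenNLP prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ConllToOpenNlp {

    /**
     * Converts CoNLL-style lines ("token ... TAG", blank line = sentence break)
     * into sentences annotated with <START:type> ... <END> markers.
     * Assumption: the last column holds a BIO tag (B-XXX, I-XXX, or O).
     */
    public static List<String> convert(List<String> lines) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inEntity = false;

        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                // Sentence boundary: close any open entity and emit the sentence.
                if (inEntity) {
                    current.append(" <END>");
                    inEntity = false;
                }
                if (current.length() > 0) {
                    sentences.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }

            String[] cols = line.split("\\s+");
            String token = cols[0];
            String tag = cols[cols.length - 1];

            // A B- tag always opens an entity; an I- tag opens one only if none is open.
            boolean startsEntity = tag.startsWith("B-") || (tag.startsWith("I-") && !inEntity);
            // Anything other than an I- tag closes the entity that is currently open.
            boolean endsEntity = inEntity && !tag.startsWith("I-");

            if (endsEntity) {
                current.append(" <END>");
                inEntity = false;
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            if (startsEntity) {
                String type = tag.substring(2).toLowerCase(Locale.ROOT);
                current.append("<START:").append(type).append("> ");
                inEntity = true;
            }
            current.append(token);
        }

        // Flush the final sentence if the input does not end with a blank line.
        if (inEntity) {
            current.append(" <END>");
        }
        if (current.length() > 0) {
            sentences.add(current.toString());
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "John B-PER", "Smith I-PER", "visited O", "Paris B-LOC", ". O");
        convert(sample).forEach(System.out::println);
    }
}
```

The output lines could then be written to a file and handed to OpenNLP's name finder training tool. You would still need to check that the tokenization in the dataset matches the tokenizer you run at prediction time, since a mismatch there tends to hurt accuracy.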
|