Which NER approaches are the current state of the art and give the best results? Personally, I have seen a nice library (http://balie.sourceforge.net/) based on the following thesis: http://cogprints.org/5859/1/Thesis-David-Nadeau.pdf

It worked very well at the time, but it is semi-automated. I wonder whether there are better techniques now.

asked Oct 27 '10 at 14:34

TomH


2 Answers:

The English benchmark is the CoNLL03 shared task dataset (drawn from Reuters newswire).

The state of the art is Lin, D., & Wu, X. (2009), "Phrase clustering for discriminative learning." They achieve an F1 of 90.90, but they train over 700 billion words of web text using Google's computing power.

The runner-up is my work with Lev Ratinov, "Word representations: A simple and general method for semi-supervised learning." We achieve 90.36 F1, and our NER system is freely available.
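
For readers unfamiliar with how F1 figures like these are computed: CoNLL03 scoring is entity-level, so a predicted entity counts only if both its span and its type match the gold annotation exactly. Below is a minimal sketch, assuming BIO-style tags. The function names are hypothetical, and this is not the official evaluation script.

```python
def extract_entities(tags):
    """Return a set of (start, end, type) spans from a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == etype
        if continues:
            continue  # same entity keeps going
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag != "O":
            # B-X, or a stray I-X (treated leniently as a new entity).
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_tags, pred_tags):
    """Entity-level F1: harmonic mean of precision and recall over exact spans."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

In this toy case the tagger finds one of two gold entities and predicts nothing spurious, so precision is 1.0 and recall is 0.5. A real evaluation would pool counts over the whole test set rather than score one sentence.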

answered Oct 27 '10 at 15:03

Joseph Turian ♦♦

Cool, under which license is it released? How about the training data?

(Oct 27 '10 at 15:19) TomH

Nice work, but getting the training data would make it more helpful.

(May 25 '11 at 22:51) Fábio

IMHO the current state of the art in NER uses a) distributional similarity features (e.g. Brown clusters) and b) non-local features (e.g. extended prediction history).

-> The work of Joseph and Lev Ratinov features both.
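
To make the two feature types concrete, here is a rough sketch. It assumes each word has a Brown-cluster bit string (its path in the cluster hierarchy), so prefixes of that string give clusters at several granularities, and it assumes the prediction history is kept as per-token label counts from earlier in the document. The toy paths, prefix lengths, and all names are made up for illustration; this is not taken from any particular system's code.

```python
from collections import Counter, defaultdict

# Hypothetical word -> Brown-cluster bit string mapping (toy values).
BROWN_PATHS = {
    "london": "1101001110",
    "paris": "1101001111",
    "said": "0110",
}


def brown_prefix_features(word, prefix_lengths=(4, 6)):
    """Distributional similarity features: prefixes of the word's cluster path."""
    path = BROWN_PATHS.get(word.lower())
    if path is None:
        return ["brown=UNK"]
    return [f"brown{n}={path[:n]}" for n in prefix_lengths]


class PredictionHistory:
    """Non-local feature: labels assigned earlier to the same token."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, word, label):
        self.counts[word.lower()][label] += 1

    def features(self, word):
        counts = self.counts.get(word.lower(), Counter())
        total = sum(counts.values())
        # Relative frequency of each label previously predicted for this token.
        return {f"hist={label}": n / total for label, n in counts.items()}


london_feats = brown_prefix_features("London")
paris_feats = brown_prefix_features("Paris")

history = PredictionHistory()
for label in ("PER", "PER", "ORG"):
    history.update("Blair", label)
blair_feats = history.features("blair")
```

Here "London" and "Paris" share both prefixes, so a classifier that has only seen one of them in training can still generalise to the other. The history features let a later, ambiguous mention of "Blair" lean on earlier decisions in the same document.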

If you need an off-the-shelf toolkit (e.g. you don't have access to CoNLL03), you should also consider the Stanford Named Entity Recognizer, which uses distributional similarity features and was trained on both CoNLL03 and MUC. The drawback is that it does Viterbi decoding, so it's not as efficient as a simple greedy left-to-right decoder (as used by Turian & Ratinov).
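
The efficiency difference can be sketched with a toy first-order model. With k labels, a greedy decoder scores k candidates per token, whereas Viterbi scores all k × k label transitions per token in exchange for finding the globally best sequence. The scores, labels, and function names below are invented for illustration and are not the code of either system.

```python
def greedy_decode(emissions, transitions, labels):
    """Pick the best label left to right, committing to each choice: O(n * k)."""
    path, prev = [], None
    for scores in emissions:
        def local_score(y):
            return scores[y] + (transitions[prev][y] if prev is not None else 0.0)
        prev = max(labels, key=local_score)
        path.append(prev)
    return path


def viterbi_decode(emissions, transitions, labels):
    """Find the highest-scoring label sequence exactly: O(n * k^2)."""
    if not emissions:
        return []
    best = {y: emissions[0][y] for y in labels}  # best path score ending in y
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[p][y])
            new_best[y] = best[prev] + transitions[prev][y] + scores[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "PER"]
emissions = [{"O": 1.0, "PER": 0.9}, {"O": 0.0, "PER": 2.0}]
transitions = {
    "O": {"O": 0.0, "PER": -1.5},
    "PER": {"O": 0.0, "PER": 0.5},
}
greedy_path = greedy_decode(emissions, transitions, labels)
viterbi_path = viterbi_decode(emissions, transitions, labels)
```

On this toy input the greedy decoder commits to "O" for the first token and cannot revise it, while Viterbi recovers the higher-scoring "PER PER" sequence. In practice, non-local features such as prediction history are one way a greedy decoder makes up for that.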

answered May 27 '11 at 10:56

Peter Prettenhofer

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.