Hi, I've been running into a problem when a sentence is 'title cased': weird, obviously wrong entities are identified. I've seen this with third parties like OpenCalais, and with every toolkit I have tried. Here are a couple of examples of the problem:

  1. Robin Williams On Dustin Hoffman's TOOTSIE Performance. Robin Williams talks about Dustin Hoffman's performance in TOOTSIE as well as comedy vs. drama.

This gets "robin williams on dustin" identified as a single entity.

  2. Allen Ginsberg Says National Guard deploys to border.

This gets "allen ginsberg says". This happens quite frequently when the word right after an entity is capitalized.

Any ideas?

asked Nov 16 '10 at 12:50

AK Roston

Have you tried retraining the NER chunker with data that has this behavior (maybe randomly title-casing a subset of the training data)?

(Nov 16 '10 at 13:22) Alexandre Passos ♦
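A minimal sketch of that augmentation, assuming the training data is stored as sentences of (token, BIO tag) pairs. The data format and the function names are hypothetical, and the actual retraining step depends on the toolkit. It title-cases every word in a random subset of sentences and keeps the original tags, since changing the case does not move the entity spans.

```python
import random
from typing import List, Tuple

# Assumed format: a sentence is a list of (token, BIO tag) pairs.
Sentence = List[Tuple[str, str]]


def title_case_token(token: str) -> str:
    """Upper-case only the first character.

    str.title() is avoided because it mangles tokens such as "Hoffman's"
    (giving "Hoffman'S") and "TOOTSIE" (giving "Tootsie").
    """
    return token[:1].upper() + token[1:]


def title_case_sentence(sentence: Sentence) -> Sentence:
    # Tags are kept as-is: changing the case does not move the entity spans.
    return [(title_case_token(tok), tag) for tok, tag in sentence]


def augment_with_title_case(corpus: List[Sentence], fraction: float = 0.2,
                            seed: int = 0) -> List[Sentence]:
    """Return the corpus plus title-cased copies of a random subset of it."""
    rng = random.Random(seed)
    chosen = rng.sample(corpus, int(len(corpus) * fraction))
    return corpus + [title_case_sentence(s) for s in chosen]


corpus = [
    [("Robin", "B-PER"), ("Williams", "I-PER"), ("talks", "O"),
     ("about", "O"), ("comedy", "O"), (".", "O")],
]
augmented = augment_with_title_case(corpus, fraction=1.0)
print(augmented[1])
# [('Robin', 'B-PER'), ('Williams', 'I-PER'), ('Talks', 'O'), ('About', 'O'), ('Comedy', 'O'), ('.', 'O')]
```

Real headlines often leave short function words ("of", "the") lowercase; a closer imitation could skip those, but capitalizing every word is the simplest variant to start with.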

One Answer:

I can propose composing two classifiers: one that restores case information, and a second, ordinary NER classifier for the entities. Training data for the first is easy to obtain, since any normally cased text can be lowercased or title-cased to produce input/target pairs.

answered Nov 22 '10 at 08:16

yura
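A minimal sketch of the two-stage composition in Python. Assumptions: the answer proposes a classifier for case restoration; here a much simpler most-frequent-casing lookup stands in for it, and `toy_capitalization_ner` is a hypothetical placeholder for a real NER model (it just groups runs of capitalized tokens, which reproduces the failure from the question). All function names and the tiny training set are illustrative.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List


def train_truecaser(cased_sentences: List[List[str]]) -> Dict[str, str]:
    """Map each lower-cased word to its most frequent observed casing.

    Training data is just ordinary, correctly cased text. Sentence-initial
    tokens are skipped because their capital letter is positional.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in cased_sentences:
        for tok in sent[1:]:
            counts[tok.lower()][tok] += 1
    return {low: c.most_common(1)[0][0] for low, c in counts.items()}


def truecase(tokens: List[str], model: Dict[str, str]) -> List[str]:
    """Restore casing word by word; unknown words keep their input casing."""
    out = [model.get(tok.lower(), tok) for tok in tokens]
    if out:
        # The first word of a sentence is capitalized regardless of the model.
        out[0] = out[0][:1].upper() + out[0][1:]
    return out


def toy_capitalization_ner(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for a real NER model: each run of
    capitalized tokens is reported as one entity."""
    entities, run = [], []
    for tok in tokens + [""]:  # the empty sentinel flushes the final run
        if tok[:1].isupper():
            run.append(tok)
        else:
            if run:
                entities.append(" ".join(run))
            run = []
    return entities


def recognize(tokens: List[str], model: Dict[str, str],
              ner: Callable[[List[str]], List[str]]) -> List[str]:
    """Stage 1 restores case, stage 2 runs the ordinary NER model."""
    return ner(truecase(tokens, model))


model = train_truecaser([
    ["I", "saw", "Robin", "Williams", "on", "stage"],
    ["Her", "performance", "with", "Dustin", "Hoffman", "was", "praised"],
])
headline = ["Robin", "Williams", "On", "Dustin", "Hoffman", "Performance"]
print(toy_capitalization_ner(headline))
# ['Robin Williams On Dustin Hoffman Performance']
print(recognize(headline, model, toy_capitalization_ner))
# ['Robin Williams', 'Dustin Hoffman']
```

A lookup table cannot handle words whose casing depends on context (e.g. "Apple" vs. "apple"); that is where a real sequence classifier for the first stage, as the answer suggests, would be needed.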

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.