|
Hi, I keep running into a problem when a sentence is title-cased: weird, obviously wrong entities get identified. I've seen this with third-party services like OpenCalais and with every toolkit I have tried. Here are a couple of examples of the problem:
This gets "robin williams on dustin" identified.
This gets "allen ginsberg says". This one happens quite frequently when the word right after an entity is capitalized. Any ideas? |
|
I can propose composing two classifiers: one that restores the case information, and a second, ordinary NER classifier that finds the entities. Training data for the first can be obtained easily. |
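To make the first stage concrete, here is a minimal sketch of a frequency-based case restorer (often called a truecaser). It is only one possible way to do it, not a description of any particular toolkit: the function names `train_truecaser` and `truecase` are made up for illustration, the two-sentence corpus is a toy, and the choice to leave unseen tokens unchanged is an assumption. The restored tokens would then be passed to whatever NER classifier you already use.

```python
from collections import Counter, defaultdict

def train_truecaser(sentences):
    """Learn the most frequent surface form of each lowercased token.

    `sentences` is a list of token lists taken from normally cased text.
    Sentence-initial tokens are skipped, since they are capitalized
    whether or not they are names.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for tok in tokens[1:]:
            counts[tok.lower()][tok] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}

def truecase(tokens, model):
    """Replace each token with its most frequent cased form.

    Tokens never seen in training are left as they are (an assumption;
    lowercasing them is the other obvious choice).
    """
    return [model.get(tok.lower(), tok) for tok in tokens]

# Toy corpus, invented for this example; real training text would be
# any large body of normally cased sentences.
corpus = [
    "Yesterday Robin Williams talked on television about Dustin Hoffman".split(),
    "He says Allen Ginsberg wrote on the road".split(),
]
model = train_truecaser(corpus)

print(truecase("Robin Williams On Dustin".split(), model))
print(truecase("Allen Ginsberg Says".split(), model))
```

With the case restored, "On" and "Says" are lowercase again before the NER stage sees them, which is exactly the cue the entity classifier was missing.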
Have you tried retraining the NER chunker with data that has this behavior (maybe randomly title-casing a subset of the training data)?
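
A rough sketch of that augmentation step, assuming the training data is a list of sentences where each sentence is a list of `(token, tag)` pairs. The BIO tags, the `fraction` parameter, and the function names are all assumptions made for illustration; adapt them to whatever format your trainer expects.

```python
import random

def title_case_token(tok):
    # Uppercase only the first character. str.title() would also
    # lowercase the rest of the token and mangle forms like "McDonald".
    return tok[:1].upper() + tok[1:]

def title_case_subset(sentences, fraction=0.3, seed=0):
    """Return a copy of the data with roughly `fraction` of the
    sentences title-cased. The tags are left untouched."""
    rng = random.Random(seed)  # seeded so the augmentation is repeatable
    augmented = []
    for sent in sentences:
        if rng.random() < fraction:
            sent = [(title_case_token(tok), tag) for tok, tag in sent]
        augmented.append(sent)
    return augmented

# Hypothetical training sentence in (token, tag) form.
train = [[("Allen", "B-PER"), ("Ginsberg", "I-PER"), ("says", "O"), ("hello", "O")]]
print(title_case_subset(train, fraction=1.0))
```

Because the tags stay the same, the retrained model gets to see capitalized non-entity words like "Says" labeled as `O`, so it can no longer rely on capitalization alone.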