|
Now, I want to employ a sentence segmentation tool to pre-process my data. However, I find that the well-known open-source NLP tool OpenNLP doesn't work well for sentence segmentation. What other options would you recommend? Thanks! |
|
I do not know which one is actually the best, but when I needed a sentence splitter a fair few years ago, this was the best freely available one I could find: http://cogcomp.cs.illinois.edu/page/tools_view/2 It was not perfect, but it was good enough for my needs at the time. More recently I have come across these:
However, I have not had the chance to use either of them. I think GATE also has some sentence splitters, but I have no idea how good they are. |
|
(Note: I haven't actually tried any of these.) The splitter by Dan Gillick (splitta), mentioned in the answer by Daniel Mahler, should be good, at least according to the paper describing it. However, since it is trained on WSJ+Brown, it might be overfitted to that data and not the best choice for your domain. Another alternative is the unsupervised algorithm by Kiss and Strunk (2006), which is implemented in NLTK's "punkt" module. This implementation comes with a pre-trained English model, but it can also be trained on any other (un-annotated) text. |
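For the Punkt option, here is a minimal sketch of training on your own un-annotated text and then splitting with the result. It assumes NLTK is installed; the helper names `train_splitter` and `split_sentences` and the sample text are made up for illustration, and a real run would want far more training text than this.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer


def train_splitter(raw_text: str) -> PunktSentenceTokenizer:
    """Train an unsupervised Punkt model on plain, un-annotated text."""
    # Passing text to the constructor runs Punkt training on that text.
    return PunktSentenceTokenizer(raw_text)


def split_sentences(splitter: PunktSentenceTokenizer, text: str) -> list[str]:
    """Return the sentences that the trained model finds in `text`."""
    return splitter.tokenize(text)


if __name__ == "__main__":
    # Hypothetical stand-in for a large sample of text from your own domain.
    domain_text = (
        "The samples were stored at room temperature. "
        "Each sample was measured twice. "
        "The results are reported in the next section. "
    ) * 50

    splitter = train_splitter(domain_text)
    for sentence in split_sentences(splitter, "We ran the test. It passed."):
        print(sentence)
```

If you just want the pre-trained English model, `nltk.sent_tokenize(text)` uses it once the Punkt resource has been fetched with `nltk.download`; the resource name differs between NLTK versions, so check the one you have.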
I'm not sure exactly what you want from a sentence segmentation tool (segmenting text into sentences? segmenting sentences into chunks?), but have you looked at the Stanford NLP tools? http://nlp.stanford.edu/software/