What are good open-source libraries for splitting arbitrary sentences into subject, verb, and object parts? I'm not looking for any complicated grammar parsing, just something that could take the sentence:

"The monkey ate the banana because it was ripe."

and return:

("The monkey", "ate", "the banana because it was ripe.")

asked May 27 '11 at 13:14


Cerin


3 Answers:

You will need a dependency parser. You can use MaltParser or MSTParser. These will usually return a full parse tree, marking adverbs, indirect objects, subordinate clauses, etc. From that tree you can build something very simple to extract the SVO triple, which in your example will be "monkey_SBJ ate_ROOT banana_OBJ", since the root of the dependency tree is often the main verb. You will need to check how the parser encodes the subject and the object, and what kinds of objects it distinguishes (direct, indirect, etc.). For constructions with auxiliary verbs ("has been eating"), you need to check which word is marked as ROOT and how to extract the remaining elements of the complex verb phrase, if you need them. Finally, if needed, you may have to extract the children (the dependents) of a head, i.e. all children of the subject head and all children of the object head, to get full phrases like the ones in your example.
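To make the steps above concrete, here is a minimal Python sketch that works on a hand-written dependency parse instead of calling a real parser. Everything in it is an assumption for illustration: the `Token` tuple, the helper names, the head indices, and the labels (`SBJ`, `OBJ`, `ROOT`, `NMOD`, and so on) are hypothetical and only mimic CoNLL-style output. MaltParser, MSTParser, and other parsers may use different label sets and may attach words differently, so check the actual output before relying on any of this.

```python
from collections import namedtuple

# One token per row, CoNLL-style: 1-based id, word form, id of the head
# (0 means "no head", i.e. the root), and the dependency label.
Token = namedtuple("Token", "id form head deprel")

# Hand-written parse of the example sentence. The labels and attachments
# are assumptions for illustration, not the output of any real parser.
PARSE = [
    Token(1, "The", 2, "NMOD"),
    Token(2, "monkey", 3, "SBJ"),
    Token(3, "ate", 0, "ROOT"),
    Token(4, "the", 5, "NMOD"),
    Token(5, "banana", 3, "OBJ"),
    Token(6, "because", 3, "ADV"),
    Token(7, "it", 8, "SBJ"),
    Token(8, "was", 6, "SUB"),
    Token(9, "ripe", 8, "PRD"),
    Token(10, ".", 3, "P"),
]


def find_root(tokens):
    """Return the token whose head is 0 (assumed to be the main verb)."""
    return next(t for t in tokens if t.head == 0)


def dependents(tokens, head_id, deprel):
    """Return the direct dependents of head_id that carry the given label."""
    return [t for t in tokens if t.head == head_id and t.deprel == deprel]


def subtree(tokens, head_id):
    """Return head_id's token plus all transitive dependents, in sentence order."""
    ids = {head_id}
    changed = True
    while changed:  # keep sweeping until no new dependents are found
        changed = False
        for t in tokens:
            if t.head in ids and t.id not in ids:
                ids.add(t.id)
                changed = True
    return [t for t in tokens if t.id in ids]


def extract_svo(tokens, expand=False):
    """Return (subject, verb, object) for the root verb, or None if incomplete.

    With expand=True the subject and object are returned as full phrases
    (the head plus all of its dependents) instead of bare head words.
    """
    root = find_root(tokens)
    subjects = dependents(tokens, root.id, "SBJ")
    objects = dependents(tokens, root.id, "OBJ")
    if not subjects or not objects:
        return None
    subj, obj = subjects[0], objects[0]
    if expand:
        subj_text = " ".join(t.form for t in subtree(tokens, subj.id))
        obj_text = " ".join(t.form for t in subtree(tokens, obj.id))
        return (subj_text, root.form, obj_text)
    return (subj.form, root.form, obj.form)


print(extract_svo(PARSE))               # ('monkey', 'ate', 'banana')
print(extract_svo(PARSE, expand=True))  # ('The monkey', 'ate', 'the banana')
```

Note that in this assumed parse the "because" clause attaches to the verb, not to the object, so it does not end up in the object phrase. To get exactly the output asked for in the question, you could simply take every token after the root verb as the third element. Auxiliary verbs are not handled here, since which word becomes ROOT depends on the parser.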

answered May 30 '11 at 03:05


Svetoslav Marinov

edited May 30 '11 at 04:19

You can build something similar pretty easily with a dependency parser. The Stanford Parser is pretty good at it.

answered May 27 '11 at 13:25


Alexandre Passos ♦

The group at Lund University has a PropBank-style parser available here.

answered May 27 '11 at 14:25


Bryan Rink

edited May 27 '11 at 14:26


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.