|
What are good open-source libraries for splitting arbitrary sentences into subject, verb, and object parts? I'm not looking for any complicated grammar parsing, just something that could take the sentence: "The monkey ate the banana because it was ripe." and return: ("The monkey", "ate", "the banana because it was ripe.") |
|
You will need a dependency parser. You can use MaltParser or MSTParser. These will usually return a full parse tree, marking adverbs, indirect objects, subordinate clauses, etc. On top of that you can build something very simple to extract the SVO triple, which in your example will be "monkey_SBJ ate_ROOT banana_OBJ", since the root of the dependency tree is often the main verb. You will need to check how the parser encodes the subject and the object, and what kinds of objects it distinguishes (direct, indirect, etc.). For constructions with auxiliary verbs ("has been eating"), check which word is marked as ROOT and how to extract the remaining elements of the complex verb phrase, if you need them. Finally, if you want full phrases rather than single words, extract the children (the dependents) of each head, i.e. all children of the subject head and all children of the object head, as in your example. |
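To make the idea concrete, here is a minimal sketch in Python. It does not call MaltParser or MSTParser; it works on a hand-written parse of the example sentence, and the `Token` tuple, the `subtree` and `extract_svo` helpers, and the labels (`SBJ`, `OBJ`, `ROOT`, `NMOD`, ...) are assumptions made for illustration. A real parser's label set and output format may differ, so the label arguments would need adjusting.

```python
from collections import namedtuple

# One row per word: position, word form, position of its head (0 = no head), relation label.
Token = namedtuple("Token", "idx form head deprel")

# Hand-written dependency parse (an assumption, not real parser output).
sentence = [
    Token(1, "The", 2, "NMOD"),
    Token(2, "monkey", 3, "SBJ"),
    Token(3, "ate", 0, "ROOT"),
    Token(4, "the", 5, "NMOD"),
    Token(5, "banana", 3, "OBJ"),
    Token(6, "because", 3, "ADV"),
    Token(7, "it", 8, "SBJ"),
    Token(8, "was", 6, "SUB"),
    Token(9, "ripe", 8, "PRD"),
    Token(10, ".", 3, "P"),
]

def subtree(tokens, head_idx):
    """Return the head token plus all of its transitive dependents, in sentence order."""
    ids = {head_idx}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in ids and t.idx not in ids:
                ids.add(t.idx)
                changed = True
    return [t for t in tokens if t.idx in ids]

def extract_svo(tokens, subj_label="SBJ", obj_label="OBJ", root_label="ROOT"):
    """Return (subject phrase, root verb, object phrase); a phrase is None if not found."""
    root = next(t for t in tokens if t.deprel == root_label)

    def phrase(label):
        # Only look at direct dependents of the root, so the "it" inside
        # the because-clause is not mistaken for the main subject.
        head = next(
            (t for t in tokens if t.head == root.idx and t.deprel == label), None
        )
        if head is None:
            return None
        return " ".join(t.form for t in subtree(tokens, head.idx))

    return phrase(subj_label), root.form, phrase(obj_label)

print(extract_svo(sentence))  # ('The monkey', 'ate', 'the banana')
```

In this hand-built parse the "because" clause attaches to the verb rather than to "banana", so it is not part of the object phrase. To get the output from the question, you would also have to collect the other dependents of the root that follow the verb.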
|
You can build something similar pretty easily with a dependency parser. The Stanford Parser is pretty good at it. |