|
Hi everyone, is there a way to find all the sub-sentences of a sentence that are still meaningful and contain at least one subject, a verb, and a predicate/object? For example, take the sentence "I am going to do a seminar on NLP at SXSW in Austin next month". We can extract the following meaningful sub-sentences from it: "I am going to do a seminar", "I am going to do a seminar on NLP", "I am going to do a seminar on NLP at SXSW", "I am going to do a seminar at SXSW", "I am going to do a seminar in Austin", "I am going to do a seminar on NLP next month", etc. Please note that there are no deduced sentences here (e.g. "There will be an NLP seminar at SXSW next month"; although this is true, we don't need it for this problem). All generated sentences are strictly part of the given sentence. How can we approach this problem? I was thinking of creating annotated training data that has a set of legal sub-sentences for each sentence in the training set, and then training a supervised learning model on it. I am quite new to NLP and machine learning, so it would be great if you could suggest some ways to solve this problem. Thanks in advance! |
|
Parse the sentence. For each constituent in the parse, consider deleting that constituent (the entire parse subtree). You can train a classifier to decide whether the constituent can be deleted. I believe you could collect training data through Mechanical Turk and get decent accuracy if the features include the constituent's head label and head word. Or you could run an existing textual inference model to see whether the original sentence implies the reduced sentence.
If getting Turk data would be too expensive, maybe something like this would work: (1) parse the sentence, (2) for each constituent, consider deleting it, (3) parse the sentence again with all the words in that constituent removed, (4) if the reparsed sentence's tree is the same as the original tree with that constituent deleted, you probably have something that makes sense.
(Jan 26 '12 at 07:04)
Alexandre Passos ♦
Thanks Joseph and Alexandre. This solution might be a little too expensive since we are parsing a sentence multiple times. When there are too many sentences to parse, performance might be a big concern. In any case, I will try to do as you suggested. Thanks!
(Jan 31 '12 at 21:57)
Golam Kawsar
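To make the delete-a-constituent idea above concrete, here is a minimal sketch of the candidate-generation step only. It assumes a hand-written, simplified parse of the example sentence stored as nested tuples (a real parser would supply the tree), and the helper names `leaves`, `subtrees`, `delete` and `candidates` are made up for illustration. It does not include the classifier or the reparse check; it only lists what they would have to judge.

```python
# Toy constituency tree: (label, child, child, ...); leaves are plain strings.
# ASSUMPTION: this flattened parse is hand-written for illustration only.
TREE = (
    "S",
    ("NP", "I"),
    ("VP", "am", "going", "to", "do",
        ("NP", "a", "seminar"),
        ("PP", "on", ("NP", "NLP")),
        ("PP", "at", ("NP", "SXSW")),
        ("PP", "in", ("NP", "Austin")),
        ("NP-TMP", "next", "month")),
)


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def subtrees(node, path=()):
    """Yield (path, subtree) for every constituent below the root."""
    if isinstance(node, str):
        return
    for i, child in enumerate(node[1:], start=1):
        if not isinstance(child, str):
            yield path + (i,), child
            yield from subtrees(child, path + (i,))


def delete(node, path):
    """Return a copy of the tree with the subtree at `path` removed."""
    i = path[0]
    if len(path) == 1:
        return node[:i] + node[i + 1:]
    return node[:i] + (delete(node[i], path[1:]),) + node[i + 1:]


def candidates(tree):
    """Yield (label, deleted words, reduced sentence) for each constituent."""
    for path, sub in subtrees(tree):
        reduced = " ".join(leaves(delete(tree, path)))
        yield sub[0], " ".join(leaves(sub)), reduced


if __name__ == "__main__":
    for label, removed, reduced in candidates(TREE):
        print(f"{label:7} -{removed!r:28} {reduced}")
```

Some candidates are fine (dropping the PP "on NLP") and some are not (dropping the subject NP "I"), which is exactly what the classifier or the reparse comparison is meant to sort out. On the cost concern: this step needs only one parse per sentence; only the reparse check requires parsing each reduced candidate again.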
|
Maybe you can use a structured prediction technique. Some tasks, like image segmentation in computer vision, can be analogous to your task.
You can get this information from a parser and some extraction rules, I think, although you might want a semantic role labeler as well.
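As a rough illustration of the "parser plus extraction rules" route, the sketch below assumes the parser has already split the example sentence into a core clause and a list of labeled modifiers, and applies one hypothetical rule: modifiers labeled `PP` or `NP-TMP` are optional, everything else is kept. The labels, the `OPTIONAL_LABELS` rule set and the `sub_sentences` helper are all assumptions for illustration; a real rule set would need to handle attachment (for instance, "in Austin" might modify "SXSW" rather than the verb).

```python
from itertools import combinations

# ASSUMPTION: hand-written stand-in for parser output on the example sentence.
CORE = ["I", "am", "going", "to", "do", "a", "seminar"]
MODIFIERS = [
    ("PP", "on NLP"),
    ("PP", "at SXSW"),
    ("PP", "in Austin"),
    ("NP-TMP", "next month"),
]

# Hypothetical extraction rule: these constituent labels may be dropped.
OPTIONAL_LABELS = {"PP", "NP-TMP"}


def sub_sentences(core, modifiers):
    """Return every sentence obtained by dropping a subset of optional modifiers."""
    optional_idx = [i for i, (label, _) in enumerate(modifiers)
                    if label in OPTIONAL_LABELS]
    results = []
    for k in range(len(optional_idx) + 1):
        for dropped in combinations(optional_idx, k):
            # Keep the remaining modifiers in their original order.
            kept = [text for i, (_, text) in enumerate(modifiers)
                    if i not in dropped]
            results.append(" ".join(core + kept))
    return results


if __name__ == "__main__":
    for sentence in sub_sentences(CORE, MODIFIERS):
        print(sentence)
```

With four optional modifiers this yields 16 strings, from the full sentence down to "I am going to do a seminar". The count doubles with each extra optional modifier, so long sentences may need a cap or a filter.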