In grammar, a clause typically comprises a subject and a predicate that together form the smallest grammatical unit capable of expressing a complete proposition. I'm trying to find resources that explain how one might parse natural-language text and tag clusters of words as clauses (a rough sketch of what I have in mind appears below). Once they're tagged, I plan to analyze the sentiment of each clause in a sentence, to facilitate calculating the overall sentiment of a passage of text.

The motivation for tagging each clause is to avoid the problem of multiple sentiments occurring within the same sentence. Many papers dismiss this as a rare occurrence, but I suspect it happens at a surprisingly high rate. Moreover, emoticons and domain-specific words and symbols can quickly make a supervised-learning approach untenable.

Right now, I'm aware of an algorithm that performs Bayesian inference over PCFGs, using MCMC to sample from the posterior distribution over parse trees given a Dirichlet prior. However, it isn't language-independent; I'm only aware of one such semantic parser, and it assumes highly ambiguous supervision. My questions are:
While language independence would be nice, it's probably unnecessary. An SVM using tree kernels could be an alternative approach to clause identification (sketched below as well), under the assumption that enough data exist to train a semantic parser in most languages. Any thoughts or insight would be greatly appreciated!
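For concreteness, here is a minimal sketch of the kind of clause tagging I mean, built on spaCy's dependency parser. To be clear, spaCy, the model name, and the particular set of clause-heading dependency labels are my own assumptions for illustration, not something taken from a specific paper:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm

# Dependency labels that typically mark the head of a clause (a heuristic choice).
CLAUSE_DEPS = {"ROOT", "conj", "advcl", "ccomp", "xcomp", "relcl"}

def extract_clauses(text):
    """Greedy clause segmentation: each verbal clause head claims the tokens
    that reach it first when walking up the dependency tree."""
    clauses = []
    for sent in nlp(text).sents:
        heads = [t for t in sent
                 if t.pos_ in ("VERB", "AUX") and t.dep_ in CLAUSE_DEPS]
        if not heads:
            heads = [sent.root]  # fall back to one clause per sentence
        buckets = {h.i: [] for h in heads}
        for tok in sent:
            node = tok
            while node.i not in buckets and node.head is not node:
                node = node.head  # climb until we hit a clause head
            buckets.get(node.i, buckets[heads[0].i]).append(tok)
        clauses.extend(" ".join(t.text for t in buckets[h.i]) for h in heads)
    return clauses

print(extract_clauses("The beds were very comfortable but the food was terrible."))
# Expect roughly one span per clause, e.g. one around "were" and one around "was".
```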
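And to illustrate the tree-kernel route: a sketch of Collins and Duffy's subset-tree kernel plugged into scikit-learn's SVC via a precomputed Gram matrix. The tuple-encoded toy parse trees and the sentiment labels are invented stand-ins for real parser output and real annotations:

```python
import numpy as np
from sklearn.svm import SVC

LAMBDA = 0.5  # decay factor penalising large tree fragments

def nodes(t):
    """All internal nodes of a tuple-encoded tree (leaves are plain strings)."""
    if isinstance(t, str):
        return []
    out = [t]
    for child in t[1:]:
        out.extend(nodes(child))
    return out

def production(n):
    """A node's production: its label plus the labels of its children."""
    return (n[0],) + tuple(c if isinstance(c, str) else c[0] for c in n[1:])

def C(n1, n2):
    """Number of common tree fragments rooted at n1 and n2 (Collins and Duffy)."""
    if production(n1) != production(n2):
        return 0.0
    if all(isinstance(c, str) for c in n1[1:]):  # pre-terminal node
        return LAMBDA
    score = LAMBDA
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str) or isinstance(c2, str):
            continue  # matched terminal symbols add no further fragments
        score *= 1.0 + C(c1, c2)
    return score

def tree_kernel(t1, t2):
    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))

# Toy clause parse trees standing in for real parser output.
trees = [
    ("S", ("NP", ("D", "the"), ("N", "beds")),
          ("VP", ("V", "were"), ("ADJ", "comfortable"))),
    ("S", ("NP", ("D", "the"), ("N", "food")),
          ("VP", ("V", "was"), ("ADJ", "terrible"))),
    ("S", ("NP", ("N", "service")),
          ("VP", ("V", "was"), ("ADJ", "great"))),
]
labels = [1, 0, 1]  # invented clause-level sentiment labels

gram = np.array([[tree_kernel(a, b) for b in trees] for a in trees])
clf = SVC(kernel="precomputed").fit(gram, labels)
print(clf.predict(gram))
```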
Re. 2: "the beds were very comfortable but the food was terrible" is not a neutral statement. It's two statements with strong sentiments. So if you can get clause tagging working, why not forget about using sentences as your unit of enquiry and use clauses instead? (Personally, I think language-independent clause separation sounds unlikely, but I'm a bit out of touch.)
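To put that in toy numbers, here's a tiny sketch of clause-level versus sentence-level scoring. The clause split and the lexicon weights are invented for illustration, standing in for the output of a real clause tagger and a real sentiment scorer:

```python
# Invented word weights; a real system would use a learned or curated lexicon.
LEXICON = {"comfortable": 1.0, "terrible": -1.0, "very": 0.5}

def clause_sentiment(clause):
    return sum(LEXICON.get(w.strip(".,").lower(), 0.0) for w in clause.split())

clauses = ["the beds were very comfortable", "the food was terrible"]

scores = [clause_sentiment(c) for c in clauses]
print(scores)                     # [1.5, -1.0]: two strong, opposing sentiments
print(sum(scores) / len(scores))  # 0.25: averaging over the whole sentence washes them out
```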