Typically in grammar, a clause consists of a subject and a predicate that together form the smallest grammatical unit that can express a complete proposition. I'm trying to find resources that explain how one might parse natural language text and tag clusters of words as clauses. Once the clauses are tagged, I plan to analyze the sentiment of each clause in a sentence, which should make it easier to calculate the overall sentiment of a passage of text.

The motivation for tagging each clause is to avoid the problem of multiple sentiments occurring within the same sentence. Many papers dismiss this as a rare occurrence, but I suspect it occurs at a surprisingly high rate. Moreover, handling emoticons and domain-specific words or symbols can quickly become untenable under a supervised learning paradigm.

Right now, I'm aware of an algorithm that performs Bayesian inference over PCFGs by using MCMC to sample from the posterior distribution of parse trees, given a Dirichlet prior on the rule probabilities. However, it's not language independent; I'm aware of only one language-independent semantic parser, and it assumes highly ambiguous supervision.
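
For concreteness, here is a rough sketch of the posterior I have in mind. The notation is my own and not taken from any particular paper: w is the observed sentence, t a parse tree, θ the rule probabilities, α the Dirichlet hyperparameters, and f_r(t) the number of times rule r is used in t.

```latex
\theta_A \sim \mathrm{Dir}(\alpha_A) \quad \text{for each nonterminal } A,
\qquad
P(t \mid \theta) = \prod_{r \in R} \theta_r^{f_r(t)},
\qquad
P(t, \theta \mid w, \alpha) \;\propto\; P(w \mid t)\, P(t \mid \theta)\, P(\theta \mid \alpha)
```

Here P(w | t) is 1 when the yield of t is w and 0 otherwise, so the sampler only ever visits trees that actually span the sentence.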

My questions are:

  1. Is it necessary/proper to use a semantic parser for PCFG inference to identify clauses?
  2. Is my assumption correct that a sentence's overall sentiment can be thought of as some average of the sentiment mined from each of its constituent clauses?
  3. Assuming a closed-form analytic solution exists for the probability of a parse tree given the data/parameters, would you expect mean-field variational Bayes to outperform MCMC sampling in terms of computational complexity and F-score?
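
To make question 2 concrete, here is a minimal sketch of the averaging idea. Everything in it is an assumption for illustration: `TOY_LEXICON` is a made-up two-word lexicon, `split_clauses` is a naive regex stand-in for a real clause tagger, and the example sentence is hypothetical.

```python
import re

# Hypothetical toy lexicon; a real system would use a trained model instead.
TOY_LEXICON = {"lovely": 1.0, "awful": -1.0}

def split_clauses(sentence):
    """Naive stand-in for a clause tagger: split on commas, semicolons, and 'but'."""
    parts = re.split(r"\s*(?:[,;]|\bbut\b)\s*", sentence.lower())
    cleaned = [p.strip(" .!?") for p in parts]
    return [c for c in cleaned if c]

def clause_score(clause):
    """Mean lexicon score of the known words in a clause (0.0 if none are known)."""
    words = re.findall(r"[a-z']+", clause)
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def sentence_score(sentence):
    """Return the per-clause scores and their unweighted average."""
    scores = [clause_score(c) for c in split_clauses(sentence)]
    average = sum(scores) / len(scores) if scores else 0.0
    return scores, average

scores, average = sentence_score("The room was lovely but the service was awful.")
print(scores, average)
```

With this toy setup the two clauses score +1 and -1, so the unweighted average comes out at 0 even though neither clause is neutral. That cancellation is exactly the behaviour I'm unsure about.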

While language independence would be nice, it's probably unnecessary. An SVM with tree kernels could be an alternative approach to clause identification, under the assumption that enough data exists to train a semantic parser in most languages.

Any thoughts or insight would be greatly appreciated!

asked Apr 08 '13 at 03:19

kmore

Re. 2 -- "the beds were very comfortable but the food was terrible." That's not a neutral statement. That's two statements with strong sentiments. So if you can get clause tagging working, why not forget about using sentences as your unit of enquiry and use clauses instead? (Personally I think language-independent clause separation sounds unlikely, but I'm a bit out of touch.)

(Apr 08 '13 at 09:05) Andrew Clegg