For a summarization task, I need to shorten sentences of the form "Product X is a next generation learning management system and academic social network". I want to get a sentence of the form "Product X is a learning management system and academic social network", with the words "next generation" removed. Ideally, the reduced sentence should be free of marketing words. I first tried removing all adjectives from the sentence; this worked in some cases but removed required words in others. How should I approach this problem?

As a related problem, is there a way to identify subjective adjectives?

asked Jul 06 '11 at 16:15

Saurabh Saxena

edited Jul 07 '11 at 15:11


One Answer:

"Marketing words" is a subjective category. By that definition, I'd also remove "social network".

Seriously though, I'd probably start by compiling a corpus of marketing text and a corpus of mostly non-marketing text, and then look for phrases that occur often in the marketing text but rarely in the regular text.
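
A minimal sketch of that corpus comparison, under some assumptions the answer does not specify: phrases are taken to be word bigrams, tokenization is a plain whitespace split, and the score is an add-one smoothed log frequency ratio. The function names (`ngrams`, `marketing_scores`) and the toy corpora are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def ngrams(text, n=2):
    """Lowercase, split on whitespace, and return the list of word n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def marketing_scores(marketing_docs, regular_docs, n=2, min_count=2):
    """Rank n-grams by smoothed log(P(gram | marketing) / P(gram | regular))."""
    mkt = Counter(g for doc in marketing_docs for g in ngrams(doc, n))
    reg = Counter(g for doc in regular_docs for g in ngrams(doc, n))
    mkt_total = sum(mkt.values())
    reg_total = sum(reg.values())
    vocab = len(set(mkt) | set(reg))  # number of distinct n-grams, for add-one smoothing
    scores = {}
    for gram, count in mkt.items():
        if count < min_count:  # skip phrases too rare to judge
            continue
        p_mkt = (count + 1) / (mkt_total + vocab)
        p_reg = (reg[gram] + 1) / (reg_total + vocab)
        scores[gram] = math.log(p_mkt / p_reg)
    # Highest score first: most typical of marketing text
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpora (made up for illustration only)
marketing_docs = [
    "product x is a next generation learning management system",
    "our next generation platform is a learning management system",
]
regular_docs = [
    "moodle is a learning management system",
    "a learning management system tracks courses",
]

ranked = marketing_scores(marketing_docs, regular_docs)
print(ranked[0][0])  # the bigram most characteristic of the marketing corpus
```

On real data both corpora would need to be far larger, and a phrase that is common in both (such as "learning management") should end up with a score near or below zero, so it is kept.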

My other intuition (very raw) is that the expressions you are after usually appear before other adjectives. So I'd try looking for adjectives/expressions that often appear at the beginning of relatively long sequences of adjectives (and do not appear at the end of such sequences).
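
A rough sketch of that position heuristic, assuming the text has already been part-of-speech tagged with Penn Treebank style tags (adjective tags start with `JJ`). The tagged sentences below are hand-written for illustration, and the function names and thresholds (`min_len`, `min_count`) are hypothetical, not something the answer prescribes.

```python
from collections import Counter

def adjective_runs(tagged, min_len=2):
    """Yield maximal runs of consecutive adjectives with at least min_len words."""
    run = []
    for word, tag in tagged:
        if tag.startswith("JJ"):
            run.append(word.lower())
        else:
            if len(run) >= min_len:
                yield run
            run = []
    if len(run) >= min_len:  # a run that reaches the end of the sentence
        yield run

def position_counts(tagged_sentences, min_len=2):
    """Count how often each adjective starts and how often it ends a run."""
    first, last = Counter(), Counter()
    for sent in tagged_sentences:
        for run in adjective_runs(sent, min_len):
            first[run[0]] += 1
            last[run[-1]] += 1
    return first, last

def leading_candidates(tagged_sentences, min_len=2, min_count=2):
    """Adjectives that start runs at least min_count times and never end one."""
    first, last = position_counts(tagged_sentences, min_len)
    return sorted(w for w, c in first.items() if c >= min_count and last[w] == 0)

# Hand-tagged toy sentences (illustrative only)
sentences = [
    [("an", "DT"), ("innovative", "JJ"), ("academic", "JJ"), ("social", "JJ"), ("network", "NN")],
    [("an", "DT"), ("innovative", "JJ"), ("online", "JJ"), ("course", "NN")],
    [("a", "DT"), ("free", "JJ"), ("academic", "JJ"), ("journal", "NN")],
]

print(leading_candidates(sentences))
```

This only covers single-word adjectives; multi-word expressions such as "next generation" would need an extra step, for example treating frequent phrases that precede an adjective run as candidates as well.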

answered Jul 07 '11 at 19:39

yoavg

