Revision history
Revision n. 1

May 20 '10 at 16:30

Joseph Turian

What is a good unsupervised technique for inducing document representations?

Document representations are used as the input to your supervised classifier in such document-level tasks as sentiment analysis, subjectivity detection, document categorization, language identification, etc.

However, a bag-of-words representation (perhaps with tf*idf weights) gives poor parameter estimates for terms that are rare in the supervised data. If words that are rare in the supervised corpus are more common in a large, unsupervised corpus, how can we use that unsupervised corpus to induce document representations that serve as the document features for the supervised classifier?

What is a good unsupervised technique for inducing document representations?

Revision n. 2

May 21 '10 at 11:05

Joseph Turian

What is a good online, unsupervised technique for inducing document representations?

Document representations are used as the input to your supervised classifier in such document-level tasks as sentiment analysis, subjectivity detection, document categorization, language identification, etc.

However, a bag-of-words representation (perhaps with tf*idf weights) gives poor parameter estimates for terms that are rare in the supervised data. If words that are rare in the supervised corpus are more common in a large, unsupervised corpus, how can we use that unsupervised corpus to induce document representations that serve as the document features for the supervised classifier? Moreover, I would like an online training algorithm.

What is an online, unsupervised technique for inducing document representations?

Revision n. 3

Oct 09 '10 at 11:50

Joseph Turian

Online, unsupervised technique for inducing document representations?

Document representations are used as the input to your supervised classifier in such document-level tasks as sentiment analysis, subjectivity detection, document categorization, language identification, etc.

However, a bag-of-words representation (perhaps with tf*idf weights) gives poor parameter estimates for terms that are rare in the supervised data. If words that are rare in the supervised corpus are more common in a large, unsupervised corpus, how can we use that unsupervised corpus to induce document representations that serve as the document features for the supervised classifier? Moreover, I would like an online training algorithm.
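To make the baseline concrete, here is a minimal sketch of the bag-of-words representation with tf*idf weights mentioned above. It assumes pre-tokenized documents and the plain idf = log(N / df) variant; the function name `tfidf_vectors` is hypothetical. Every distinct term is its own dimension, so a term that occurs in only a handful of labeled documents yields a feature whose classifier weight is estimated from those few occurrences, which is the sparsity problem described here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Hypothetical helper: bag-of-words tf*idf vectors.

    Assumes each document is a list of already-tokenized strings.
    Returns one sparse dict (term -> weight) per document, using
    raw term counts for tf and log(N / df) for idf.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Each distinct term is a separate dimension; a term that
        # appears in every document gets idf = log(1) = 0.
        vectors.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

if __name__ == "__main__":
    docs = [["great", "movie"], ["terrible", "movie"], ["great", "plot"]]
    for vec in tfidf_vectors(docs):
        print(vec)
```

Note that a term seen only in unlabeled text never becomes a dimension here at all, which is why the question asks how to bring the large unsupervised corpus into the representation.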

What is an online, unsupervised technique for inducing document representations?

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.