Document representations are the input to your supervised classifier in document-level tasks such as sentiment analysis, subjectivity detection, document categorization, language identification, etc.
However, a bag-of-words representation (perhaps with tf*idf weights) gives poor parameter estimates for terms that are rare in the supervised data. If words that are rare in the supervised corpus are more common in a large unsupervised corpus, how can we use that unsupervised corpus to induce document representations that serve as features for the supervised classifier?
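For concreteness, here is a minimal sketch of the baseline described above: a sparse bag-of-words vector with tf*idf weights, where the idf statistics are estimated on an unlabeled corpus rather than on the small labeled set. The whitespace tokenizer, the helper names (`tokenize`, `fit_idf`, `tfidf_vector`) and the toy documents are hypothetical illustrations, and idf = log(N / df) is just one common variant. Note that estimating idf on the larger corpus does not by itself fix the problem in the question: each rare term is still its own feature, so the classifier still has to learn its weight from very few labeled examples.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def fit_idf(corpus):
    """Estimate idf(t) = log(N / df(t)) from a list of (unlabeled) documents."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Sparse bag-of-words vector: raw term count times idf.

    Terms that never occurred in the corpus used to fit idf are dropped.
    """
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


# Toy stand-in for a large unlabeled corpus.
unlabeled = [
    "the movie was superb",
    "a superb cast",
    "the plot was dull",
    "dull and slow",
]
idf = fit_idf(unlabeled)

# Feature vector for a document from the (small) labeled set.
vec = tfidf_vector("superb acting", idf)
print(vec)  # "acting" is absent: it never appears in the unlabeled corpus
```

Each labeled document becomes a sparse dictionary keyed by term, so two documents that use different rare words share no features at all, which is exactly the sparsity the question is asking how to overcome.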
What is a good unsupervised technique for inducing document representations?