I'm trying to create a binary classifier to categorize sentences into known classes using deep learning. This paper explains pretty clearly what I'm trying to do: http://aclweb.org/anthology//U/U06/U06-1005.pdf Except that I'd like to explore a deep learning architecture with unsupervised pre-training instead of the more traditional machine learning approach used in the paper.

I've successfully created feature-vector word representations by running an open-source tool on my unlabeled corpus, and thus have a matrix of feature vectors for all words in that corpus. I also have a labeled training set of sentences which I'd like to use to train the neural network.

I've been studying the general architecture of deep learning for NLP, but some things are still unclear to me. I'd now like to do supervised training on a neural network with these distributed word representations. The classic method is to use a fixed-size window around each word and apply a softmax classification layer for the output; this seems to be the standard approach for tasks such as POS tagging. How would I adapt such an approach to sentence classification? Or is this the wrong approach to take altogether, and if so, could you suggest a better one?

Thanks, Victor
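For concreteness, the fixed-window tagging setup described above can be sketched roughly as follows. Every name and dimension here (the toy vocabulary, embedding size, window size, random weights) is an illustrative assumption, not something taken from the paper or tool mentioned in the question:

```python
# Minimal sketch of the fixed-window + softmax approach for per-word tagging.
# Vocabulary, dimensions, and weights are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat", "<pad>"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

emb_dim, window, n_classes = 4, 3, 2          # window = 1 word of context each side
E = rng.normal(size=(len(vocab), emb_dim))    # pre-trained word vectors (random here)
W = rng.normal(size=(window * emb_dim, n_classes))  # softmax layer weights
b = np.zeros(n_classes)

def softmax(z):
    z = z - z.max()                           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify_word(sentence, i):
    """Predict a class distribution for word i from a fixed window around it."""
    pad = word_to_idx["<pad>"]
    idxs = [word_to_idx.get(sentence[j], pad) if 0 <= j < len(sentence) else pad
            for j in range(i - 1, i + 2)]
    x = E[idxs].reshape(-1)                   # concatenate the window's word vectors
    return softmax(W.T @ x + b)

probs = classify_word(["the", "cat", "sat"], 1)
```

Note that each window position yields its own distribution over classes, which is exactly why this fits per-word tagging but gives no single sentence-level output on its own.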
Are you strictly bound to the window approach? If not, how about using a model that "fuses" your individual word representations into a representation of the whole sentence? Recently, Recursive Neural Networks have gained some momentum, and they should be able to provide you with sentence representations (even in an unsupervised way; see the recursive auto-encoder). You could then use those representations as input to a classifier, or even simply plug a classifier neural network layer on top of the Recursive Neural Network and fine-tune the representations. Most of the work on Recursive Neural Networks has been done by Socher et al.; see socher.org for further reference. A nice place to start and quickly read up on the topic is this write-up by Charles Elkan: http://cseweb.ucsd.edu/~elkan/250B/learningmeaning.pdf

Hi, I'm not strictly bound to the window approach. In fact, I'd be interested in hearing about all deep learning techniques relevant to the sentence classification task at hand. I actually didn't quite understand how the window approach would be used in this particular task, which is why I posed the question. My understanding is that it produces an output per fixed-window position, but I need a single binary output at the sentence level, not one per window as in a POS tagging task. As you mentioned, it looks like constructing a sentence representation from the distributed word representations is the ideal path to follow, though I'm still uncertain how to accomplish this. I haven't looked into Recursive Neural Networks too heavily, so it seems I should look into those next. If you have any other advice or pointers, I'd be very interested in reading them. Thank you for your help!
(Apr 01 '14 at 00:32)
Victor Suthichai
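As an illustration of the "fuse the word vectors into a sentence representation" idea from the answer above: the simplest possible composition is averaging the pre-trained word vectors and feeding the result to a binary classifier. A recursive auto-encoder would learn the composition instead of averaging; this sketch only shows the overall pipeline shape, and all names, vectors, and dimensions are illustrative assumptions:

```python
# Minimal sketch: average word vectors -> sentence vector -> logistic unit.
# Embeddings and classifier weights are random placeholders; in practice the
# embeddings come from the pre-training step and the weights from training.
import numpy as np

rng = np.random.default_rng(1)
emb_dim = 4
E = {"great":  rng.normal(size=emb_dim),
     "movie":  rng.normal(size=emb_dim),
     "boring": rng.normal(size=emb_dim)}

def sentence_vector(tokens):
    """Average the vectors of the tokens found in the embedding table."""
    vecs = [E[t] for t in tokens if t in E]
    return np.mean(vecs, axis=0) if vecs else np.zeros(emb_dim)

# A single logistic unit as a stand-in for any binary classifier on top.
w = rng.normal(size=emb_dim)
b = 0.0

def predict(tokens):
    z = w @ sentence_vector(tokens) + b
    return 1 / (1 + np.exp(-z))              # probability of the positive class

p = predict(["great", "movie"])              # one value per sentence, in (0, 1)
```

The key contrast with the window approach is that this produces exactly one output per sentence, which matches the binary sentence-classification task.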