Does anyone have experience training ANNs on very sparse inputs and outputs?

The inputs are tf-idf word vectors, and the outputs are summary words for the input text; the outputs are also very sparse (fewer than 10 active out of 4000).

All that happens is that the network converges to predictions corresponding to the average frequencies of the outputs. I've tried a whole range of hyperparameters: standard deviation of the weight initialization, number of hidden units, learning rate, momentum, number of hidden layers, activation functions, dropout, and minibatch size. I'm using SGD on anywhere from 100 to 1000 examples.
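For concreteness, here is roughly the kind of setup I mean, as a minimal numpy sketch (one hidden layer, sigmoid outputs, plain SGD on multi-hot targets); the sizes and random data are placeholders, not my actual code or dataset:

    import numpy as np

    rng = np.random.RandomState(0)

    # Placeholder data: 500 examples, 4000-dim tf-idf-like inputs,
    # 4000 candidate output words with only a handful active per example.
    n, d_in, d_hid, d_out = 500, 4000, 256, 4000
    X = rng.rand(n, d_in) * (rng.rand(n, d_in) < 0.01)    # sparse-ish inputs
    Y = (rng.rand(n, d_out) < 5.0 / d_out).astype(float)  # ~5 active labels

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Small random initialization.
    W1 = rng.randn(d_in, d_hid) * 0.01
    b1 = np.zeros(d_hid)
    W2 = rng.randn(d_hid, d_out) * 0.01
    b2 = np.zeros(d_out)

    lr, batch = 0.1, 50
    for epoch in range(5):
        perm = rng.permutation(n)
        for i in range(0, n, batch):
            x, y = X[perm[i:i + batch]], Y[perm[i:i + batch]]

            # Forward pass: tanh hidden layer, sigmoid outputs,
            # per-label binary cross-entropy loss.
            h = np.tanh(x.dot(W1) + b1)
            p = sigmoid(h.dot(W2) + b2)

            # Backward pass: gradient of the mean cross-entropy.
            dlogits = (p - y) / len(x)
            dW2 = h.T.dot(dlogits)
            db2 = dlogits.sum(axis=0)
            dh = dlogits.dot(W2.T) * (1.0 - h ** 2)
            dW1 = x.T.dot(dh)
            db1 = dh.sum(axis=0)

            W2 -= lr * dW2
            b2 -= lr * db2
            W1 -= lr * dW1
            b1 -= lr * db1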

I've also tried conditioning the problem better (converting from sparse to dense and doing standard scaling) and reducing the number of input and output classes.
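By "standard scaling" I just mean something along these lines (illustrative data; scikit-learn's StandardScaler needs a dense matrix when centering, hence the conversion):

    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.preprocessing import StandardScaler

    # Illustrative sparse tf-idf matrix; the real one comes from the vectorizer.
    dense = np.random.rand(100, 4000) * (np.random.rand(100, 4000) < 0.01)
    X_sparse = csr_matrix(dense)

    # Convert to dense and scale each feature to zero mean / unit variance.
    X_scaled = StandardScaler().fit_transform(X_sparse.toarray())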

It's probably not an input or output representation issue, since linear models (ridge/logistic regression) train fine on the same input/output data. The NN code also works fine on other datasets such as MNIST.
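For reference, the ridge baseline is essentially the following (stand-in data; Ridge handles a multi-output target matrix directly):

    import numpy as np
    from sklearn.linear_model import Ridge

    # Stand-ins for the tf-idf inputs and multi-hot summary-word targets.
    X = np.random.rand(200, 4000) * (np.random.rand(200, 4000) < 0.01)
    Y = (np.random.rand(200, 4000) < 0.001).astype(float)

    # One linear model per output word, fit jointly; this is the baseline
    # that trains fine instead of collapsing to the label frequencies.
    baseline = Ridge(alpha=1.0).fit(X, Y)
    scores = baseline.predict(X)  # one real-valued score per candidate word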

I've heard that presenting all classes with the same frequency can help, even if it causes overfitting, but that's not easily done here: the problem is multi-label, and many of the rare classes have no more than a handful of examples.
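One way to approximate equal-frequency presentation without resampling would be to up-weight the positive term of each label's cross-entropy by its inverse frequency; a rough sketch (the particular weighting and cap here are just illustrative, not a standard recipe):

    import numpy as np

    def weighted_bce(p, y, pos_weight, eps=1e-8):
        """Binary cross-entropy with each label's positive term up-weighted."""
        return -np.mean(pos_weight * y * np.log(p + eps)
                        + (1.0 - y) * np.log(1.0 - p + eps))

    # Per-label positive weight ~ negatives/positives, capped so the
    # handful-of-examples labels don't dominate the gradient.
    Y = (np.random.rand(500, 4000) < 0.001).astype(float)
    pos = Y.sum(axis=0)
    pos_weight = np.clip((len(Y) - pos) / np.maximum(pos, 1.0), 1.0, 100.0)

    p = np.full_like(Y, 0.001)  # e.g. a network stuck at the base rates
    print(weighted_bce(p, Y, pos_weight))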

Any hints, hunches, or suggestions are much appreciated! Thanks!

asked Sep 09 '13 at 12:59 by Newmu, edited Sep 09 '13 at 13:01

You need to be more precise in describing exactly what you are trying to do for me to be able to offer any advice. What is the training criterion of the neural net? What is the problem you are trying to solve? I don't understand your description of the input and output exactly.

(Sep 09 '13 at 17:15) gdahl ♦

Sorry about that. The input is a body of text, and the desired output is a few summary words for that text (or the main topics it's about). Right now, I'm just treating those summary words as classes to predict a distribution over.

(Sep 10 '13 at 11:21) Newmu

It appears to be related to the multi-label problem. If I train on each class individually with one-vs-all, it learns perfectly fine on the sparse inputs.

(Sep 11 '13 at 16:03) Newmu
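For reference, that one-vs-all baseline is essentially the following (illustrative data; scikit-learn's OneVsRestClassifier fits one logistic regression per label):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    # Illustrative sparse-ish inputs and a multi-hot label matrix.
    X = np.random.rand(200, 4000) * (np.random.rand(200, 4000) < 0.01)
    Y = (np.random.rand(200, 50) < 0.05).astype(int)

    # One independent logistic regression per summary word.
    ovr = OneVsRestClassifier(LogisticRegression(C=1.0)).fit(X, Y)
    probs = ovr.predict_proba(X)  # per-label probabilities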