|
Are there well-known datasets on which neural networks fail to converge well during training when using backpropagation (with or without second-order methods)? And then furthermore fail to classify well on the test set? Basically, I'm looking for a well-known dataset where neural networks do horribly on both the training and test sets.
|
Backpropagation is another name for (stochastic) gradient descent-based optimization, and gradient methods are known to be very robust: they generally converge to a local minimum if the learning rate is set to a good value (so set an unreasonably high learning rate and watch almost every neural network thrash and suffer on almost any dataset). Very deep neural networks also tend to converge to very bad local minima if trained discriminatively with first-order gradient descent, but this can be circumvented by unsupervised pre-training and/or higher-order optimization methods.

Another thing that might interest you is that without proper complexity control, neural networks can overfit badly on datasets with a significant amount of either feature or label noise. To see this, pick almost any real-world dataset that is not very large and train a neural network by directly minimizing the squared/logistic loss on the training data without early stopping or regularization, and watch the test error grow large as the training error goes to zero.

Another possibility is, as Justin suggested, picking irrelevant features or an unsolvable problem, like training a neural network to reverse RSA encryption (although with some of these unrealistic datasets you'll probably still get a somewhat OK training error if you tune things long enough). But neural networks are usually reliable and gradient descent has very nice properties, so I don't think you'll ever see a natural dataset where there is an actual lack of convergence.
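Here is a minimal sketch of the learning-rate point (my own illustration, not part of the original answer), using plain least squares so the divergence threshold is easy to see: gradient descent converges for step sizes below 2/L, where L is the largest eigenvalue of X^T X, and thrashes above it, which is exactly what happens to a neural network with an unreasonably high learning rate.

```python
# Sketch: gradient descent on least squares converges below the critical step
# size 2/L and diverges above it (L = largest eigenvalue of X^T X).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

L = np.linalg.eigvalsh(X.T @ X).max()           # Lipschitz constant of the gradient

def run_gd(lr, steps=50):
    w = np.zeros(5)
    for _ in range(steps):
        grad = X.T @ (X @ w - y)                # gradient of 0.5 * ||Xw - y||^2
        w -= lr * grad
    return 0.5 * np.sum((X @ w - y) ** 2)       # final training loss

print("lr just below 2/L:", run_gd(0.9 * 2 / L))   # settles toward the optimum
print("lr just above 2/L:", run_gd(1.1 * 2 / L))   # blows up
```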
LeCun mentioned in one of his papers that it is not well understood why gradient descent converges much better than it theoretically should. One possible answer he gave was that oversized networks over-parameterize the problem, and the network tends to use these free parameters to "route around" local minima. So I guess one way to create poor convergence would be to use smaller networks, where gradient descent gets stuck in a local minimum. Even then, the stochastic nature of SGD seems to give it a decent chance of still getting out. (A quick sketch of this appears below.)
(Mar 25 '11 at 10:17)
crdrn
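A small sketch of that last point (my own construction, not from the comment; the hidden-layer sizes and number of restarts are arbitrary): retrain the same XOR problem from many random initializations and count how often training fails to reach zero error. The tiny network gets stuck in bad local minima far more often than the oversized one.

```python
# Sketch: failure rate of XOR training as a function of hidden-layer size.
# Assumes scikit-learn; a tiny net gets stuck more often across random restarts.
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                      # XOR targets

def failure_rate(n_hidden, trials=50):
    fails = 0
    for seed in range(trials):
        clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation='tanh',
                            solver='lbfgs', max_iter=2000, random_state=seed)
        clf.fit(X, y)
        fails += clf.score(X, y) < 1.0          # did not reach zero training error
    return fails / trials

print("2 hidden units :", failure_rate(2))      # stuck on a noticeable fraction of runs
print("50 hidden units:", failure_rate(50))     # almost never stuck
```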
|
|
Neural networks are universal approximators in the sense that they can fit any dataset arbitrarily well. There's an issue of back-propagation getting stuck in suboptimal local minima, but as you increase the number of units in the hidden layer you get stuck less often. In the extreme case, with an infinite number of hidden units, the optimization surface has a unique local minimum; this is the idea behind "convex neural networks". Therefore, examples of neural networks failing to fit the training data have to be architecture-specific. The classical example is the XOR dataset: take a network with 2 input units, 0 hidden units, and 1 output unit, and the dataset (0,0) → 0, (0,1) → 1, (1,0) → 1, (1,1) → 0.
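A quick sketch of that failure (I'm using scikit-learn's logistic regression as a stand-in for the 2-input, 0-hidden-unit, 1-output network; that choice is mine, not the answer's): XOR is not linearly separable, so no amount of training gets a linear model past 3 of the 4 points.

```python
# Sketch: a linear model (i.e. no hidden layer) cannot fit XOR.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                      # XOR targets

linear = LogisticRegression().fit(X, y)         # 2 inputs -> 1 output, no hidden layer
print("training accuracy:", linear.score(X, y)) # stuck at 0.5-0.75, never 1.0
```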
0 hidden units?
(Feb 08 '13 at 07:44)
larsmans
What he means is no hidden layer.
(Feb 08 '13 at 08:07)
Justin Bayer
|
|
I don't think such a database would make sense, since you can make up trivial examples where neural networks would have difficulties, e.g. predicting the brand of a car given a 1024x768 image. Much more interesting are datasets which illustrate the "edge", where learning algorithms are able to do something, but not everything. A tough dataset like that is CIFAR-100, for example. I think there has to be "something" that indicates a dataset is actually "learnable": that there is enough structure in it that NNs somehow fail to find.

I disagree on this point to some extent. Certainly trivial stuff like you pointed out makes no sense to catalog. It'd be nice to have a list of public-domain datasets people have used for real research (e.g., how much serious research has been done on {1024x768 image -> car} classification with a published data-set?). It'd be nice to further annotate the list by: papers that cite it, number of samples, how the data [is/may be] distributed (if applicable) based either on known fact or statistical analysis, and whether some problems were found using it (noisy, heavy tails, too few samples, too many local minima, corruption inherent in the data such as scratches on images, etc.). (cont)
(Apr 04 '11 at 20:18)
Brian Vandenberg
From what I can tell, many authors just choose a dataset because it's used a lot. A great counter-example to this is James Martens' paper on HF learning. He specifically chose a set that was known to be difficult or intractable for a conventionally trained NN, for the express purpose of showing that his method solves more difficult problems than such training can. Supposing you found a similarly great solution to a tough problem, wouldn't it be nice if all you had to do was look through a database of data-sets and select the one that best shows off your shiny new toy, instead of just choosing from the pool of sets you've encountered while doing research?
(Apr 04 '11 at 20:18)
Brian Vandenberg
I agree that a list of "open problems" would be nice. Or to have a "journal of negative machine learning results". :) However, these would always be, imho, on the "edge". To state that something fricking hard (I like Alexandre's reverse RSA) has not been solved by NNs provides no information. Speaking of Martens' paper: it was all about the architecture and the method, not the dataset. The question was "how can we learn deep networks without pretraining" and not "how can we build a deep autoencoder for MNIST".
(Apr 05 '11 at 04:27)
Justin Bayer
@Justin - re: Martens' paper, that's exactly what I was trying to say. On the 'no information' part, I think that's debatable, but this probably isn't the forum for that discussion.
(Apr 05 '11 at 14:37)
Brian Vandenberg
|
|
If a neural network does "terribly on a training set" (assuming you are using enough nodes, enough layers, a proper learning rate, etc.), it's not really the neural network's fault, because it can always provide you with a network that simply gives you a one-to-one match of the training input to the training output. You are just using crappy training data (garbage in, garbage out), and no algorithm will fix that.
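To illustrate the "it can always match the training data" claim, here is a sketch (mine, using scikit-learn as an assumption): an oversized MLP drives training accuracy close to 1.0 even when the labels are pure noise, so a terrible training error usually points at the data or the setup rather than at backpropagation itself. The one case no model can fit is duplicate inputs carrying conflicting labels.

```python
# Sketch: an over-sized MLP can memorize a training set even with random labels.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)                # labels are pure noise

clf = MLPClassifier(hidden_layer_sizes=(500,), alpha=0.0,      # no weight decay
                    max_iter=10000, tol=1e-6, random_state=0).fit(X, y)
print("training accuracy on random labels:", clf.score(X, y))  # usually close to 1.0
```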
|
Alexandre, Justin, and ukurokawa touched on this already, but the main issue with neural networks is overtraining on sparse data. Specifically, if you have very sparse data, then running the training repeatedly over the data is more likely to overfit the neural network and make it less applicable when presented with new data. However, if you have a lot of unique data, or data that covers all possible kinds of data that will be seen, then you can train a neural network with fewer iterations and therefore with less overfitting.
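A sketch of the sparse-data point (my own numbers; scikit-learn is an assumption): the same unregularized architecture trained to convergence shows a large train/test gap on 100 noisy samples, and a much smaller one when it sees 5000.

```python
# Sketch: train/test gap of an unregularized MLP on sparse vs. plentiful noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def fit_and_score(n_samples):
    X, y = make_classification(n_samples=n_samples, n_features=20, n_informative=5,
                               flip_y=0.2, random_state=0)      # 20% label noise
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(200,), alpha=0.0,   # no weight decay
                        max_iter=5000, random_state=0).fit(X_tr, y_tr)
    return clf.score(X_tr, y_tr), clf.score(X_te, y_te)

print("sparse data (train acc, test acc):   ", fit_and_score(100))
print("plentiful data (train acc, test acc):", fit_and_score(5000))
```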
|
A traditional multilayer perceptron on the two-spiral problem, without additional information.
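A sketch of what that looks like in practice (my own construction of the classic two-spirals benchmark; the network size and optimizer settings are assumptions): a small, plain MLP trained with vanilla SGD and no extra tricks typically fails to fit even the training points well, while a much larger network or a better optimizer usually succeeds.

```python
# Sketch: a small vanilla MLP on the classic two-spirals problem.
import numpy as np
from sklearn.neural_network import MLPClassifier

def two_spirals(n_per_class=97, seed=0):
    rng = np.random.default_rng(seed)
    t = np.sqrt(rng.uniform(0, 1, n_per_class)) * 3 * np.pi
    spiral = np.column_stack((t * np.cos(t), t * np.sin(t)))
    X = np.vstack((spiral, -spiral))            # second spiral = first rotated 180 degrees
    y = np.hstack((np.zeros(n_per_class), np.ones(n_per_class)))
    return X, y

X, y = two_spirals()
clf = MLPClassifier(hidden_layer_sizes=(8,), solver='sgd', learning_rate_init=0.01,
                    max_iter=2000, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))    # often well below 1.0 for a net this small
```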