|
Is it at all reasonable to get a test error rate this low? I doubt it, because the test error rates listed on the MNIST database website are higher than this, yet the evidence otherwise seems irrefutable. I am using a 2-layer net with 100 hidden units, no regularization, small random initial weights, and the Nesterov momentum method on mini-batches. The training error rate is 0.0000667 (4 training examples misclassified). The average cross-entropy loss on the training examples is 0.00410; on the test examples it is 0.207.
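
For concreteness, here is a minimal sketch of the kind of setup described (this is not the poster's actual code): one hidden layer of 100 units, softmax with cross-entropy loss, and Nesterov momentum on mini-batches. Random data with MNIST's shapes stands in for the real dataset, and the label rule, layer sizes other than 100, learning rate, and momentum value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data with MNIST's shapes (784-dim inputs, 10 classes); labels come
# from a fixed random linear map so there is real structure to learn.
X = rng.standard_normal((1000, 784))
y = (X @ rng.standard_normal((784, 10))).argmax(axis=1)

lr, mu = 0.1, 0.9                                 # learning rate, momentum

# Small random initial weights, as in the question.
W1 = 0.01 * rng.standard_normal((784, 100)); b1 = np.zeros(100)
W2 = 0.01 * rng.standard_normal((100, 10));  b2 = np.zeros(10)
params = [W1, b1, W2, b2]
vels = [np.zeros_like(p) for p in params]

def forward(Xb, params):
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, Xb @ W1 + b1)             # ReLU hidden layer
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return h, p / p.sum(axis=1, keepdims=True)    # softmax probabilities

def loss_and_grads(Xb, yb, params):
    W1, b1, W2, b2 = params
    h, p = forward(Xb, params)
    n = len(Xb)
    loss = -np.log(p[np.arange(n), yb] + 1e-12).mean()   # cross entropy
    d = p.copy(); d[np.arange(n), yb] -= 1.0; d /= n     # dL/dlogits
    dh = d @ W2.T; dh[h <= 0] = 0.0                      # backprop through ReLU
    return loss, [Xb.T @ dh, dh.sum(0), h.T @ d, d.sum(0)]

# Nesterov momentum: evaluate the gradient at the look-ahead point
# params + mu * velocity, then update velocity and parameters.
for epoch in range(10):
    for s in range(0, len(X), 100):               # mini-batches of 100
        Xb, yb = X[s:s + 100], y[s:s + 100]
        ahead = [p + mu * v for p, v in zip(params, vels)]
        _, grads = loss_and_grads(Xb, yb, ahead)
        for i in range(4):
            vels[i] = mu * vels[i] - lr * grads[i]
            params[i] = params[i] + vels[i]

final_loss, _ = loss_and_grads(X, y, params)
_, probs = forward(X, params)
train_err = float((probs.argmax(axis=1) != y).mean())    # misclassification rate
```

With a real MNIST loader in place of the synthetic data, `train_err` and the held-out analogue of `final_loss` would correspond to the error rates and cross-entropy figures quoted above.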
|
MNIST is a scenario where 2% is not too low. My naive network with one hidden layer of 200 nodes already achieves 1.7% test error. If you look at this page: http://yann.lecun.com/exdb/mnist/, you'll see that all the glorious effort in ML research goes toward getting the test error down to about 0.3%.
|
I believe it is too low (though I haven't tried it myself). The closest benchmark on the MNIST site you linked to would be: 2-layer NN, 300 hidden units, mean square error, no distortions, 4.7% [since I believe you are not adding extra training data by rotating or otherwise distorting the original training data]. I think one can assume that Yann tried, e.g., 100 and 200 units and that 300 gave the best result. My guess is that you are not using the exact same training and test data that Yann LeCun links to at http://yann.lecun.com/exdb/mnist/; in particular, I am guessing that the writers in your training and test data overlap.
Yes, I have tried with 128 hidden units and can easily get an error rate of about 2.3%. My guess is that if you increase the number of hidden units, the NN will overfit the data more easily.
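
The capacity/overfitting point can be illustrated generically with a polynomial-fitting toy (this is not a neural net, and the degrees, noise level, and sine target are illustrative choices): a higher-capacity model drives training error toward zero while its test error on fresh points grows, which is the same train/test gap pattern as in the question.

```python
import numpy as np

rng = np.random.default_rng(1)

# 12 noisy samples of a sine wave; a degree-11 polynomial can interpolate
# them exactly, while a degree-3 polynomial cannot.
x = np.linspace(0.0, 1.0, 12)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(12)

x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)          # noise-free ground truth

errs = {}
for deg in (3, 11):
    coeffs = np.polyfit(x, y, deg)           # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errs[deg] = (train_mse, test_mse)

# Expected pattern: the degree-11 fit has (near-)zero training error but a
# larger test error than the degree-3 fit, because it also fits the noise.
```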