I'm a beginner with theano and my problem is that my mini-batches are too big. I'm using a convolutional neural network similar to the deeplearning tutorial. The images I'm training on are so big that I can only use a mini-batch size of 2. I really want to experiment with bigger sizes, so I'm trying to come up with code that only updates the weights after, say, 100 samples (i.e. 50 mini-batch iterations). The problem is that if I simply apply the update after the 50th mini-batch, only the gradient from that last mini-batch affects the weights.
I don't think there is a good reason to do this. The primary motivation for mini-batches is to make efficient use of the GPU (or multi-core) architecture, so if you are computing gradients in batches of two because of memory constraints, you might as well update the model parameters every two examples as well. That said, if you really want to update the parameters only every 100 examples, you can keep a running sum (or average) of the gradients until you have seen 100 examples and then apply that accumulated gradient as a single update. Maintaining the accumulator adds roughly as much computational overhead (and more memory overhead) as simply updating the parameters after every batch, though, which is why I still think updating after every batch is the better option.
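If you do want to try it, here is a rough, untested sketch of that accumulate-then-apply idea in Theano. The names `cost`, `params`, `x`, and `y` are placeholders for whatever you already have in your training script, not anything from the tutorial code itself:

```python
import numpy as np
import theano
import theano.tensor as T

# Assumed to exist already in your script:
#   cost   -- symbolic cost for one mini-batch
#   params -- list of shared variables holding the weights
#   x, y   -- symbolic inputs/targets used to build `cost`
n_accum = 50          # mini-batches to accumulate before one update
learning_rate = 0.1

# One zero-initialised accumulator per parameter.
grad_accums = [theano.shared(np.zeros_like(p.get_value()),
                             broadcastable=p.broadcastable)
               for p in params]

grads = T.grad(cost, params)

# Add the current mini-batch gradient into the accumulators.
accumulate = theano.function(
    [x, y], cost,
    updates=[(acc, acc + g) for acc, g in zip(grad_accums, grads)])

# Apply the averaged gradient once, then reset the accumulators.
apply_update = theano.function(
    [],
    updates=[(p, p - learning_rate * acc / float(n_accum))
             for p, acc in zip(params, grad_accums)]
            + [(acc, T.zeros_like(acc)) for acc in grad_accums])
```

In the training loop you would then call `accumulate(xb, yb)` on every mini-batch and `apply_update()` once every `n_accum` mini-batches.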
Thank you for your answer; this has indeed solved my problem. I do think there is a good reason to do this, however: the mini-batch size can sometimes greatly influence the training speed and even the final accuracy.
(Nov 20 '13 at 12:42)
jolix
Glad that worked. You might have been having hyperparameter issues with the small mini-batch size, though: if you are using a batch size of 2 instead of 100, you probably need to divide the learning rate by a similar factor (or maybe a bit less), so your learning rate for the 2-example batches may have been much too high.
(Dec 01 '13 at 19:16)
Dan Ryan
Batch learning is not just for performance reasons. The bigger the batch, the better the generalization. Without batching (or with batch size = 1), you update the weights after every example, and the next learning step then uses those new weights. That can behave very differently from using the same weights for a whole batch.
(Dec 12 '13 at 14:07)
Albert Zeyer