I'm a beginner with Theano, and my problem is that memory limits how big my mini-batches can be.

I'm using a convolutional neural network similar to the one in the deep learning tutorial. The images I'm training on are so large that I can only fit a mini-batch size of 2.

I really want to experiment with larger effective batch sizes, so I'm trying to write code that only updates the weights after, say, 100 samples (i.e. 50 mini-batch iterations).

The problem with my attempt below is that if I apply the updates after the 50th mini-batch, only the last mini-batch affects them:

grads = T.grad(cost, params)

# one update pair per parameter: a plain gradient-descent step
updates = []
for param_i, grad_i in zip(params, grads):
    updates.append((param_i, param_i - learning_rate * grad_i))

# computes the cost of one mini-batch; no updates are applied here
train_model = theano.function([index], cost,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]})

# intended to apply the parameter updates once, after the loop below
perform_updates = theano.function([], updates=updates, on_unused_input='ignore')

# run 50 mini-batches, then update the weights once
for i in xrange(50):
    train_model(i)
perform_updates()

asked Nov 19 '13 at 13:51 by jolix
One Answer:

I don't think there is a good reason to do this. The primary motivation for using mini-batches is that the computation can make efficient use of the GPU (or multiple cores). If you are computing the gradients in batches of two because of memory constraints, you might as well update the model parameters every two examples as well.

That being said, if you really want to only update the model parameters every 100 examples, you could keep a running average (or sum) of the gradients computed until you reach 100 examples and then apply that averaged gradient as a single update (a sketch of this is below). I think that maintaining the average costs just as much computation (and even more memory) as updating the model parameters every batch, though. Again, I think this is a bad idea compared to just updating the model after every batch.
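For concreteness, here is a minimal sketch of that running-sum idea, reusing the symbolic names from the question (cost, params, x, y, index, batch_size, learning_rate, train_set_x, train_set_y); the names grad_accums, accumulate_gradients, apply_updates and n_accum are my own, and this is just one way to do it:

import numpy as np

grads = T.grad(cost, params)

# one shared accumulator per parameter, initialised to zero
grad_accums = [theano.shared(np.zeros_like(p.get_value()),
                             broadcastable=p.broadcastable)
               for p in params]

# step 1: compute the cost of one mini-batch and add its gradients
# into the accumulators; no parameter update happens here
accumulate_gradients = theano.function([index], cost,
        updates=[(acc, acc + g) for acc, g in zip(grad_accums, grads)],
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]})

# step 2: apply the averaged gradient once and reset the accumulators
n_accum = 50
apply_updates = theano.function([],
        updates=[(p, p - learning_rate * acc / float(n_accum))
                 for p, acc in zip(params, grad_accums)] +
                [(acc, T.zeros_like(acc)) for acc in grad_accums])

for i in xrange(n_accum):
    accumulate_gradients(i)
apply_updates()

Because the parameters and the accumulators are separate shared variables, both update lists can live in the same function; Theano evaluates all update expressions from the old values, so the parameter step sees the full accumulated gradient before it is reset.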

answered Nov 20 '13 at 09:52 by Dan Ryan

Thank you for your answer; this has indeed solved my problem. I do think there is a good reason to do this, however: the mini-batch size can sometimes greatly influence the training speed and even the final accuracy.

(Nov 20 '13 at 12:42) jolix

Glad that worked. You might have been having hyperparameter issues with the small mini-batch size, though: if you are using a batch size of 2 instead of 100, you might need to divide the learning rate by a similar factor (or maybe a bit less), so your learning rate for the 2-example batches might simply have been much too high.

(Dec 01 '13 at 19:16) Dan Ryan

Batch learning is not just for performance reasons. The bigger the batch, the better the generalization. Without batching (i.e. batch size = 1), you update the weights after every example and use the new weights for the next learning step. That can behave very differently from using the same weights for a whole batch.

(Dec 12 '13 at 14:07) Albert Zeyer