I'm trying to implement an RBM and I'm testing it on the MNIST dataset. However, it does not seem to converge.

I have 28x28 visible units and 100 hidden units. I'm using mini-batches of size 50. For each epoch, I traverse the whole dataset. I have a learning rate of 0.01 and a momentum of 0.5. The weights are randomly generated from a Gaussian distribution with mean 0.0 and standard deviation 0.01. The visible and hidden biases are initialized to 0.
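For reference, the initialization just described can be sketched like this (a minimal, hypothetical sketch, not my actual implementation; the names `rbm_state` and `init_rbm` are made up for illustration):

```cpp
#include <vector>
#include <random>
#include <cstddef>

// Hypothetical sketch of the setup described above:
// Gaussian(0, 0.01) weights, zero visible and hidden biases.
struct rbm_state {
    std::vector<double> weights;      // num_visible * num_hidden
    std::vector<double> visible_bias; // num_visible
    std::vector<double> hidden_bias;  // num_hidden
};

inline rbm_state init_rbm(std::size_t num_visible, std::size_t num_hidden,
                          double stddev = 0.01, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> dist(0.0, stddev);

    rbm_state rbm;
    rbm.weights.resize(num_visible * num_hidden);
    for (auto& w : rbm.weights) {
        w = dist(gen);
    }

    // Biases start at zero.
    rbm.visible_bias.assign(num_visible, 0.0);
    rbm.hidden_bias.assign(num_hidden, 0.0);
    return rbm;
}
```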

After each epoch, I compute the average reconstruction error over all mini-batches; here are the errors I get:

epoch 0: Reconstruction error average: 0.0481795
epoch 1: Reconstruction error average: 0.0350295
epoch 2: Reconstruction error average: 0.0324191
epoch 3: Reconstruction error average: 0.0309714
epoch 4: Reconstruction error average: 0.0300068

I plotted the histograms of the weights to check (left to right: hidden biases, weights, visible biases; top: weights, bottom: updates):

Histogram of the weights after epoch 3

Histogram of the weights after epoch 4

But, except for the hidden biases, which seem a bit weird, the rest seems OK.

I also tried to plot the hidden weights:

Weights after epoch 3

Weights after epoch 4

(they are plotted in two colors using this function:

static_cast<size_t>(value > 0 ? (static_cast<size_t>(value * 255.0) << 8) : (static_cast<size_t>(-value * 255.0) << 16)) << " ";

)
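Wrapped as a standalone helper (hypothetical naming, for clarity), the mapping packs positive weights into the green channel and negative weights into the red channel of a 0x00RRGGBB pixel:

```cpp
#include <cstddef>

// Two-color mapping: positive values -> green channel,
// negative values -> red channel of a packed 0x00RRGGBB pixel.
// Assumes |value| <= 1.0.
inline std::size_t weight_to_color(double value) {
    if (value > 0) {
        return static_cast<std::size_t>(value * 255.0) << 8;   // green
    }
    return static_cast<std::size_t>(-value * 255.0) << 16;     // red
}
```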

And here, they do not make sense at all...

If I go further, the reconstruction error falls a bit more, but does not go below 0.028. Even if I change the momentum after some time, the error goes up and then comes back down a bit, but not significantly. Moreover, the weights do not make more sense after more epochs. In most example implementations I've seen, the weights started to make sense after iterating through the complete dataset two or three times.

I've also tried to reconstruct an image from the visible units, but the results seem almost random.

What could I do to check what goes wrong in my implementation? Should the weights be within some range? Does something seem really strange in the data?

Complete code: https://github.com/wichtounet/dbn/blob/master/include/rbm.hpp

asked Jan 22 '14 at 09:33

Baptiste Wicht

edited Jun 05 '14 at 04:47

It would be easier to see what the weights are doing if you instead plot them with this function:

brightness = sigmoid(w(i,j) / (3.0 * stddev))

where stddev is the standard deviation of the weights (as calculated when visualizing, not when initializing). This will have large positive weights appear white and large negative weights appear black.
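A sketch of that brightness mapping (with a naive standard-deviation computation over the weight matrix; the helper names are mine):

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

inline double sigmoid(double x) {
    return 1.0 / (1.0 + std::exp(-x));
}

// Standard deviation of the weights, computed at visualization time
// (not the stddev used at initialization).
inline double weight_stddev(const std::vector<double>& w) {
    double mean = 0.0;
    for (double x : w) mean += x;
    mean /= static_cast<double>(w.size());

    double var = 0.0;
    for (double x : w) var += (x - mean) * (x - mean);
    return std::sqrt(var / static_cast<double>(w.size()));
}

// Brightness in [0, 1]: large positive weights map near 1 (white),
// large negative weights map near 0 (black), zero maps to 0.5 (gray).
inline double brightness(double w, double stddev) {
    return sigmoid(w / (3.0 * stddev));
}
```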

Beyond that, I've taken a look at your code and I don't see anything too out of the ordinary. It looks like your implementation of CD is correct. Though, you're actually initializing your weights with a standard deviation of 0.1 (not 0.01). You can try initializing your weights with a standard deviation of:

stddev = 1.0 / sqrt(visible_units + hidden_units)

to prevent over-saturating the sigmoid. For the 784-100 architecture that's about 0.034.
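In code, that suggested scale is (hypothetical helper name):

```cpp
#include <cmath>
#include <cstddef>

// Suggested initialization scale: keeps the initial pre-sigmoid
// activations small so units don't saturate early in training.
inline double init_stddev(std::size_t visible_units, std::size_t hidden_units) {
    return 1.0 / std::sqrt(static_cast<double>(visible_units + hidden_units));
}
```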

To get an idea of what your features should look like, you can play around with my tool VisualRBM (Windows only, requires a somewhat modern GPU supporting OpenGL 3.3):

https://code.google.com/p/visual-rbm/

EDIT: You can also try initializing your hidden biases to negative values (-4 or so). Generally speaking though (with MNIST at least), it's going to take more than 4 epochs through the training set before you see weights you can visually understand.

(Jan 24 '14 at 14:07) Richard Pospesel

One Answer:

Did you try with a smaller learning rate?

answered Jun 05 '14 at 05:56

Ng0323
