I'm trying to implement an RBM and I'm testing it on the MNIST dataset. However, it does not seem to converge. I have 28x28 visible units and 100 hidden units, and I'm using mini-batches of size 50. For each epoch, I traverse the whole dataset. I use a learning rate of 0.01 and a momentum of 0.5. The weights are randomly generated from a Gaussian distribution with mean 0.0 and stddev 0.01, and the visible and hidden biases are initialized to 0 (a sketch of this setup is at the end of the question). After each epoch, I compute the average reconstruction error over all mini-batches; here are the errors I get:
I plotted histograms of the weights to check what is happening (left to right: hidden biases, weights, visible biases; top: values, bottom: updates):
[Histogram of the weights after epoch 3]
[Histogram of the weights after epoch 4]
Except for the hidden biases, which seem a bit weird, everything else looks OK. I also tried to plot the hidden weights:
[Weights after epoch 3]
[Weights after epoch 4]
(They are plotted in two colors using that function: ) And here, they do not make sense at all. If I train further, the reconstruction error falls a bit more, but never goes below 0.028. Even if I change the momentum after some time, the error goes up and then comes back down a bit, but not significantly. Moreover, the weights do not make any more sense after more epochs. In most example implementations I've seen, the weights started to make some sense after two or three passes through the complete dataset. I've also tried to reconstruct an image from the visible units, but the results seem almost random. What could I do to check what is going wrong in my implementation? Should the weights be within some range? Does something seem really strange in the data? Complete code: https://github.com/wichtounet/dbn/blob/master/include/rbm.hpp
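For reference, here is roughly what my initialization and reconstruction look like (a simplified sketch with illustrative names, not a verbatim extract from the linked code):

    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    constexpr std::size_t num_visible = 28 * 28; // 784
    constexpr std::size_t num_hidden  = 100;

    std::vector<double> weights(num_visible * num_hidden); // w(i,j) at i * num_hidden + j
    std::vector<double> visible_bias(num_visible, 0.0);    // biases start at 0
    std::vector<double> hidden_bias(num_hidden, 0.0);

    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    void init_weights(std::mt19937& rng) {
        std::normal_distribution<double> dist(0.0, 0.01); // mean 0.0, stddev 0.01
        for (auto& w : weights) {
            w = dist(rng);
        }
    }

    // One up-down pass: visible -> hidden probabilities -> reconstructed visible.
    // (Probabilities rather than sampled binary states are used on the way down.)
    std::vector<double> reconstruct(const std::vector<double>& v) {
        std::vector<double> h(num_hidden);
        for (std::size_t j = 0; j < num_hidden; ++j) {
            double a = hidden_bias[j];
            for (std::size_t i = 0; i < num_visible; ++i) {
                a += v[i] * weights[i * num_hidden + j];
            }
            h[j] = sigmoid(a);
        }
        std::vector<double> v_rec(num_visible);
        for (std::size_t i = 0; i < num_visible; ++i) {
            double a = visible_bias[i];
            for (std::size_t j = 0; j < num_hidden; ++j) {
                a += h[j] * weights[i * num_hidden + j];
            }
            v_rec[i] = sigmoid(a);
        }
        return v_rec;
    }

    // Mean squared reconstruction error for one sample.
    double reconstruction_error(const std::vector<double>& v) {
        const std::vector<double> v_rec = reconstruct(v);
        double error = 0.0;
        for (std::size_t i = 0; i < num_visible; ++i) {
            const double diff = v[i] - v_rec[i];
            error += diff * diff;
        }
        return error / num_visible;
    }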




It would be easier to see what the weights are doing if you instead plot them with this function:
brightness = sigmoid(w(i,j) / (3.0 * stddev))
where stddev is the standard deviation of the weights (as calculated at visualization time, not the value used at initialization). This will make large positive weights appear white and large negative weights appear black.
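As a sketch (illustrative names, assuming the weights you want to display are in a flat vector), that mapping could look like:

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Map weights to pixel brightness: large positive -> white (255),
    // large negative -> black (0), zero -> mid-gray (~128).
    std::vector<std::uint8_t> weights_to_pixels(const std::vector<double>& w) {
        // stddev of the current weights, not the initialization value
        double mean = 0.0;
        for (double x : w) mean += x;
        mean /= w.size();

        double var = 0.0;
        for (double x : w) var += (x - mean) * (x - mean);
        const double stddev = std::sqrt(var / w.size());

        std::vector<std::uint8_t> pixels(w.size());
        for (std::size_t k = 0; k < w.size(); ++k) {
            const double brightness = 1.0 / (1.0 + std::exp(-w[k] / (3.0 * stddev)));
            pixels[k] = static_cast<std::uint8_t>(brightness * 255.0);
        }
        return pixels;
    }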
Beyond that, I've taken a look at your code and I don't see anything too out of the ordinary; your implementation of CD looks correct. Note, though, that you're actually initializing your weights with a standard deviation of 0.1 (not 0.01). You can try initializing your weights with a standard deviation of:
stddev = 1.0 / sqrt(visible_units + hidden_units)
to prevent over-saturating the sigmoid. For the 784-100 architecture that's about 0.034.
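In code, that could look something like this (a sketch reusing the illustrative names from the question, not a drop-in patch):

    // needs <cmath> and <random>
    const double stddev = 1.0 / std::sqrt(static_cast<double>(num_visible + num_hidden)); // ~0.0336 for 784 + 100
    std::mt19937 rng(std::random_device{}());
    std::normal_distribution<double> dist(0.0, stddev);
    for (auto& w : weights) {
        w = dist(rng);
    }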
To get an idea of what your features should look like, you can play around with my tool VisualRBM (Windows only; requires a somewhat modern GPU supporting OpenGL 3.3):
https://code.google.com/p/visual-rbm/
EDIT: You can also try initializing your hidden biases to negative values (-4 or so). Generally speaking though (with MNIST at least), it's going to take more than 4 epochs through the training set before you see weights you can visually interpret.
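For example (again with the illustrative names from the sketches above):

    // needs <algorithm>; negative biases start the hidden units mostly "off"
    std::fill(hidden_bias.begin(), hidden_bias.end(), -4.0);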