I am trying to implement a Linear-NReLU RBM prototype in C#. The visible units have linear activation functions:

v_i = a_i + Σ_j w_ij h_j + N(0, σ_i²)

and the hidden units are NReLUs with Gaussian noise:

h_j = max(0, x_j + N(0, sigmoid(x_j))), where x_j = b_j + Σ_i w_ij v_i
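
Roughly, this is what I mean by these two sampling rules in C# (a simplified sketch, not my full implementation; all names are illustrative, and it assumes unit-variance Gaussian noise on the visible units as in Nair & Hinton, 2010):

```csharp
using System;

// A simplified sketch of the two sampling rules above (illustrative names,
// not a full implementation). Assumes unit-variance Gaussian noise on the
// visible units, as in Nair & Hinton (2010).
class LinearNReluSampling
{
    static readonly Random Rng = new Random();

    // Box-Muller transform: one standard-normal sample N(0, 1).
    static double SampleStandardNormal()
    {
        double u1 = 1.0 - Rng.NextDouble(); // avoid log(0)
        double u2 = Rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);
    }

    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Linear visible unit: v_i = a_i + sum_j(w_ij * h_j) + N(0, 1).
    public static double SampleVisible(double[] h, double[] wRow, double aBias)
    {
        double mean = aBias;
        for (int j = 0; j < h.Length; j++) mean += wRow[j] * h[j];
        return mean + SampleStandardNormal();
    }

    // NReLU hidden unit: h_j = max(0, x_j + N(0, sigmoid(x_j))).
    public static double SampleHidden(double[] v, double[] wCol, double bBias)
    {
        double x = bBias;
        for (int i = 0; i < v.Length; i++) x += wCol[i] * v[i];
        return Math.Max(0.0, x + Math.Sqrt(Sigmoid(x)) * SampleStandardNormal());
    }
}
```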

I've decided to start my experiments with the MNIST dataset. During training I looked at the reconstructed images (v -> h -> v) and noticed that the reconstructions contain a gray background with a different intensity in different images:

[image: reconstructed digits with gray backgrounds of varying intensity]

The WEIGHTS (not features!) look strange too (showing the first 100 of 500 hidden neurons):

[image: weight filters of the first 100 hidden neurons]

The histograms of the weights and biases are centered around zero and look normal. I previously implemented a binary-binary RBM and did not see this artifact, so it seems to be a 'feature' of my Linear-NReLU RBM implementation.

I have no idea where to go from here. Is it a bug? Any ideas for debugging? Is there enough information here to guess what is wrong, or do you need something more?

Thanks, and sorry for my bad English! Vyacheslav


Added later.

With hidden biases = 2.5f, the weights completely lose their sparsity (showing all 500 hidden neurons):

[image: weight filters of all 500 hidden neurons, sparsity lost]

but the reconstruction error drops by about 30%:

[image: reconstruction error curve]

I guess this is the wrong way to go.

asked Jul 23 '13 at 05:06 by VAvd
edited Jul 24 '13 at 07:05


One Answer:

Those hidden features look fine (sparser than I'd expect, though, which some people would consider good), but they have too much noisiness/speckling.

My guess is that some of your NReLUs are getting stuck at zero and never training. Try initializing the hidden biases to a small positive value to fix this?

Alternatively, in a few videos Hinton has said that people often use too high a learning rate for NReLUs in RBMs, so try lowering it by a factor of 10; I think 0.001 usually works. If the learning rate is too high, the units can get stuck positive or negative in a similar fashion, creating speckling.
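
Roughly, I mean something like this for both suggestions (a sketch with hypothetical names, not your code):

```csharp
// A rough sketch of both suggestions (hypothetical names, not your code):
// a slightly positive hidden-bias init plus a smaller learning rate in CD-1.
class Cd1Tweaks
{
    // Start the NReLUs "alive" so they receive gradient early in training.
    public static void InitHiddenBiases(double[] hiddenBias)
    {
        for (int j = 0; j < hiddenBias.Length; j++) hiddenBias[j] = 0.1;
    }

    // posAssoc/negAssoc are the <v_i * h_j> statistics from the data and
    // reconstruction phases of CD-1, summed over the minibatch.
    public static void UpdateWeights(double[,] w, double[,] posAssoc,
                                     double[,] negAssoc, int batchSize,
                                     double learningRate = 0.001)
    {
        for (int i = 0; i < w.GetLength(0); i++)
            for (int j = 0; j < w.GetLength(1); j++)
                w[i, j] += learningRate * (posAssoc[i, j] - negAssoc[i, j]) / batchSize;
    }
}
```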

answered Jul 23 '13 at 17:00 by Newmu
edited Jul 23 '13 at 17:02

Thanks for your reply, Newmu. I found a piece of misinformation in my original post: the image shows WEIGHTS, not features (100 hidden neurons out of 500, with 784 weights per neuron). My mistake, sorry. I also forgot to mention that the MNIST dataset was normalized to mean = 0, variance = 1 before training. I am using LR = 0.00002f; bigger values result in instabilities (the reconstruction error rises during training). I've tried playing with the hidden biases, as you advised. By default I was using zero hidden biases (Hinton, A Practical Guide to Training Restricted Boltzmann Machines, section 8.1). Setting the hidden biases to a small positive value gave me nothing, but setting them to a big value (2.5f) dramatically reduces the reconstruction error; unfortunately, the sparsity of the weights is then completely lost. It looks like the wrong way to go (not sure).
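
By normalization I mean standardizing to zero mean and unit variance, roughly like this sketch (illustrative names; shown per pixel here):

```csharp
using System;
using System.Linq;

// A sketch of the normalization described above (illustrative names):
// standardize each pixel across the training set to mean = 0, variance = 1.
class MnistStandardizer
{
    // data[n][i] is pixel i of image n; standardizes in place.
    public static void Standardize(double[][] data)
    {
        int numPixels = data[0].Length;
        for (int i = 0; i < numPixels; i++)
        {
            double mean = data.Average(img => img[i]);
            double variance = data.Average(img => (img[i] - mean) * (img[i] - mean));
            double std = Math.Sqrt(variance) + 1e-8; // guard for constant pixels
            foreach (var img in data)
                img[i] = (img[i] - mean) / std;
        }
    }
}
```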

I've already played with the biases, the weights, the Gaussian noise (from zero to sigmoid), the learning rate, and the number of steps in the contrastive divergence algorithm (1 and 10). Nothing helps (except very high hidden biases). Maybe the MNIST dataset is not suitable for modeling with a Linear-NReLU RBM?
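
In case it helps with debugging, here is roughly how one could check the stuck-at-zero guess: count the hidden units that never activate over a batch (a sketch, illustrative names):

```csharp
using System.Linq;

// A debugging sketch (illustrative names): count hidden units that output
// zero for every image in a batch. A large count would support the
// stuck-at-zero explanation.
class DeadUnitCheck
{
    // hiddenActivations[n][j] is the activation of hidden unit j on image n.
    public static int CountDeadUnits(double[][] hiddenActivations)
    {
        int numHidden = hiddenActivations[0].Length;
        int dead = 0;
        for (int j = 0; j < numHidden; j++)
        {
            int unit = j; // local copy for the closure below
            if (!hiddenActivations.Any(row => row[unit] > 0.0)) dead++;
        }
        return dead;
    }
}
```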

Can anyone who has implemented this type of RBM reproduce these problems? That would let me make sure there is no implementation error.
Otherwise, I can share my code if anyone is interested.

(Jul 24 '13 at 06:55) VAvd