I am implementing a Bernoulli-Bernoulli RBM and trying to debug it. I want the RBM to learn the uniform distribution over two binary variables, so the RBM has 2 visible and 4 hidden units. It seems to me that the RBM should end up with a zero weight matrix and zero bias vectors for both the visible and the hidden units, but that is not what I observe.

I use learning rate = 0.1, contrastive divergence with k = 3000, and full-batch updates. Since this is a toy task, I can compute the exact marginal distribution over the visible units (a sketch of the update rule and of this computation is at the end of the question). After random initialization from a normal distribution with mean = 0 and standard deviation = 0.01 I have (b is the visible bias, c the hidden bias):

W:

-0.008  0.014 -0.009  0.003
-0.001  0.002 -0.001  0.019

b: 0.006 -0.017
c: -0.003 0.001 0.003 0.004

p(0,0) = 0.250  p(0,1) = 0.248  p(1,0) = 0.252  p(1,1) = 0.250

After 1000 iterations

W:

-0.063 -0.060 -0.089 -0.046
-0.012 -0.031 -0.045 -0.037

b: -0.009  0.081
c: -0.083 -0.093 -0.086 -0.079

p(0,0) = 0.263  p(0,1) = 0.269  p(1,0) = 0.231  p(1,1) = 0.237

After 10000 iterations

W:

-0.210 -0.205 -0.208 -0.197
-0.219 -0.222 -0.225 -0.231

b:  0.279  0.303
c: -0.576 -0.577 -0.574 -0.586

p(0,0) = 0.247  p(0,1) = 0.248  p(1,0) = 0.247  p(1,1) = 0.258

I did not use weight decay or momentum. So my question is: why does this happen? Is it a bug in my implementation, or can this behavior be explained somehow?
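
For reference, here is a minimal sketch of the kind of update and exact-marginal computation I mean (illustrative NumPy, not my actual code; names like exact_visible_marginal are made up for this sketch):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def exact_visible_marginal(W, b, c):
        # For a Bernoulli-Bernoulli RBM the hidden units can be summed out:
        #   p(v) proportional to exp(b.v) * prod_j (1 + exp(c_j + (v W)_j))
        n_vis = W.shape[0]
        vis = np.array([[(s >> i) & 1 for i in range(n_vis)]
                        for s in range(2 ** n_vis)], dtype=float)
        unnorm = np.exp(vis @ b) * np.prod(1.0 + np.exp(vis @ W + c), axis=1)
        return vis, unnorm / unnorm.sum()

    n_vis, n_hid = 2, 4
    W = rng.normal(0.0, 0.01, size=(n_vis, n_hid))
    b = rng.normal(0.0, 0.01, size=n_vis)   # visible bias
    c = rng.normal(0.0, 0.01, size=n_hid)   # hidden bias
    lr, k = 0.1, 3000

    # Full batch: the uniform distribution over two binary variables.
    data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

    for step in range(1000):                # slow with k = 3000; a sketch only
        # Positive phase: hidden probabilities with visibles clamped to data.
        ph_pos = sigmoid(data @ W + c)
        # Negative phase: k steps of alternating Gibbs sampling from the data.
        v = data.copy()
        h = (rng.random(ph_pos.shape) < ph_pos).astype(float)
        for _ in range(k):
            v = (rng.random(v.shape) < sigmoid(h @ W.T + b)).astype(float)
            ph = sigmoid(v @ W + c)
            h = (rng.random(ph.shape) < ph).astype(float)
        ph_neg = sigmoid(v @ W + c)
        n = data.shape[0]
        W += lr * (data.T @ ph_pos - v.T @ ph_neg) / n
        b += lr * (data - v).mean(axis=0)
        c += lr * (ph_pos - ph_neg).mean(axis=0)

    vis, p = exact_visible_marginal(W, b, c)
    for v_cfg, pv in zip(vis, p):
        print([int(x) for x in v_cfg], f"{pv:.3f}")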

asked May 21 '13 at 15:55 by Midas, edited Aug 31 '13 at 11:26

One Answer:

Why must the weights and biases be zero if you aren't using weight decay? There are many settings of the RBM parameters that implement a uniform distribution.

Also, are you using CD-1? Remember that you aren't doing maximum likelihood training, and CD doesn't really give you the true gradient of the log-likelihood.
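
To make the first point concrete, here is one family of non-zero settings that is exactly uniform (the numbers below are arbitrary, just for illustration): attach each hidden unit to a single visible unit with weight a_j, give it bias c_j = -a_j/2, and give each visible unit a bias equal to minus half the sum of its incoming weights. Summing out the hidden units then gives the same unnormalized probability for every visible configuration. A quick check:

    import numpy as np

    a = np.array([0.7, -1.3, 2.0, 0.4])    # arbitrary non-zero weights
    W = np.array([[a[0], a[1], 0.0, 0.0],  # hidden units 0,1 attach to visible 0
                  [0.0, 0.0, a[2], a[3]]]) # hidden units 2,3 attach to visible 1
    c = -a / 2.0                           # hidden biases
    b = -W.sum(axis=1) / 2.0               # visible biases

    vis = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    unnorm = np.exp(vis @ b) * np.prod(1.0 + np.exp(vis @ W + c), axis=1)
    print(unnorm / unnorm.sum())           # -> [0.25 0.25 0.25 0.25]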

answered May 29 '13 at 14:01 by gdahl ♦
