I am trying to train an RBM with 8 hidden binary units and 40 visible ReLUs.

At first, I had issues with the binary units becoming stuck due to the weights saturating, but I got rid of that problem by initializing the ReLU biases to the sample averages, and initializing the weights to values that take into account the standard deviation of each visible unit across the sample.

That did get rid of the saturation.
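
For reference, my initialization looks roughly like the following minimal numpy sketch (simplified; the inverse-standard-deviation scaling of the weights shown here is one plausible version of the idea, and the function and variable names are illustrative):

```python
import numpy as np

def init_rbm(data, n_hidden=8, seed=0):
    """Initialize a ReLU-visible / binary-hidden RBM from the data.

    data: (n_samples, n_visible) array of training samples.
    """
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    # Visible biases: the per-unit sample mean, so reconstructions
    # start centered on the data.
    b_vis = data.mean(axis=0)
    # Weights: small Gaussian values scaled inversely by each visible
    # unit's sample standard deviation, so high-variance units don't
    # drive the hidden units into saturation.
    sigma = np.maximum(data.std(axis=0), 1e-8)  # guard against zero variance
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden)) / sigma[:, None]
    b_hid = np.zeros(n_hidden)
    return W, b_vis, b_hid
```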

However, I am now facing another problem: as training progresses, my hidden units become perfectly correlated or anti-correlated. The RBM converges to a solution in which there are effectively only two possible hidden states: either one subset of the hidden units is activated (with p ~ 2/3), or the complementary subset is (with p ~ 1/3).

Needless to say, this isn't representative of the data, which doesn't come close to being described by only two clusters.
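
To make the symptom concrete, this is the kind of check that shows the collapse: the correlation matrix of the hidden activation probabilities over the dataset ends up with every off-diagonal entry near +1 or -1 (a sketch, using the standard p(h_j = 1 | v) = sigmoid((vW + b_hid)_j) for binary hidden units):

```python
import numpy as np

def hidden_correlations(data, W, b_hid):
    """Correlation matrix of hidden activation probabilities.

    In the failure mode described above, every off-diagonal entry
    comes out close to +1 or -1: the hiddens jointly carry a single
    bit of information.
    """
    p_h = 1.0 / (1.0 + np.exp(-(data @ W + b_hid)))  # p(h_j = 1 | v)
    return np.corrcoef(p_h, rowvar=False)
```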

What gives? I understand what would cause hidden units to become stuck, but I do not understand why they would become perfectly correlated or anti-correlated.

This happens with both a CD-15 and a PCD-15 training policy. Decreasing the learning rate hasn't seemed to help.
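
For completeness, the CD-k step I am running is essentially the textbook one, sketched below (simplified: it uses the mean-field value max(0, x) for the ReLU visible reconstruction instead of sampling noisy rectified units, and PCD-15 differs only in seeding the Gibbs chain from a persistent fantasy particle instead of the data batch):

```python
import numpy as np

def cd_k_update(v0, W, b_vis, b_hid, k=15, lr=1e-3, rng=np.random.default_rng(0)):
    """One CD-k gradient step for a binary-hidden / ReLU-visible RBM.

    v0: (batch, n_visible) minibatch of data.
    Updates W, b_vis, b_hid in place and returns them.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    # Positive phase: hidden statistics driven by the data.
    ph0 = sigmoid(v0 @ W + b_hid)
    h = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: k steps of alternating Gibbs sampling.
    for _ in range(k):
        v = np.maximum(0.0, h @ W.T + b_vis)  # mean-field ReLU reconstruction
        ph = sigmoid(v @ W + b_hid)
        h = (rng.random(ph.shape) < ph).astype(float)

    # Gradient step: data statistics minus model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v.T @ ph) / n
    b_vis += lr * (v0 - v).mean(axis=0)
    b_hid += lr * (ph0 - ph).mean(axis=0)
    return W, b_vis, b_hid
```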

asked Apr 20 at 15:01 by Arthur B
