I am trying to train an RBM with 8 binary hidden units and 40 visible ReLU units. At first I had trouble with the binary units getting stuck because the weights saturated, but I got rid of that by initializing each visible bias to the sample mean of its unit and scaling the initial weights according to each visible unit's standard deviation across the sample (see the sketch at the end of this post). That did eliminate the saturation.

However, I am now facing another problem: as training proceeds, my hidden units become perfectly correlated or anti-correlated. The RBM converges to a solution with effectively only two possible hidden states: either one subset of the hidden units is activated (with p ≈ 2/3) or the complementary subset is (with p ≈ 1/3). Needless to say, this isn't representative of the data, which doesn't come close to being described by only two clusters.

What gives? I understand the mechanisms that can leave hidden units stuck, but I do not understand why they would become perfectly correlated or anti-correlated. This happens with both a CD-15 and a PCD-15 training policy, and decreasing the learning rate hasn't seemed to help.
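For concreteness, my initialization looks roughly like this (a minimal NumPy sketch; the inverse-std weight scaling and all the names here are simplifications of what I actually run):

```python
import numpy as np

def init_rbm(data, n_hidden=8, seed=0):
    """Initialize RBM parameters from the training sample.

    data: (n_samples, n_visible) array of visible-unit values.
    Returns weights W (n_visible, n_hidden) and both bias vectors.
    """
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]

    # Visible biases start at the per-unit sample mean, so each
    # ReLU unit is centered on the data from the first epoch.
    b_vis = data.mean(axis=0)

    # Per-unit sample std, floored to avoid dividing by ~0 for
    # nearly constant units.
    std = np.maximum(data.std(axis=0), 1e-6)

    # Small Gaussian weights scaled inversely by each visible
    # unit's std, so high-variance units cannot push the binary
    # hidden units into saturation.
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden)) / std[:, None]

    # Hidden biases start at zero.
    b_hid = np.zeros(n_hidden)
    return W, b_vis, b_hid
```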
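And this is roughly how I observe the collapse, reusing `data`, `W`, and `b_hid` from the sketch above: the off-diagonal entries of the hidden correlation matrix drift toward ±1 as training proceeds.

```python
def hidden_probs(v, W, b_hid):
    """P(h_j = 1 | v) for logistic binary hidden units."""
    return 1.0 / (1.0 + np.exp(-(v @ W + b_hid)))

# Correlation of hidden activations across the training sample.
# Early in training the off-diagonal entries are small; by the
# time the RBM has collapsed to two effective hidden states,
# every entry is close to +1 or -1.
H = hidden_probs(data, W, b_hid)        # (n_samples, 8)
corr = np.corrcoef(H, rowvar=False)     # (8, 8)
print(np.round(corr, 2))
```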