I'm using an RBM with binary stochastic hidden units and real-valued visible units to model real-valued data in some positive range, e.g. [1, 5]. I'm reconstructing the visible units by simply taking the top-down input from the hidden units plus the bias. I add no Gaussian noise, so my visible layer is deterministic given the hidden units. I have actually tried adding Gaussian noise, but it only made the model quality worse.
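For concreteness, here is a minimal sketch of that reconstruction step in NumPy; the names `W`, `b_vis`, and `h` are illustrative, not from any particular codebase:

```python
import numpy as np

def reconstruct_visible(h, W, b_vis):
    """Noise-free reconstruction of real-valued visible units.

    h     : binary (or mean-field) hidden states, shape (n_hidden,)
    W     : weight matrix, shape (n_visible, n_hidden)
    b_vis : visible biases, shape (n_visible,)
    """
    # Top-down input plus bias: this is the mean of the Gaussian
    # p(v|h), with no Gaussian noise added on top.
    return W @ h + b_vis
```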

My question is whether this approach contradicts the principles of RBMs, given that they are probabilistic models. One objection I've heard is that I'm not using the conditional probability distribution over the visible units given the hidden ones, and that I'm not sampling the visible units. If we assume a Gaussian distribution over the visible units, I'm always taking its mean when reconstructing. Is this wrong, and why?

In this paper (http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf), Hinton says: "In many applications, it is much easier to first normalise each component of the data to have zero mean and unit variance and then to use noise free reconstructions, with the variance in equation 17 set to 1. The reconstructed value of a Gaussian visible unit is then equal to its top-down input from the binary hidden units plus its bias."
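For reference, equation 17 in that guide is the Gaussian-Bernoulli energy function; here is my transcription of it and of the conditional it implies (the exact notation is mine, so treat it as an assumption):

```latex
% Gaussian-Bernoulli RBM energy (equation 17 in the guide):
E(\mathbf{v},\mathbf{h}) =
    \sum_{i \in \mathrm{vis}} \frac{(v_i - a_i)^2}{2\sigma_i^2}
  - \sum_{j \in \mathrm{hid}} b_j h_j
  - \sum_{i,j} \frac{v_i}{\sigma_i}\, h_j w_{ij}

% The conditional over each visible unit is then Gaussian:
p(v_i \mid \mathbf{h}) =
    \mathcal{N}\!\left(a_i + \sigma_i \sum_j h_j w_{ij},\ \sigma_i^2\right)

% With \sigma_i = 1, the mean reduces to the top-down input plus
% the bias, which is exactly the noise-free reconstruction quoted above.
```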

Why does having zero mean and unit variance lead to noise-free reconstructions of the visible units? My reconstruction rule is exactly the same even though I'm not normalizing the data, and it works really well.

asked Apr 27 '13 at 05:21 by Lobachevsky

One Answer:

I'm doing what you're doing as well. I got the idea from the paper "Using Very Deep Autoencoders for Content-Based Image Retrieval" (http://www.cs.toronto.edu/~hinton/absps/esann-deep-final.pdf), which performs only a single sampling step, from hidden to visible. Here is an interesting excerpt regarding their CD1:

"To reduce noise, we actually use the probabilties rather than the stochastic binary states in steps 2,3, and 4, but it is important to use stochastic binary hidden units in step 1 to avoid serious overfitting."

My guess is that if you did sample the visible units you'd produce a worse model (as you saw), but probably reduce the chance of overfitting.
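For comparison, sampling the visible units would just add unit-variance Gaussian noise on top of the mean; a one-line variant of step 2 in the sketch above, reusing its names:

```python
# Variant of step 2 in cd1_step: draw from p(v|h) = N(mean, 1)
# instead of taking its mean.
v1 = h0_sample @ W.T + b_vis + rng.standard_normal(v0.shape)
```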

It seems theory and practice don't always match up.

answered Apr 28 '13 at 08:00 by Nghia

