I'm using an RBM with binary stochastic hidden units and real-valued visible units to model real-valued data in some positive range, e.g. [1, 5]. I reconstruct the visible units by simply summing the top-down input from the hidden units plus the bias, and I add no Gaussian noise, so the visible layer is deterministic given the hidden units. I have actually tried adding Gaussian noise, but it only made the model worse.

My question is whether this approach contradicts the principles of RBMs, given that they are probabilistic models. One objection I have heard is that I'm not using the conditional probability distribution over the visible units given the hidden ones, and that I'm not sampling the visible units: if we assume a Gaussian distribution over the visible units, I'm always taking its mean when reconstructing. Is this wrong, and why?

In this paper: http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf, Hinton says: "In many applications, it is much easier to first normalise each component of the data to have zero mean and unit variance and then to use noise free reconstructions, with the variance in equation 17 set to 1. The reconstructed value of a Gaussian visible unit is then equal to its top-down input from the binary hidden units plus its bias." Why does having zero mean and unit variance lead to noise-free reconstructions of the visible units? My reconstruction is exactly the same even though I'm not normalizing the data, and it works really well.
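To make this concrete, here is a minimal NumPy sketch of the two reconstruction options I'm comparing (the names W, b_v, b_h and the helper functions are hypothetical, just to illustrate the idea, not my actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hidden(v, W, b_h):
    # Binary stochastic hidden units: p(h_j = 1 | v) = sigmoid(v W + b_h),
    # assuming unit-variance Gaussian visible units.
    p_h = 1.0 / (1.0 + np.exp(-(v @ W + b_h)))
    return (rng.random(p_h.shape) < p_h).astype(float), p_h

def reconstruct_visible(h, W, b_v, sample=False, sigma=1.0):
    # The conditional mean of a Gaussian visible unit is its top-down
    # input from the hidden units plus its bias.
    mean = h @ W.T + b_v
    if sample:
        # "Proper" sampling: draw from N(mean, sigma^2).
        return mean + sigma * rng.normal(size=mean.shape)
    # Noise-free reconstruction: just take the mean (what I'm doing now).
    return mean
```

So the only difference between my reconstruction and sampling the visibles "properly" is the added noise term.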
I'm doing the same thing you are. I got the idea from the paper "Using very deep autoencoders for content-based image retrieval" (http://www.cs.toronto.edu/~hinton/absps/esann-deep-final.pdf), which performs only a single sampling step, from hidden to visible. Here is an interesting excerpt regarding their CD1:

My guess is that if you did sample the visible units you would produce a worse model (as you saw), but probably reduce the chance of overfitting. It seems theory and practice don't always match up.
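For what it's worth, a rough sketch of a CD-1 update in that style, reusing the hypothetical sample_hidden / reconstruct_visible helpers from the sketch above (the names and learning rate are made up):

```python
def cd1_step(v_data, W, b_v, b_h, lr=0.01):
    # Positive phase: sample binary hidden states once (the only sampling step).
    h_sample, p_h_data = sample_hidden(v_data, W, b_h)
    # Negative phase: noise-free (mean) reconstruction of the Gaussian visibles,
    # then hidden probabilities driven by that reconstruction.
    v_recon = reconstruct_visible(h_sample, W, b_v, sample=False)
    _, p_h_recon = sample_hidden(v_recon, W, b_h)

    # Contrastive divergence update: positive minus negative phase statistics.
    n = v_data.shape[0]
    W += lr * (v_data.T @ p_h_data - v_recon.T @ p_h_recon) / n
    b_v += lr * (v_data - v_recon).mean(axis=0)
    b_h += lr * (p_h_data - p_h_recon).mean(axis=0)
    return W, b_v, b_h
```

Sampling the visibles instead (sample=True) would just add unit-variance noise to v_recon; that extra stochasticity is presumably what hurts the model while possibly acting as a regularizer.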