RBMs are often trained using a 'mean field' approximation: rather than Gibbs sampling the visible units given the hiddens and the hidden units given the visibles to obtain the statistics needed for the weight updates, the means of these conditional distributions are used instead.
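As an illustration, here is a minimal sketch of the two variants for a binary-binary RBM (my own code and naming, not taken from any particular paper):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0, W, bv, bh, mean_field=True):
        """One CD-1 step for a binary-binary RBM; returns the weight gradient.

        With mean_field=True, the conditional means (probabilities) are
        propagated through the chain; otherwise binary samples are drawn.
        """
        h0_mean = sigmoid(v0 @ W + bh)
        h0 = h0_mean if mean_field else (rng.random(h0_mean.shape) < h0_mean).astype(float)
        v1_mean = sigmoid(h0 @ W.T + bv)
        v1 = v1_mean if mean_field else (rng.random(v1_mean.shape) < v1_mean).astype(float)
        h1_mean = sigmoid(v1 @ W + bh)
        # positive statistics minus negative statistics
        return v0.T @ h0_mean - v1.T @ h1_mean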

I'm currently trying to reproduce the Gaussian RBM with learnt variance from 'Learning a generative model of images by factoring appearance and shape' by Le Roux et al. (2010). It has the following energy function:
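(Reconstructed from the constraints described below; the exact notation may differ from the paper:)

    E(v, h) = -\sum_i (b_m)_i v_i - \sum_{i,j} v_i (W_m)_{ij} h_j
              - \sum_i (b_p)_i v_i^2 - \sum_{i,j} v_i^2 (W_p)_{ij} h_j - \sum_j c_j h_j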

Wp and bp are constrained to be negative. The subscripts m and p stand for 'mean' and 'precision' respectively.

This means that the conditional probability distribution of the hiddens given the visibles looks like this:
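(Again reconstructing from the energy above, with \sigma the logistic function, this should be:)

    P(h_j = 1 | v) = \sigma\left( c_j + \sum_i \left[ (W_m)_{ij} v_i + (W_p)_{ij} v_i^2 \right] \right)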

If I understood correctly, in a mean field setting, we compute these probabilities during training using the expected values of the visibles E[v], rather than a sample (i.e. replacing v by E[v] in the formula). However, in the case of a Gaussian RBM with learnt variance, v² also occurs in the formula. I'm not sure how to deal with that.

What my question essentially boils down to is: should E[v²] or E[v]² be used here, and why? And actually, is it even okay to use mean field at all for this model? Or does the 'simple' Gaussian-Bernoulli model with fixed variance have some other property making mean field a good approximation, that doesn't hold here?
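(For reference, the two candidates differ by exactly the conditional variance:

    E[v^2] = E[v]^2 + Var(v)

so for a Gaussian visible unit with mean \mu_i and precision \lambda_i, E[v_i^2 | h] = \mu_i^2 + 1/\lambda_i, and using E[v]² silently drops the 1/\lambda_i term.)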

I suppose I would run into the same issue with their Beta RBM model, for which log(v) occurs in the energy function, so then I would have to use either E[log(v)] or log(E[v]). I haven't attempted to implement this yet, though.
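(Here the analogous gap comes from Jensen's inequality: log is concave, so E[log(v)] \le log(E[v]), again with the difference governed by the spread of v.)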

I guess I'm just trying to understand better what the mean field approximation is, what exactly is being approximated, and how/why it works. Most deep learning-related literature seems to go over this rather quickly, and if I look for stuff specifically about 'mean field', I mostly get physics papers that I don't understand at all. I was hoping someone here could shed some light on this, or point me to the right paper(s).

asked Dec 26 '11 at 08:49


Sander Dieleman

edited Dec 26 '11 at 08:52


One Answer:

My first answer wasn't fully correct, so I'm starting a new one from scratch. The reason the standard Gaussian RBM uses E[v|h] rather than a sample is to avoid the extreme values of v that could potentially be sampled, which would lead to numerical inaccuracies.

In the case of the Gaussian RBM with learnt precision or the Beta RBM, the fact that we learn the precision helps overcome precisely this problem. It is thus highly recommended to use a sample rather than the mean.

Now (and this is the reason for the deletion of my first post), I am not sure what the proper thing to do would be, should you still want to use E[v|h] instead of a sample. My second guess (the first one being the deleted one) would be to exactly replace v by E[v|h]. In that case, that would mean using E[v]² (for the Gaussian RBM) and log(E[v]) (for the Beta RBM). However, it feels horribly wrong to do that and I fear that it would lead to terrible results.

Since these two RBMs allow you to do the proper thing (or at least a more valid thing than the mean-field approximation you are talking about), I would definitely stick with sampling v given h (which is what we have done in the paper, as far as I remember).
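For concreteness, a minimal sketch of what sampling v given h could look like under a learned-precision energy of the form reconstructed above (my own code; Wm, Wp, bm, bp are assumed parameter names):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_v_given_h(h, Wm, Wp, bm, bp):
        """v_i | h is Gaussian with precision lambda_i = -2 (bp_i + sum_j Wp_ij h_j),
        which is positive because Wp and bp are constrained to be negative."""
        precision = -2.0 * (bp + h @ Wp.T)
        mean = (bm + h @ Wm.T) / precision
        return mean + rng.standard_normal(mean.shape) / np.sqrt(precision)

The mean-field variant would simply return `mean` instead, which is exactly where the ambiguity above appears: the v² statistics would then silently lose the 1/lambda variance term.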

answered Dec 26 '11 at 12:47


Nicolas Le Roux

Alright, thanks again. Out of curiosity, how did you arrive at the conclusion that it should be E[v]² and not E[v²]? Until now (even before your first answer) I was tending towards the latter... intuitively it seemed more correct than E[v]², even though I can't really explain it rigorously.

Since you said it feels horribly wrong nevertheless, I guess it wouldn't be a stretch to say that using the mean only makes sense when the energy function is linear in those particular units?

(Dec 26 '11 at 12:58) Sander Dieleman

My reasoning was that you would replace the full distribution of v given h by a meaningful sample, which in that case would be E[v|h]. As you can see, this is not very elaborate, as I never thought of doing this anyway. I might think about it more tomorrow and give a better answer as to what it should be, should you really want to do the wrong thing ;)

(Dec 26 '11 at 13:02) Nicolas Le Roux

Of course I want to do the right thing, but I also want to know why the wrong thing is wrong :)

I'm working on a modular RBM implementation, and I'm implementing several different types of units. Until now I'd assumed that every unit type would have to implement a 'sample' and a 'mean_field' method, but I guess the latter is meaningless for Gaussian units with learnt precision, and by extension, any other type of units for which the energy function isn't linear. So that means I need to rethink the interface a little, hence my curiosity.
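If it helps, one way around this (a hypothetical interface sketch, not something proposed above) is to have each unit type expose its expected sufficient statistics rather than a single mean:

    from abc import ABC, abstractmethod
    import numpy as np

    rng = np.random.default_rng(0)

    class Units(ABC):
        @abstractmethod
        def sample(self, params): ...

        @abstractmethod
        def expected_statistics(self, params): ...

    class GaussianPrecisionUnits(Units):
        """Gaussian units with learnt precision: the energy involves both
        v and v², so both statistics must be reported."""

        def sample(self, params):
            mean, precision = params
            return mean + rng.standard_normal(mean.shape) / np.sqrt(precision)

        def expected_statistics(self, params):
            mean, precision = params
            # E[v] and E[v²] = E[v]² + 1/precision; a plain 'mean_field'
            # method could only ever return the first of these.
            return {"v": mean, "v2": mean**2 + 1.0 / precision}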

(Dec 26 '11 at 13:07) Sander Dieleman

