Hello everyone, I want my Restricted Boltzmann Machine to learn a new representation of real-valued data (Hinton - 2010 - A Practical Guide to Training RBMs). I'm struggling with an implementation of Gaussian linear units. With Gaussian linear units in the visible layer the energy changes to

E(v,h) = ∑_i (v_i − a_i)²/(2σ_i²) − ∑_j b_j h_j − ∑_{i,j} (v_i/σ_i) h_j w_ij

Now I don't know how to change the contrastive divergence learning algorithm. The visible units won't be sampled as binary states any more, as they are linear. I use the mean-field activation plus unit Gaussian noise, v_i = a_i + ∑_j h_j w_ij + N(0,1), as their state (with σ fixed to 1). The associations are left unchanged (pos: v · p(h=1|v) on the data; neg: p(v|h) · p(h=1|v) on the reconstruction). But this only leads to random noise when I want to reconstruct the data, and the error rate stops improving around 50%. Finally I want to use Gaussian linear units in both layers. How will I get the states of the hidden units then? I suggest using the mean-field activation h_j = b_j + ∑_i v_i w_ij + N(0,1), but I'm not sure.
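For reference, here is the energy function I mean, written out with explicit σ as a small NumPy sketch (all variable names are just my own, not from any library):

```python
import numpy as np

def energy(v, h, W, a, b, sigma):
    """Gaussian-Bernoulli RBM energy:
    E(v,h) = sum_i (v_i - a_i)^2 / (2 sigma_i^2)
           - sum_j b_j h_j
           - sum_ij (v_i / sigma_i) h_j w_ij
    """
    quad = np.sum((v - a) ** 2 / (2.0 * sigma ** 2))  # quadratic visible term
    hid = np.dot(b, h)                                # hidden bias term
    inter = np.dot(v / sigma, W @ h)                  # visible-hidden interaction
    return quad - hid - inter

rng = np.random.default_rng(0)
n_v, n_h = 6, 4
W = rng.normal(scale=0.01, size=(n_v, n_h))  # weights w_ij
a, b = np.zeros(n_v), np.zeros(n_h)          # visible / hidden biases
sigma = np.ones(n_v)                         # per-visible-unit std, fixed to 1
v = rng.normal(size=n_v)                     # real-valued visible state
h = rng.integers(0, 2, size=n_h).astype(float)  # binary hidden state
E = energy(v, h, W, a, b, sigma)
```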
Hi, it seems you definitely have something wrong with your math. To begin with, the energy function mentioned in Hinton's practical guide is

E(v,h) = ∑_i (v_i − a_i)²/(2σ_i²) − ∑_j b_j h_j − ∑_{i,j} (v_i/σ_i) h_j w_ij

If you do some math (for details look at this), the visible units become continuous random variables with the conditional distribution

p(v_i | h) = N(a_i + σ_i ∑_j w_ij h_j, σ_i²)

The contrastive divergence learning algorithm stays the same: you just sample the visible units from this normal distribution instead of from a Bernoulli. Another parameterization of the energy function is also possible (look at this for more details):

E(v,h) = ∑_i (v_i − a_i)²/(2σ_i²) − ∑_j b_j h_j − ∑_{i,j} (v_i/σ_i²) h_j w_ij

Here the mean of a visible unit, p(v_i | h) = N(a_i + ∑_j w_ij h_j, σ_i²), is no longer scaled by the standard deviation. Since learning the standard deviation is not very stable, common practice is to fix σ to 1 and normalize your input to zero mean and standard deviation 1. Sometimes, though, it is a good idea to learn the standard deviation using a different parameterization of the variance parameters. Note also that while a Bernoulli-Bernoulli RBM can model an arbitrary binary distribution, a Gaussian-Bernoulli RBM is extremely limited in the class of distributions it can represent. I strongly recommend reading section 4.1, "Conceptual Understanding of Gaussian-Binary RBMs", from this thesis. So I actually do not understand why you need both the hidden and the visible layer to be continuous. Articles that may be useful:

Hello Midas, your answer is excellent but I still have a question. You mentioned that the mean can be fixed to zero and the standard deviation to one; in that case, does the way the visible units are sampled from the hidden units need to be changed?
(Jul 10 '13 at 10:27)
Chen You
Depends on what you mean by 'need to be changed'. If you fix the mean to zero and the deviation to one, you just sample the visible units from a normal distribution with the appropriate mean and sigma equal to one.
(Jul 14 '13 at 07:28)
Midas
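As a concrete illustration of Midas's point, one Gibbs step for the visible layer under the first parameterization could look like this (a minimal NumPy sketch; the names `W`, `a`, `sigma`, and `h` are my own assumptions, not from the thread):

```python
import numpy as np

def sample_visible(h, W, a, sigma, rng):
    """Sample v_i ~ N(a_i + sigma_i * sum_j w_ij h_j, sigma_i^2) elementwise.
    With sigma fixed to 1 this reduces to v ~ N(a + W h, 1)."""
    mean = a + sigma * (W @ h)
    return mean + sigma * rng.normal(size=a.shape)

rng = np.random.default_rng(0)
n_v, n_h = 6, 4
W = rng.normal(scale=0.01, size=(n_v, n_h))
a = np.zeros(n_v)
sigma = np.ones(n_v)
h = rng.integers(0, 2, size=n_h).astype(float)  # binary hidden states
v = sample_visible(h, W, a, sigma, rng)
```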
Thanks for your excellent explanation. Like Chen You's question above: when I update the weight parameters using the CD or PCD algorithm, the gradient is ΔW = ⟨(1/σ²) v h⟩_data − ⟨(1/σ²) v h⟩_model. If I fix the mean to zero and the deviation to one, the gradient becomes ΔW = ⟨v h⟩_data − ⟨v h⟩_model. At the same time, I don't need to update the parameter σ, because it is fixed at 1. Is that right?
(Dec 11 '14 at 22:06)
Xupeng Wu
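To make the question concrete: with σ = 1 one CD-1 weight update does reduce to ΔW = ⟨vh⟩_data − ⟨vh⟩_recon, and σ is never updated. A minimal NumPy sketch of a single CD-1 step under that assumption (all names are mine, not from any particular library):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update for a Gaussian-Bernoulli RBM with sigma fixed to 1."""
    # Positive phase: hidden probabilities and sampled binary states from data.
    ph0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Reconstruction: linear visible units, sampled as v ~ N(a + W h, 1).
    v1 = a + h0 @ W.T + rng.normal(size=v0.shape)
    # Negative phase hidden probabilities.
    ph1 = sigmoid(b + v1 @ W)
    # Gradients: <v h>_data - <v h>_recon (sigma = 1, so no 1/sigma^2 factors).
    dW = np.outer(v0, ph0) - np.outer(v1, ph1)
    da = v0 - v1
    db = ph0 - ph1
    return W + lr * dW, a + lr * da, b + lr * db

rng = np.random.default_rng(0)
n_v, n_h = 6, 4
W = rng.normal(scale=0.01, size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)
v0 = rng.normal(size=n_v)  # one (normalized) training example
W, a, b = cd1_step(v0, W, a, b, lr=0.01, rng=rng)
```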
Hello all, I want to implement a Restricted Boltzmann Machine in C# with this example dataset: http://www.ieor.berkeley.edu/~goldberg/jester-data/ . Please help me! My email: [email protected]
Thank you a lot Midas, your answer is exactly what I asked for. My goal is to develop a deep belief net that learns meaningful representations/features from real-valued images. @Gaussian-Gaussian RBM: You are right, Midas; Hinton's guide (chapter 13.3) also says there is no need for that (instability problems). Even though Gaussian-Bernoulli RBMs are extremely limited, I want to start with them. Chapter 13.5 suggests that Gaussian-Rectified or Rectified-Rectified RBMs are quite a bit more promising.