Hi, it seems something is definitely wrong with your math. Visible units are continuous and hidden units are binary.
To begin with, the energy function mentioned in Hinton's practical guide is

$$E(v,h) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_j c_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij}$$
So if you do some math (for details look at this), you will have:

$$p(h_j = 1 \mid v) = \operatorname{sigmoid}\left(c_j + \sum_i \frac{v_i}{\sigma_i} w_{ij}\right)$$
Now, the visible units are continuous random variables:

$$p(v_i \mid h) = \mathcal{N}\left(b_i + \sigma_i \sum_j h_j w_{ij},\; \sigma_i^2\right)$$
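As a concrete sketch of the two conditionals above, here is a minimal NumPy illustration. All names and values (`W`, `b`, `c`, `sigma`, the 3-visible/2-hidden shapes) are made-up toy choices, not anything from your setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Gaussian-Bernoulli RBM parameters (assumed: 3 visible units, 2 hidden units).
W = rng.normal(scale=0.1, size=(3, 2))  # weights w_ij
b = np.zeros(3)                         # visible biases b_i
c = np.zeros(2)                         # hidden biases c_j
sigma = np.ones(3)                      # per-visible-unit standard deviations

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i (v_i / sigma_i) w_ij)
    return sigmoid(c + (v / sigma) @ W)

def sample_v_given_h(h):
    # v_i | h ~ N(b_i + sigma_i * sum_j h_j w_ij, sigma_i^2)
    mean = b + sigma * (W @ h)
    return rng.normal(mean, sigma)

v = rng.normal(size=3)
h = (rng.random(2) < p_h_given_v(v)).astype(float)  # binary hidden sample
v_new = sample_v_given_h(h)                          # continuous visible sample
```

Note that the hidden units stay binary: only the visible reconstruction is drawn from a Gaussian.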
The contrastive divergence learning algorithm stays the same; you just sample the visible units from a normal distribution.
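A single CD-1 update for such a Gaussian-Bernoulli RBM could then look like the following sketch. This is a hedged toy illustration with made-up shapes, a made-up learning rate, and sigma fixed to 1 (assuming standardized inputs), not a tuned implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

n_vis, n_hid, lr = 4, 3, 0.01
W = rng.normal(scale=0.1, size=(n_vis, n_hid))  # weights
b = np.zeros(n_vis)   # visible biases
c = np.zeros(n_hid)   # hidden biases
sigma = 1.0           # fixed, assuming zero-mean, unit-variance inputs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    global W, b, c
    # Positive phase: hidden probabilities and a binary hidden sample from the data.
    ph0 = sigmoid(c + (v0 / sigma) @ W)
    h0 = (rng.random(n_hid) < ph0).astype(float)
    # Negative phase: reconstruct visibles from a NORMAL distribution,
    # then recompute hidden probabilities.
    v1 = rng.normal(b + sigma * (W @ h0), sigma)
    ph1 = sigmoid(c + (v1 / sigma) @ W)
    # Approximate gradients: <data statistics> - <reconstruction statistics>.
    W += lr * (np.outer(v0 / sigma, ph0) - np.outer(v1 / sigma, ph1))
    b += lr * (v0 - v1) / sigma**2
    c += lr * (ph0 - ph1)
    return v1

for _ in range(20):
    cd1_step(rng.normal(size=n_vis))  # toy data ~ N(0, 1)
```

The only change from the binary-binary case is the `rng.normal(...)` reconstruction of the visibles; the hidden updates are unchanged.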
Another parameterization of the energy function is also possible (look at this for more details):

$$E(v,h) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_j c_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i^2} h_j w_{ij}$$

Now the mean of a visible unit is not scaled by its standard deviation:

$$p(v_i \mid h) = \mathcal{N}\left(b_i + \sum_j h_j w_{ij},\; \sigma_i^2\right)$$
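The difference between the two parameterizations is easy to see numerically. A small sketch (all values here are made-up toys; `sigma` is deliberately set away from 1 so the two means differ):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(3, 2))
b = np.zeros(3)
h = np.array([1.0, 0.0])
sigma = 2.0 * np.ones(3)  # != 1 on purpose, to expose the scaling

mean_param1 = b + sigma * (W @ h)  # first parameterization: mean scaled by sigma
mean_param2 = b + W @ h            # second parameterization: mean independent of sigma
```

With sigma fixed to 1 the two conditional means coincide, which is one reason the distinction is often invisible in practice.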
Because learning the standard deviation is not very stable, common practice is to fix sigma to 1 and normalize your input to zero mean and standard deviation 1. Sometimes, though, it is a good idea to learn the standard deviation through a different parameterization of the variance parameters: since we learn log-variances $z_i = \log \sigma_i^2$, the variance $\sigma_i^2 = e^{z_i}$ is naturally constrained to stay positive.
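The log-variance trick amounts to one line of code. A minimal sketch (`z` here is a hypothetical unconstrained parameter vector, not notation from the guide):

```python
import numpy as np

z = np.array([-2.0, 0.0, 3.0])  # unconstrained parameters z_i = log(sigma_i^2)
var = np.exp(z)                 # sigma_i^2 = e^{z_i} > 0 for any real z_i
sigma = np.sqrt(var)            # standard deviations, also always positive
```

Gradient descent can then update `z` freely, with no risk of the variance going negative.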
A Bernoulli-Bernoulli RBM can model an arbitrary binary distribution, but a Gaussian-Bernoulli RBM is extremely limited in the class of distributions it can represent. I strongly recommend reading section 4.1, "Conceptual Understanding of Gaussian-Binary RBMs", of the linked thesis.
So I actually do not understand why you need both the hidden and visible layers to be continuous.
Articles that may be useful: