I have trained an RBM using the MATLAB DeepLearnToolbox by rasmusbergpalm. I have modified the rbmtrain.m code to do CD-k.

Here is the original code (CD-1):

            v1 = batch;                                                      % positive phase: the data
            h1 = sigmrnd(repmat(rbm.c', opts.batchsize, 1) + v1 * rbm.W');   % sample hidden units from the data
            v2 = sigmrnd(repmat(rbm.b', opts.batchsize, 1) + h1 * rbm.W);    % one Gibbs step: reconstruct visibles
            h2 = sigmrnd(repmat(rbm.c', opts.batchsize, 1) + v2 * rbm.W');   % resample hidden units

            c1 = h1' * v1;                                                   % positive statistics <h v>_0
            c2 = h2' * v2;                                                   % negative statistics <h v>_1

Changed code (CD-k):

            v1 = batch;
            h1 = sigmrnd(repmat(rbm.c', opts.batchsize, 1) + v1 * rbm.W');   % sample hidden units from the data
            h2 = h1;
            for j = 1:opts.cdk                                               % run k full Gibbs steps instead of one
                v2 = sigmrnd(repmat(rbm.b', opts.batchsize, 1) + h2 * rbm.W);
                h2 = sigmrnd(repmat(rbm.c', opts.batchsize, 1) + v2 * rbm.W');
            end

            c1 = h1' * v1;                                                   % positive statistics <h v>_0
            c2 = h2' * v2;                                                   % negative statistics <h v>_k
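(As an aside, a variant that Hinton's practical guide recommends is to use mean-field probabilities rather than binary samples for the final hidden update, which gives less noisy negative statistics. A sketch of the loop with that change, assuming the toolbox's sigm helper is available:)

            h2 = h1;
            for j = 1:opts.cdk
                v2 = sigmrnd(repmat(rbm.b', opts.batchsize, 1) + h2 * rbm.W);
                if j < opts.cdk
                    h2 = sigmrnd(repmat(rbm.c', opts.batchsize, 1) + v2 * rbm.W');
                else
                    % final step: use probabilities, not samples, for the statistics
                    h2 = sigm(repmat(rbm.c', opts.batchsize, 1) + v2 * rbm.W');
                end
            end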

Architecture: [784 100 100 100 10]
RBM epochs: 30
Neural network epochs: 100
I got the following errors for each CD-k (k = 1:20). Here is the graph.

[Plot: CD-k vs. error]


This shows that the error does not improve as k increases in CD-k. Has anyone else had this experience, or is my code wrong? What is your experience with increasing k?

asked Jun 25 '13 at 15:22

noname

edited Jun 25 '13 at 15:29

What error are you plotting?

(Jun 25 '13 at 16:54) alto

2 Answers:

I suppose that you have plotted the reconstruction error, because likelihood estimation is quite an expensive procedure. If this is the case, then:

One way to look at CD training of RBMs is as a process that lowers the free energy of the data v in the positive phase and raises the free energy of the kth state of a Gibbs chain started at the data v.
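For reference, the free energy of a visible vector v in a binary RBM, written with the same parameter names as the code above (visible bias b, hidden bias c, weight matrix W with rows W_j), is the standard expression

    F(v) = -b^\top v - \sum_j \log\left(1 + e^{c_j + W_j v}\right)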

If the negative samples vk have the same distribution as the data v, the loss function will be zero on average, meaning that we have successfully learned the data distribution p(v).
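Concretely, the CD-k weight update is proportional to the difference of these statistics,

    \Delta W \propto \langle h v^\top \rangle_0 - \langle h v^\top \rangle_k

which is exactly what the c1 = h1' * v1 and c2 = h2' * v2 lines in the code estimate (with the second term taken from the k-th Gibbs step in the CD-k version).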

However, since the negative-phase Gibbs chain starts at the data, this objective function will also be small if the chain is mixing slowly, because vk will then still be very close to v.

So the reconstruction error can be small while the likelihood is small too. Reconstruction error is not what you actually optimize. If you run your Markov chain long enough, the samples come from a distribution close to the model's equilibrium distribution.

So CD-k is better than CD-1 at estimating the true gradient of the likelihood with respect to the parameters of the RBM.
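If you want a training signal that tracks the likelihood more closely than reconstruction error, Hinton's practical guide suggests monitoring the average free energy of a held-out set against that of a training subset; a growing gap indicates overfitting. A rough MATLAB sketch using the formula above, where x_train_subset and x_valid_subset are hypothetical matrices with one example per row:

            % free energy of each row of v, per the formula above
            freeEnergy = @(rbm, v) -v * rbm.b ...
                - sum(log(1 + exp(repmat(rbm.c', size(v, 1), 1) + v * rbm.W')), 2);

            % check once per epoch: a steadily growing gap suggests overfitting
            gap = mean(freeEnergy(rbm, x_valid_subset)) - mean(freeEnergy(rbm, x_train_subset));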

answered Jun 26 '13 at 17:14

Midas

edited Jun 26 '13 at 17:23

Thanks for the answer. The errors plotted are the classification errors (on MNIST) after fine-tuning.

(Jun 26 '13 at 18:17) noname

I have the same problem. For all k values the error is the same. Is that expected?

answered Nov 18 '14 at 20:15

subhaMano
