Hi, I have been trying to understand Dr. Lee's original code for convolutional RBMs (see here), and what is a bit confusing is what is done during inference; see below for the code of the inference function. In the code, imdata is just a minibatch of images, W and hbias_vec are the Conv-RBM weights and hidden biases, and pars is a MATLAB structure holding the parameters of the network.

I have trouble understanding why pars.std_gaussian is used in the bolded portion below. pars.std_gaussian starts at 0.2 and drops to 0.1 over roughly 70 epochs, so the 1/std_gaussian^2 factor 'magnifies' the positive activations reaching the hidden units by up to 100 times. If we had Gaussian visible units, this would amount to a fixed, unlearned variance for the Gaussians, right?

I am working on MNIST, which has been shown to be learnable with binary units. So if I want to use this code for binary units, would it be sensible to set pars.std_gaussian to 1? However, in that case I am unable to get stroke detectors even after 1000 epochs of training, so the magnifying role of std_gaussian seems to be important. I am really confused about how to justify this theoretically. Can anyone help? Thanks
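To make the question concrete, here is roughly what I mean by that step (a minimal sketch of my own, not Dr. Lee's actual code; it assumes a single input channel and W stored as ws x ws x numbases, whereas the real code is organized differently):

    % Sketch of the hidden pre-activation in a Gaussian-visible CRBM
    % with a single input channel (not the original code).
    ws        = size(W, 1);                  % receptive-field size, assuming square filters
    numbases  = size(W, 3);
    poshidexp = zeros(size(imdata,1)-ws+1, size(imdata,2)-ws+1, numbases);
    for b = 1:numbases
        filt = rot90(W(:,:,b), 2);           % flip so conv2 acts as correlation
        act  = conv2(imdata, filt, 'valid') + hbias_vec(b);
        poshidexp(:,:,b) = (1/pars.std_gaussian^2) * act;  % 25x at sigma=0.2, 100x at sigma=0.1
    end

Setting pars.std_gaussian to 1 makes that scaling factor disappear, which is exactly the change I tried for the binary case.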
Perhaps poshidprobs2 should be used for MNIST, which is approximately binary. Also, whitening may not be needed. Can you try it and let us see the learnt weights?
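In other words, return hidden probabilities through a sigmoid rather than the scaled pre-activations (a rough sketch; poshidexp2 here stands for whatever pre-activation the inference function has already computed):

    % Binary-unit inference (sketch): with std_gaussian fixed at 1 the
    % pre-activation is just convolution + hidden bias, and the hidden
    % probabilities are its sigmoid.
    poshidprobs2 = 1 ./ (1 + exp(-poshidexp2));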
|
Hello, do you train the second-layer CRBM? For the second layer the visible units should be binary, so I modified the crbm_reconstruct function as follows:

    function negdata = crbm_reconstruct(S, W, vbias_vec, pars)
    ws = sqrt(size(W,1));
    patch_M = size(S,1);
    patch_N = size(S,2);
    numchannels = size(W,2);
    numbases = size(W,3);
    S2 = S;
    negdata2 = zeros(patch_M+ws-1, patch_N+ws-1, numchannels);
    for b = 1:numbases
        H = reshape(W(:,:,b), [ws, ws, numchannels]);
        negdata2 = negdata2 + conv2_mult(S2(:,:,b), H, 'full');
    end
    for c = 1:numchannels
        negdata2(:,:,c) = 1/(pars.std_gaussian^2) .* (negdata2(:,:,c) + vbias_vec(c));
        negdata2(:,:,c) = 1./(1 + exp(-negdata2(:,:,c)));
    end
    negdata = negdata2;
    end

but I am not sure whether it is right or not?
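For what it's worth, if the visible layer is treated as strictly binary I would expect the 1/std_gaussian^2 factor to drop out of the reconstruction as well; this is my reading, not something taken from the original code:

    % Binary-visible reconstruction (sketch): sigmoid of convolution + bias,
    % with no 1/pars.std_gaussian^2 scaling of the pre-activation.
    for c = 1:numchannels
        negdata2(:,:,c) = 1 ./ (1 + exp(-(negdata2(:,:,c) + vbias_vec(c))));
    end
    negdata = negdata2;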
|
I have tried it on MNIST with sigma_start and sigma_stop both set to 1, so as to nullify their effect in the inference function. The results, all after 1400 epochs:

- sigma_start = sigma_stop = 1, whitening on, learning rate = 0.00001: filters that probably weren't learnt very well (here).
- Same settings but learning rate = 0.0001: filters that are clones of each other (here).
- Whitening removed, learning rate = 0.0001: blobbish filters (here), which is strange.
- Whitening kept, sigma_start = 0.2, sigma_stop = 0.1, learning rate = 0.001: very good stroke detectors (here).

Apart from the parameters mentioned here, num_bases=40 and receptive_fields=10x10; the rest are as in the original code. So this gets weird. Would anyone be willing to try these params on MNIST with this code? I want to be sure I didn't do anything odd, in particular just setting sigma_start and sigma_stop to 1.

Did you try function [poshidprobs2] = .... instead of [poshidexp2]? Similarly for the reconstruction function. I think MNIST should use a binary RBM, so both inference and reconstruction should have sigmoid activations.
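For reference, the settings of the run that gave good stroke detectors, written out as a small config block (variable names are my shorthand, not necessarily the exact names used in the script):

    % Run that produced good stroke detectors (names are approximate).
    sigma_start   = 0.2;    % initial pars.std_gaussian
    sigma_stop    = 0.1;    % value it is annealed to
    epsilon       = 0.001;  % learning rate
    num_epochs    = 1400;
    num_bases     = 40;
    ws            = 10;     % 10x10 receptive fields
    use_whitening = true;   % whitening preprocessing kept on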
(Feb 28 '14 at 06:56)
Ng0323
The learning rate is low. Please try 0.05; within 500 epochs you can get some patterns. I got good features with natural images, though I didn't try MNIST.
(Feb 28 '14 at 21:09)
Sharath Chandra