I want to use restricted Boltzmann machines or deep learning networks to learn features. However, for my application, the inputs (images) may be rotated or scaled. Imagine, for instance, an image from the mnist database and rotate it. How about rescaling? I have the intuition that the features on the a single layer RBM will definitely not be rotation invariant, but for a multilayer RBM they might be at the higher levels. Could anyone provide some insight here? Is there any trick to make them work under such inputs? Thanks |
Hi Roderick. I don't think it is possible to learn rotation or scale invariance with a vanilla RBM (even if you stack them into a DBN). I would suggest looking into third-order RBMs. There has been some work on modeling (more or less) arbitrary image transformations using third-order methods. I didn't find the paper I was thinking of, but this one seems even closer to what you are looking for: Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines

Thanks Andreas! Yeah, third-order RBMs are cooler :). But I haven't found any code to do discriminative classification with them, and I'm not confident that I could implement them quickly just to try them out. Are you aware of any code on the net?
(Jun 20 '11 at 12:02)
Roderick Nijs
I don't know of any code for third-order RBMs. I know there was a paper about classification with third-order models at NIPS last year (or maybe the one before?), but I don't think that is what you want. There has also been something about occlusion modeling with third-order models which I think does finetuning, though I'm not sure. Do you know of any papers using third order "at the bottom" and doing finetuning? My lab has Python/CUDA code which aims to be very modular (and fast). It doesn't do third order, but I think it's not too hard to implement. If you want, you can have a look here: https://github.com/deeplearningais/CUV or at my blog (with a feature list): http://peekaboo-vision.blogspot.com/2010/11/restricted-boltzmann-machine-on-cuda.html If you want to do RBMs, even on MNIST, I recommend you get a GeForce ;)
(Jun 20 '11 at 12:11)
Andreas Mueller
I believe Andreas is thinking of the "Gated Softmax Classifier", which has code (and errata for the paper) at http://www.cs.toronto.edu/~rfm/gatedsoftmax/index.html Be warned: from what I recall when I re-implemented it myself, the code was a bit confusing in places. Marc'Aurelio Ranzato and his coauthors have used certain types of 3-way models as generative models of images and as first layers for all sorts of deep models, with a good deal of success. There have also been many other related models addressing some of the difficulties of the mean-covariance RBM. Roland Memisevic's work on image transformations uses conditional 3-way models, which leads to very different challenges from Ranzato et al.'s work and has different applications. V. Mnih et al. describe some of the difficulties training conditional RBMs can pose, and some possible solutions, in http://www.cs.toronto.edu/~vmnih/docs/uai_crbms.pdf That being said, I don't think it is obvious how to use 3-way RBMs to simply make a model deal better with rotated and scaled input.
(Jun 22 '11 at 02:09)
gdahl ♦
Do you know of any conditional 3-way models - like the one I mentioned in my answer - that are used for classification?
(Jun 22 '11 at 04:20)
Andreas Mueller
Take a look at: "3-D Object Recognition with Deep Belief Nets" by V. Nair and G. Hinton, and "An Efficient Learning Procedure for Deep Boltzmann Machines" by Ruslan Salakhutdinov. I don't think they are trained discriminatively, but they do seem to learn certain invariances using third-order RBMs.
(Jul 11 '11 at 03:50)
Roderick Nijs
It should be perfectly possible to learn a certain amount of rotational invariance with stacked RBMs; scale, however, I feel will be a bit harder. The relevant literature on how invariant various deep models are would be this paper and this tech report, as well as this one. It should be noted that with any model, a simple, if inefficient, way to achieve the desired invariances, if CPU (or GPU) time is not at a premium, is to train on an augmented training set of lots of images containing all the variations you expect to see.

Hm, I had somewhat forgotten about that paper. The measure introduced there is quite interesting, though I would rather call the features "slow" than scale and rotation invariant. I wonder how SIFT with Harris-Laplace would do in this test...
(Jun 21 '11 at 05:11)
Andreas Mueller
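To make the augmentation suggestion concrete, here is a minimal sketch assuming 28x28 MNIST-style arrays and SciPy; the particular angles, scale factors, and the `augment` helper are illustrative choices, not something from this thread.

```python
# Sketch: expand a training set with rotated and rescaled copies of each
# image, so the model sees the variations you expect at test time.
import numpy as np
from scipy.ndimage import rotate, zoom

def augment(image, angles=(-15, -7.5, 7.5, 15), scales=(0.9, 1.1)):
    """Return the original image plus rotated and rescaled variants."""
    variants = [image]
    for a in angles:
        # reshape=False keeps the 28x28 frame; corners get cropped
        variants.append(rotate(image, a, reshape=False, order=1))
    h, w = image.shape
    for s in scales:
        z = zoom(image, s, order=1)
        if s < 1:   # pad a shrunken image back into the original frame
            out = np.zeros_like(image)
            dy, dx = (h - z.shape[0]) // 2, (w - z.shape[1]) // 2
            out[dy:dy + z.shape[0], dx:dx + z.shape[1]] = z
        else:       # center-crop an enlarged image
            dy, dx = (z.shape[0] - h) // 2, (z.shape[1] - w) // 2
            out = z[dy:dy + h, dx:dx + w]
        variants.append(out)
    return variants

digit = np.random.rand(28, 28)      # stand-in for one MNIST digit
print(len(augment(digit)))          # 1 original + 4 rotations + 2 scales = 7
```

Each original image then contributes several training examples; the RBM never needs to know that some of them are transforms of each other.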
Would making the image features invariant be an option in your case?

Well, I want to use RBMs to sidestep feature selection. What might make sense is a whole-image representation that is invariant to rotation. A polar representation would help by converting rotations into translations. Do you know if RBMs can easily be made invariant to translations?
(Jul 11 '11 at 03:53)
Roderick Nijs
@Roderick - The only architecture I've seen that offers any amount of translation invariance is convolutional, and even then it only nets you a small amount of actual translation invariance. Take a look at Yann LeCun's "LeNet": it stacks many convolutional layers on top of each other, and the net effect is that it becomes (somewhat) translation invariant.
(Jul 12 '11 at 02:04)
Brian Vandenberg
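A toy numpy illustration of where that invariance comes from (this is a hypothetical sketch, not LeNet itself): convolution is translation-equivariant, so a shifted input gives a correspondingly shifted response map, and pooling over positions then throws the location away.

```python
# Sketch: convolution shifts with the input (equivariance); pooling over
# positions discards the shift (invariance).
import numpy as np

def conv_valid(image, kernel):
    """Plain 'valid' 2-D correlation with a single filter."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((16, 16))
img[5:8, 5:8] = 1.0                   # a small blob
shifted = np.roll(img, 1, axis=1)     # the same blob, one pixel to the right
k = np.ones((3, 3))                   # a toy filter

resp, resp_shift = conv_valid(img, k), conv_valid(shifted, k)
# The response maps differ only by the same one-pixel shift ...
print(np.array_equal(resp[:, :-1], resp_shift[:, 1:]))   # True
# ... so pooling positions away gives an identical summary.
print(resp.max() == resp_shift.max())                    # True
```

Real convolutional nets pool over small local windows rather than globally, which is why the invariance is only partial: shifts that cross a pooling boundary still change the representation.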
I believe the single best thing to do (if it is possible) is what David already touched on: use the invariances to expand the training set as much as possible, then train a very deep, very large net with as much data as possible for a long time. Jürgen Schmidhuber and collaborators have demonstrated that training a broad, deep network with lots of data, using deformations, rotations, etc. to expand the training set, is highly effective. See these two papers. Using our prior knowledge about invariances to expand the training data is a great way to give that information to our models. If there is a lot of data, in the experience of my colleagues, this will beat any clever way of building these invariances into the model (unless it is well beyond the usual levels of cleverness). The final slide of these course lecture notes from one of Geoff Hinton's courses concurs with my claim above. Another way to harness our knowledge about transformations of the input that should not change the class label is to use so-called transforming auto-encoders to learn smarter features.

I agree that expanding the training set is the most promising approach when you are trying to do classification. However, I am not sure whether it helps to generate invariant features when doing unsupervised training. Do you know of any insights on that? Transforming auto-encoders seem like one possibility for learning invariant features, but I think work on these is still at a very early stage. Do you know of any paper that uses features from transforming auto-encoders for classification (or uses the features in any other way)?
(Jun 22 '11 at 04:26)
Andreas Mueller
@Andreas - I agree with your concerns. On the one hand, other research (e.g., Yann LeCun's LeNet) clearly demonstrates that it's possible to build something capable of dealing with rotational and (at least to a degree) scale invariance. But what is really being learned here, and how well does it generalize? It seems to me it's still not going to truly generalize the concepts. Suppose, for example, you trained it on all possible 3s at the same scale/orientation, then trained it on many scales/rotations for many of those 3s; my guess is it would have trouble with the ones that were left out. At least in the case of the human visual system, the brain has to be doing something other than just imagining every possible permutation of something it sees in order to 'learn' that concept. My daughter, for example, is pretty good at reading, but I don't regularly ask her to read upside down. Nevertheless, she can do it, albeit slowly, because she's not used to it. By contrast, if all rotations near 180 degrees were left out of the training set, a DBN would probably have trouble with 180-degree-rotated images even though it was exposed to other orientations.
(Jun 23 '11 at 13:18)
Brian Vandenberg
I started working in this direction. I am using RBMLIB to train a discriminative RBM or a deep belief network. However, I am having trouble setting the parameters. Is there a rule of thumb for setting the number of hidden units and layers? I have the impression my network is overfitting the training data.
(Jun 30 '11 at 08:33)
Roderick Nijs
Why do you think you are overfitting? Did you compare likelihood ratios? I haven't paid much attention to it, but I have never actually heard of a case of overfitting in RBMs.
(Jun 30 '11 at 09:35)
Andreas Mueller
@Roderick - Check out Hinton's paper 'A Practical Guide to Training Restricted Boltzmann Machines': http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf . In addition, regarding your question: in a very real sense, the hidden units adhere to basic principles of information storage. The system will try to come up with weights that reproduce everything in the training set with high probability; the more bits you give it, the better it can do (though not necessarily will do), but past some point you'll see diminishing returns unless you use a sparsity regularizer. Furthermore, are you adding noise during each Gibbs step? If not, that may be the problem.
(Jun 30 '11 at 11:02)
Brian Vandenberg
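For concreteness, here is a rough numpy sketch of the kind of noisy update Brian means: one CD-1 step for a small binary RBM where the hidden states are sampled (Bernoulli noise) rather than rounded. The shapes, names, and learning rate are illustrative, not taken from RBMLIB.

```python
# Sketch of one CD-1 update for a binary RBM with sampled hidden states.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.05):
    """One contrastive-divergence update; v0 is a batch of binary rows."""
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample, don't round
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)                         # probabilities suffice here
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

n_vis, n_hid = 6, 4
W = rng.normal(scale=0.01, size=(n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
batch = (rng.random((10, n_vis)) < 0.5).astype(float)
W, b, c = cd1_step(batch, W, b, c)
print(W.shape)      # (6, 4): weights updated in place, shape unchanged
```

If a CD implementation uses the hidden probabilities everywhere instead of sampling `h0`, the negative phase is biased and the learned model tends to look worse, which is the symptom Brian is hinting at.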
The overfitting is just a guess based on the poor results I get on the test set. The author of the library mentions overfitting on page 5 of his report: http://www.cs.ubc.ca/~andrejk/540project/540report.pdf . My dataset is quite imbalanced: I have 4 classes, with 1 clearly dominating. I am doing this process:
- Generating artificial data through rescaling and rotations.
- Splitting into training and testing sets.
- Creating a new balanced training set by extracting samples from each class with equal probability (I have repeated samples of the low-frequency classes).
- Training & testing.
I'm not sure if this is right. Since I compensate for the imbalance in the dataset by introducing repeated samples of the less frequent classes, I think the network might be learning a less varied representation for these classes, and hence one less generalizable than that for the dominant class. Well, in RBMs the overall likelihood p(x,y) is intractable. The likelihood p(y|x) is tractable, but I think RBMLIB does not compute it (at least not out of the box). I guess you mean taking the likelihood ratio of the best p(y|x) versus the rest?
(Jun 30 '11 at 11:08)
Roderick Nijs
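The balancing step described above could be sketched like this (the toy labels and the `balanced_indices` helper are made up for illustration, not part of RBMLIB): pick a class uniformly at random, then a sample from that class with replacement, so rare classes recur.

```python
# Sketch: balanced resampling of an imbalanced label set by drawing the
# class first (uniformly), then a sample within that class.
import numpy as np

rng = np.random.default_rng(1)
labels = np.array([0] * 700 + [1] * 100 + [2] * 100 + [3] * 100)  # imbalanced toy labels

def balanced_indices(labels, n_draws):
    classes = np.unique(labels)
    by_class = {k: np.flatnonzero(labels == k) for k in classes}
    picks = rng.choice(classes, size=n_draws)       # class first, uniformly
    return np.array([rng.choice(by_class[k]) for k in picks])

idx = balanced_indices(labels, 4000)
print(np.bincount(labels[idx]))     # roughly 1000 draws per class
```

This matches the worry in the comment: the minority classes contribute the same number of draws but far fewer distinct examples, so the model sees them with less variety.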
Well, it depends on your problem how intractable it is ;) What I meant is to compare the unnormalized likelihoods, since the normalization constant cancels out if you look at likelihood ratios. Or am I overlooking something there? By the way, I agree with Brian: you should definitely have a look at Hinton's guide - though I disagree somewhat about some things ;) I just took a brief look at the RBMLIB report. It seems he is talking about overfitting during finetuning. He calls it "overfitting of the RBM" but he is actually talking about classification performance, which I find somewhat irritating. BTW, if you have CUDA, maybe you want to check out my lab's library ;) http://peekaboo-vision.blogspot.com/2010/11/restricted-boltzmann-machine-on-cuda.html
(Jun 30 '11 at 11:20)
Andreas Mueller
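The unnormalized comparison Andreas describes can be sketched via the RBM free energy, a standard quantity (it appears in Hinton's guide); the toy data and variable names here are my own. Since log p(v) = -F(v) - log Z for a binary RBM, the intractable log Z is the same for every v and cancels when comparing mean free energies of training and held-out sets.

```python
# Sketch: monitor overfitting by comparing mean free energies; the
# unknown log-partition term is identical for both sets and drops out.
import numpy as np

def free_energy(v, W, b, c):
    """F(v) = -v.b - sum_j log(1 + exp(c_j + (vW)_j)) for a binary RBM."""
    x = v @ W + c
    return -(v @ b) - np.logaddexp(0.0, x).sum(axis=1)

rng = np.random.default_rng(2)
n_vis, n_hid = 6, 4
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
train = (rng.random((50, n_vis)) < 0.5).astype(float)
valid = (rng.random((50, n_vis)) < 0.5).astype(float)

# A gap that keeps growing during training (train F much lower than
# validation F) is one sign the RBM is memorizing the training set.
gap = free_energy(valid, W, b, c).mean() - free_energy(train, W, b, c).mean()
print(np.isfinite(gap))
```

Tracking this gap over epochs gives a cheap overfitting signal without ever computing the partition function.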