I am currently playing around with different models of unsupervised feature extraction and different learning algorithms to optimize their behaviour. There is now a vast number of different methods, and I find it hard to assess their actual quality: the autoencoder family, the RBM family, the product of Student's t family, matrix decompositions, and so on.
Most of the methods I look at learn a function that is composed of a linear transformation and an elementwise nonlinearity, possibly stacked on top of each other.
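To make that concrete, here is a minimal sketch of such a composed function: each layer is a linear transformation followed by an elementwise nonlinearity (tanh is used here purely as an example; the weights are random stand-ins for learned parameters).

```python
import numpy as np

def encode(x, layers):
    """Apply a stack of (W, b) layers: each is a linear map
    followed by an elementwise nonlinearity (tanh here)."""
    h = x
    for W, b in layers:
        h = np.tanh(W @ h + b)  # linear transform, then elementwise nonlinearity
    return h

# Two stacked layers mapping 4 inputs -> 3 hidden -> 2 features
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 4)), np.zeros(3)),
          (rng.standard_normal((2, 3)), np.zeros(2))]
features = encode(rng.standard_normal(4), layers)
print(features.shape)  # (2,)
```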
I know about 'looking at filters', as done in most research papers on the topic. However, I find the process of looking at gazillions of filter pictures extremely boring (especially if lots of them are bad).
I also know about using a discriminative task as a proxy (like classification performance on CIFAR).
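The proxy approach boils down to freezing the extracted features and measuring how well a simple supervised model does on top of them. A minimal sketch, using synthetic stand-ins for the features and labels and a nearest-centroid classifier in place of whatever classifier one would actually use:

```python
import numpy as np

# Hypothetical setup: X holds features extracted by the unsupervised model,
# y holds class labels (both are synthetic stand-ins here).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.standard_normal((200, 10)) + 2.0 * y[:, None]  # class-informative features

# Split, fit a nearest-centroid classifier, report held-out accuracy.
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]
centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((X_te[:, None, :] - centroids) ** 2).sum(-1), axis=1)
accuracy = (pred == y_te).mean()
print(accuracy)
```

The held-out accuracy is then used as a score for the features themselves, which is exactly why this proxy can be misleading: it measures the feature-classifier combination, not the features alone.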
Both methods are unsatisfying in several ways:
Some thoughts I had:
I was wondering what people are doing to find out what makes good features. Any pointers to papers as well as personal experiences are greatly appreciated.
asked Oct 13 '11 at 04:39
There was a paper at NIPS 2009 about invariance vs. distinctiveness in extracted features:
"Measuring Invariances in Deep Networks", Ian Goodfellow, Quoc Le, Andrew Saxe, Honglak Lee, Andrew Ng
answered Oct 13 '11 at 08:04
Geoffrey Hinton's paper "A practical guide to training restricted Boltzmann machines" has some good pointers that may help.
The three sections that seem applicable to me are 5, 6, and tangentially 7.
Measuring the quality of the filters directly may be possible, but in the past I've tended to judge quality by a combination of visual inspection of the filters and both visual and empirical evaluation of the reconstructions.
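The empirical side of that evaluation can be sketched simply: encode the data, decode it back, and track the mean squared reconstruction error. The snippet below uses a truncated SVD projection as a stand-in for any learned encoder/decoder pair; the data is synthetic.

```python
import numpy as np

# Synthetic data with correlated dimensions, centered.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 20))
X = X - X.mean(axis=0)

# Project onto the top-k right singular vectors (a stand-in encoder),
# reconstruct, and measure mean squared reconstruction error.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
errors = []
for k in (5, 10, 20):
    W = Vt[:k]                  # k x 20 "encoder" matrix
    X_hat = (X @ W.T) @ W       # encode, then decode
    errors.append(((X - X_hat) ** 2).mean())
    print(k, errors[-1])        # error shrinks as k grows
```

For an RBM one would instead compare the data to a one-step Gibbs reconstruction, but the bookkeeping is the same.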
answered Oct 13 '11 at 12:04
The crux of the problem is how to define and measure quality. Ultimately, this is going to depend on how you are going to use the features.
Some criteria that might or might not be important: reconstruction error, low dimensionality (a small number of extracted features), uncorrelated or independent features, high information content (for a prediction task), sparsity, and conceptually tight features (features that don't mix incompatible information sources). My term for the tradeoffs between these is Data Ergonomics. In brief: what is the best representation, for both humans and computers, for the desired analysis scenario?
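Several of these criteria are directly measurable once you have a feature matrix. A minimal sketch, computing sparsity and pairwise correlation for a synthetic feature matrix H (rows are examples, columns are extracted features; the ReLU-of-Gaussian features here are only an illustrative stand-in):

```python
import numpy as np

# Synthetic feature matrix: ReLU of zero-mean Gaussians is sparse by construction.
rng = np.random.default_rng(0)
H = np.maximum(rng.standard_normal((1000, 8)), 0)

# Sparsity: fraction of exactly-zero activations.
sparsity = (H == 0).mean()

# (De)correlation: largest absolute off-diagonal feature-feature correlation.
C = np.corrcoef(H, rowvar=False)
off_diag = np.abs(C - np.diag(np.diag(C))).max()

print(sparsity)        # around 0.5 for ReLU of zero-mean Gaussians
print(off_diag)        # small, since the columns are independent here
```

Information content and conceptual tightness are harder to score mechanically; the first needs a downstream task, and the second is essentially a judgment about how the features map onto the input sources.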
An interesting question (at least for me) is whether one can prove that some combinations of criteria are impossible to achieve simultaneously. E.g., I believe that sparse features (features that depend on only a small number of inputs) are impossible if one requires the features to be linear and mutually orthogonal.
answered Oct 13 '11 at 13:12