I am currently playing around with different models of unsupervised feature extraction and different learning algorithms to optimize their behaviour. There now exists a vast number of methods, whose actual quality I find hard to assess: the autoencoder family, the RBM family, the product of Student's t family, matrix decompositions, and so on.

Most of the methods I look at learn a function that is composed of a linear transformation and an elementwise nonlinearity, possibly stacked on top of each other.

I know about 'looking at filters', as done in most research papers on the topic. However, I find the process of looking at gazillions of filter pictures extremely boring (especially when lots of them are bad).

I also know about using a discriminative task as a proxy (like classification performance on CIFAR).

Both methods are unsatisfying in several ways:

  • you can't compute a numerical quality score for the filters
  • as soon as you move away from images, filters are not that easy to interpret
  • discriminative quality does not necessarily imply disentangling the factors of variation, etc.

Some thoughts I had:

  • What is the frequency of a feature being active? Is it good if it's active 50% of the time?
  • What is the covariance of the features? Is it good if features arrange themselves in 'blocks', or is it better if the covariance matrix is mostly diagonal?
  • Is it good if a feature is 'on' or 'off' 95% of the time? You can argue that this is good because it's sparse, but you can also argue that the feature does nothing.
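To make questions like these concrete, here is a minimal sketch of how such statistics can be computed, assuming the activations are available as a NumPy array of shape (n_samples, n_features); the threshold value and the function name are my own illustrative choices, not from any particular paper:

```python
import numpy as np

def feature_diagnostics(h, on_threshold=0.05):
    """Simple statistics for a matrix of feature activations.

    h : array of shape (n_samples, n_features), e.g. hidden-unit
        activations of an autoencoder or RBM on a validation set.
    """
    # Fraction of samples on which each feature is 'active'.
    active_freq = (np.abs(h) > on_threshold).mean(axis=0)

    # Correlation matrix of the features; large off-diagonal mass
    # hints at redundant or 'blocky' features.
    corr = np.corrcoef(h, rowvar=False)
    off_diag = corr - np.diag(np.diag(corr))
    mean_abs_corr = np.abs(off_diag).mean()

    return active_freq, mean_abs_corr

rng = np.random.RandomState(0)
h = rng.rand(1000, 20)          # stand-in for real activations
freq, redundancy = feature_diagnostics(h)
```

This does not answer which values are *good*, but it makes the quantities in the questions above cheap to track across models.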

I was wondering what people are doing to find out what makes good features. Any pointers to papers as well as personal experiences are greatly appreciated.

asked Oct 13 '11 at 04:39


Justin Bayer

3 Answers:

There was a paper at NIPS 2009 about invariance vs. distinctiveness in extracted features:

Measuring Invariances in Deep Networks, by Ian Goodfellow, Quoc Le, Andrew Saxe, Honglak Lee, and Andrew Ng.

answered Oct 13 '11 at 08:04


Andreas Mueller

Geoffrey Hinton's paper "A Practical Guide to Training Restricted Boltzmann Machines" has some good pointers that may help.

The three sections that seem applicable to me are 5, 6, and tangentially 7.

Measuring the quality of the filters directly may be possible, but in the past I've been inclined to measure quality mostly through a combination of visual inspection of the filters and both visual and empirical evaluation of the reconstructions.
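As a sketch of what the empirical side of that evaluation can look like: mean squared reconstruction error on held-out data. The encoder/decoder here is a stand-in linear projection (top singular vectors via SVD) rather than a trained model, and the function names are illustrative only:

```python
import numpy as np

def reconstruction_error(X, encode, decode):
    """Mean squared reconstruction error of an encoder/decoder
    pair on held-out data X."""
    X_hat = decode(encode(X))
    return np.mean((X - X_hat) ** 2)

# Toy linear 'autoencoder': project onto the top 3 right singular
# vectors of the data and project back. Purely illustrative.
rng = np.random.RandomState(0)
X = rng.randn(500, 10)
W = np.linalg.svd(X, full_matrices=False)[2][:3]
err = reconstruction_error(X, lambda X: X @ W.T, lambda H: H @ W)
```

The same `reconstruction_error` call works unchanged with a real model's encode/decode functions.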

answered Oct 13 '11 at 12:04


Brian Vandenberg

The crux of the problem is how to define and measure quality. Ultimately, this is going to depend on how you are going to use the features.

Some criteria that might or might not be important: reconstruction error, low dimensionality (small number of extracted features), uncorrelated or independent features, high information content (for a prediction task), sparsity, conceptually tight features (don't mix incompatible info sources). My term for the tradeoffs between these is Data Ergonomics. In brief, what is the best representation for both humans and computers for the desired analysis scenario?
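Some of these criteria are directly measurable. As one illustration of quantifying sparsity, here is a sketch of Hoyer's sparsity measure (Hoyer, 2004), which is 0 for a perfectly uniform vector and 1 for a one-hot vector; the code itself is my own, not part of the answer above:

```python
import numpy as np

def hoyer_sparsity(v):
    """Hoyer's sparsity measure: 0 for a uniform vector,
    1 for a one-hot vector."""
    n = v.size
    l1 = np.abs(v).sum()
    l2 = np.sqrt((v ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

dense = np.ones(100)            # uniform: sparsity 0
sparse = np.zeros(100)
sparse[0] = 1.0                 # one-hot: sparsity 1
```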

An interesting question (at least for me) is whether one can prove that some combinations of criteria are impossible to achieve simultaneously. E.g., I believe that sparse features (features that are functions of only a small number of inputs) are impossible if one requires the features to be mutually orthogonal.
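A quick numerical illustration of that tension (not a proof): the orthogonal directions recovered by PCA on random data are almost entirely dense, with essentially no near-zero weights:

```python
import numpy as np

# Orthogonal components via SVD of centered data.
rng = np.random.RandomState(0)
X = rng.randn(200, 50)
components = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2]

# Fraction of weights that are (near-)zero across all components;
# for these orthogonal directions it is close to zero, i.e. dense.
sparsity = (np.abs(components) < 1e-3).mean()
```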

answered Oct 13 '11 at 13:12


Art Munson

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.