
I found out only too recently about the "k-means trick": run k-means on a large dataset, then use distance/similarity to the cluster centers as features. Where does this approach come from? I've found a paper about its application to computer vision, and one about NLP, but where can I read more and what do I cite if this works for me (apart from those two papers)? The latter paper doesn't cite much literature except competing approaches.
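For concreteness, here is a minimal sketch of what I mean, assuming scikit-learn (the data, the number of clusters and the Gaussian similarity encoding are just illustrative placeholders):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    # Stand-in data; replace with your own.
    X_unlabeled = np.random.randn(10000, 20)   # large, possibly unlabeled set
    X_train = np.random.randn(100, 20)
    y_train = np.random.randint(0, 2, size=100)

    # Step 1: cluster the large set.
    km = KMeans(n_clusters=500).fit(X_unlabeled)

    # Step 2: re-represent the labeled data via the cluster centers.
    D = km.transform(X_train)                  # distances to the 500 centers
    F = np.exp(-D ** 2 / (2 * D.mean() ** 2))  # one possible similarity encoding

    # Step 3: ordinary supervised learning on the new features.
    clf = LogisticRegression().fit(F, y_train)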

asked Mar 11 '13 at 11:00

larsmans

edited Mar 11 '13 at 16:50

Not sure where Coates got the idea from in particular, but he wrote a nice summary about K-means for feature learning as well: Learning Feature Representations with K-means

(Mar 12 '13 at 08:49) Sander Dieleman

Thanks! I think I can find what I'm looking for in that paper and its references.

(Mar 18 '13 at 19:05) larsmans

2 Answers:

Sorry, isn't this just radial basis function networks (pre-SVMs)?

answered Mar 12 '13 at 10:10

SeanV


This is different in that the clusters are computed prior to training the network. Features can therefore be combined very flexibly, compared to the RBF network, where you just sum over the RBF activations.

(Mar 12 '13 at 11:11) Oscar Täckström

To me it's just a matter of interpretation... RBF (in this form) is just a non-linear transform of the inputs + linear regression, much like taking a polynomial transform of your inputs.

(Mar 12 '13 at 11:30) SeanV

I might have made this clearer in the question, but the big idea is not the RBF transform; it's clustering unlabeled data to extract features for subsequent supervised learning. It's a bit like pre-training, and Hinton's work is mentioned, except with an off-the-shelf clustering algorithm.

If you know a paper where this is done with RBF nets (I don't know much about those), I'd be interested.

(Mar 18 '13 at 19:02) larsmans

Maybe I just misunderstand what you are getting at, but the standard RBF network algorithm is unsupervised k-means clustering (to find your non-linear features) followed by, say, linear regression (supervised learning) from the features to the classification/regression task; a rough sketch follows below. Bishop refers to Broomhead and Lowe 1988, http://www.complex-systems.com/pdf/02-3-5.pdf (and Moody and Darken).

Look at pages 9-15: http://www.amlbook.com/slides/iTunesU_Lecture16_May_24.pdf#page=13

Now to me the only distinction from what you are suggesting is that your unsupervised learning is done on a bigger set (i.e. because you typically have more unlabelled data), but this doesn't really seem a big jump to me.

The reason you don't know about radial basis function nets (I believe) is that SVMs with an RBF kernel are much better at a classification task - this is Abu-Mostafa's point in the lecture: centres are chosen to maximise the margin rather than by unsupervised clustering. My interpretation of all the deep learning etc. is that we are now in a technological cycle where processing power is much greater than memory, and for these big-data problems SVMs cannot be used, so neural-network-type methods are staging a comeback.
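Here is a rough sketch of the two stages I mean, assuming scikit-learn (random stand-in data; the width heuristic is just one common choice):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Ridge
    from sklearn.metrics.pairwise import euclidean_distances

    X_train = np.random.randn(200, 10)   # stand-in data; replace with your own
    y_train = np.random.randn(200)

    # Stage 1 (unsupervised): k-means picks the RBF centres.
    centres = KMeans(n_clusters=50).fit(X_train).cluster_centers_

    # Stage 2 (supervised): linear regression on the radial activations.
    dists = euclidean_distances(X_train, centres)
    gamma = 1.0 / (2 * dists.mean() ** 2)   # a common width heuristic
    Phi = np.exp(-gamma * dists ** 2)
    model = Ridge().fit(Phi, y_train)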

(Mar 22 '13 at 06:33) SeanV

For learning features with K-Means, you do not necessarily use RBFs as features. You can also use a sparse code, soft-thresholding, etc. The RBF network idea has basically been extended. Yet while RBF networks use RBFs as their selling point, K-Means feature learning uses K-Means. :) (see "The importance of encoding versus training with sparse coding and vector quantization" by Adam Coates.)
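For instance, the "triangle" soft-threshold encoding from that line of work, as I understand it (scikit-learn and random stand-in data; note the features are not radial, and are sparse by construction):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics.pairwise import euclidean_distances

    X = np.random.randn(500, 20)   # stand-in data; replace with your own
    centers = KMeans(n_clusters=100).fit(X).cluster_centers_

    # z_k = ||x - c_k||; keep only centers closer than the mean distance.
    Z = euclidean_distances(X, centers)
    F = np.maximum(0.0, Z.mean(axis=1, keepdims=True) - Z)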

(Mar 22 '13 at 07:11) Justin Bayer

I'm beginning to see how this is very similar to an RBF network, though, so +1 for the answer. I might accept it after plowing through all the RBF net literature that I've dug up.

(Mar 25 '13 at 07:45) larsmans

Schwenker et al. 2001 spell out how k-means on an unlabeled set can be used to fit the hidden layer of an RBF network. Even though the approach I had in mind is not exactly an RBF net, the idea does stem from there.

(Mar 25 '13 at 09:10) larsmans

@justin: I think we just disagree on what an RBF network is. I view it as a clustering/dictionary-learning input stage + linear regression - I don't view it as an "RBF network", just as one could equally add polynomial transforms of the data. Similarly, Kohonen networks were also used for feature discovery.

The paper you referred to seemed to suggest that k-means clustering was not even that useful!

"Our results show not only that we can use fast VQ algorithms for training, but that we can just as well use randomly chosen exemplars from the training set."

answered Mar 22 '13 at 08:37

SeanV

With respect to the paper, the result you are quoting is relevant for their exact setting; they hypothesise in the paper that it does not hold for other data. [pointless arguing over definitions] The name RBF network implies that the basis functions are radial. Polynomials, sparse codes and soft thresholding are not radial. Furthermore, K-Means is not a dictionary learning algorithm; only gain-shape K-Means is. That stage is in most texts only described as a way to deal with the curse of dimensionality. [/pointless arguing over definitions]

(Mar 22 '13 at 08:53) Justin Bayer