3
1

Where can I find an implementation of sparse coding?

I would like to take my 50-dimensional dense word embeddings (of which there are about 270K vectors), and transform them into a higher sparse representation, maybe each around 500 dimensions.

Where can I download an off-the-shelf implementation of a sparse coding algorithm to do this?

asked Aug 02 '10 at 15:37

Joseph%20Turian's gravatar image

Joseph Turian ♦♦
579051125146


3 Answers:

I am not sure I understand your problem setting: do you already have a dictionary or not? If you have one you can use the LARS implementation of scikit-learn as a sparse encoder that given a set of n_samples vectors of size n_features and a dictionary of size (n_prototypes, n_features) will find n_samples sparse coded vectors of size n_prototypes with a predefined maximum number of non zero components. Have a look at this unit test for instance.

If you don't have a dictionary, you can build one out of your samples by using a PCA or a chained PCA + ICA for instance (you can use mdp nodes for instance). Or you can implement a real dictionary learner using autoencoders with sparsity constraint as explained in these lecture notes or the dedicated online dictionary learner as described in this paper by Julien Mairal.

answered Aug 02 '10 at 16:06

ogrisel's gravatar image

ogrisel
498995591

scikit-learn has an ICA implementation, so you don't need MDP :)

(Apr 07 '11 at 04:19) Gael Varoquaux

I think feature hashing is a very fast and convenient solution for this problem; just take the hash modulo the output dimension for each input feature. increment the resultant bin by value in the input space. in the case of language, this whole operation is trivial.

http://portal.acm.org/citation.cfm?id=1553516

answered Aug 02 '10 at 16:44

downer's gravatar image

downer
54891720

1

Isn't feature hashing used to reduce the total number of features, and not increase it?

(Aug 02 '10 at 16:48) Alexandre Passos ♦

You can try this implementation, originally proposed in this paper. The approach expresses a dense signal as a high(er) dimensional sparse signal using an overcomplete dictionary.

You might consider taking a look at this too.

answered Aug 02 '10 at 17:39

spinxl39's gravatar image

spinxl39
3698114869

edited Aug 02 '10 at 17:45

Good answer, I found it useful today. Honglak's code has since been moved to the U Michigan site: http://www.eecs.umich.edu/~honglak/softwares/nips06-sparsecoding.htm

(Apr 04 '11 at 12:05) Ian Goodfellow
Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.