Where can I find an implementation of sparse coding? I would like to take my 50-dimensional dense word embeddings (of which there are about 270K vectors) and transform them into a higher-dimensional sparse representation, maybe around 500 dimensions each. Where can I download an off-the-shelf implementation of a sparse coding algorithm to do this?
I am not sure I understand your problem setting: do you already have a dictionary or not? If you have one, you can use the LARS implementation in scikit-learn as a sparse encoder: given a set of n_samples vectors of size n_features and a dictionary of shape (n_prototypes, n_features), it will find n_samples sparse-coded vectors of size n_prototypes with a predefined maximum number of non-zero components. Have a look at this unit test, for instance. If you don't have a dictionary, you can build one out of your samples using PCA, or a chained PCA + ICA (you can use MDP nodes for this, for instance). Alternatively, you can implement a real dictionary learner using autoencoders with a sparsity constraint, as explained in these lecture notes, or the dedicated online dictionary learner described in this paper by Julien Mairal.
scikit-learn has an ICA implementation, so you don't need MDP :)
(Apr 07 '11 at 04:19)
Gael Varoquaux
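A minimal sketch of both routes with a recent scikit-learn, assuming random placeholder data in place of the real 270K × 50 embedding matrix. `MiniBatchDictionaryLearning` (scikit-learn's implementation of Mairal et al.'s online dictionary learning) learns a 500-atom dictionary, and `sparse_encode` with `algorithm="lars"` then encodes each vector with at most `N_NONZERO` non-zero components. The constants `N_ATOMS`, `N_NONZERO` and `BATCH` are hypothetical choices, and this uses the current API rather than whatever the linked unit test exercised.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.RandomState(0)
# Placeholder data: replace with the real (n_words, 50) embedding matrix.
X = rng.randn(2000, 50)

N_ATOMS = 500    # target dimensionality of the sparse codes
N_NONZERO = 10   # hypothetical cap on non-zero components per code
BATCH = 200      # hypothetical mini-batch size

# Learn an overcomplete dictionary with the online learner. Calling
# partial_fit a fixed number of times keeps the run time bounded.
learner = MiniBatchDictionaryLearning(n_components=N_ATOMS, alpha=1.0, random_state=0)
for epoch in range(2):
    for start in range(0, len(X), BATCH):
        learner.partial_fit(X[start:start + BATCH])
D = learner.components_  # shape (N_ATOMS, 50), one atom per row

# Encode with LARS, at most N_NONZERO non-zeros per row. Chunking keeps the
# dense intermediate small; the codes are stored as a sparse CSR matrix.
chunks = []
for start in range(0, len(X), 1000):
    code = sparse_encode(X[start:start + 1000], D,
                         algorithm="lars", n_nonzero_coefs=N_NONZERO)
    chunks.append(sparse.csr_matrix(code))
codes = sparse.vstack(chunks).tocsr()
print(codes.shape, codes.getnnz(axis=1).max())
```

At 270K × 500, a dense float64 code matrix would take roughly 1 GB, which is why the sketch encodes in chunks and keeps the result in `scipy.sparse` form.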
I think feature hashing is a very fast and convenient solution for this problem: just take the hash of each input feature modulo the output dimension, and increment the resulting bin by that feature's value in the input space. In the case of language, this whole operation is trivial. http://portal.acm.org/citation.cfm?id=1553516
Isn't feature hashing used to reduce the total number of features, and not increase it?
(Aug 02 '10 at 16:48)
Alexandre Passos ♦
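The hashing step described in the answer can be sketched as follows. This is an illustrative sketch, not code from the linked paper: `hash_features` is a hypothetical helper, and `zlib.crc32` is an assumed choice of hash (picked because Python's built-in `hash` of strings is randomized per process). scikit-learn's `FeatureHasher` implements a signed variant of the same idea.

```python
import zlib

import numpy as np


def hash_features(features, n_bins):
    """Hashing trick: add each (name, value) pair's value to the output
    bin crc32(name) % n_bins. Returns a dense vector of length n_bins."""
    out = np.zeros(n_bins)
    for name, value in features.items():
        bin_index = zlib.crc32(name.encode("utf-8")) % n_bins
        out[bin_index] += value
    return out


# Bag-of-words input: counts per token.
vec = hash_features({"the": 2.0, "cat": 1.0, "sat": 1.0}, 500)

# A dense 50-d embedding, treating each dimension index as a feature name.
emb = np.random.RandomState(0).randn(50)
hashed_emb = hash_features({f"dim_{i}": v for i, v in enumerate(emb)}, 500)
```

As the comment above points out, the output has at most as many non-zero entries as there are input features, so for a 50-d dense embedding this amounts to a fixed scattering of the 50 values into 500 bins rather than a learned sparse code.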
You can try this implementation, originally proposed in this paper. The approach expresses a dense signal as a high(er)-dimensional sparse signal using an overcomplete dictionary. You might consider taking a look at this too.
Good answer, I found it useful today. Honglak's code has since been moved to the U Michigan site: http://www.eecs.umich.edu/~honglak/softwares/nips06-sparsecoding.htm
(Apr 04 '11 at 12:05)
Ian Goodfellow
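To show what "a dense signal as a higher-dimensional sparse signal using an overcomplete dictionary" means in practice, here is a minimal NumPy sketch of L1-regularized sparse coding, minimizing 0.5·‖X − S D‖² + λ‖S‖₁ over the codes S. It uses plain ISTA (iterative soft-thresholding), which is a swapped-in solver and not the feature-sign search used in the linked NIPS 2006 code. The random unit-norm dictionary, `lam`, and `n_iter` are assumptions; in practice D would be learned.

```python
import numpy as np


def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1, applied elementwise.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_objective(X, D, S, lam):
    residual = X - S @ D
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(S))


def ista_sparse_code(X, D, lam=0.1, n_iter=200):
    """Approximately minimise 0.5*||X - S D||_F^2 + lam*||S||_1 over S.

    X: (n_samples, n_features) dense signals, e.g. 50-d embeddings.
    D: (n_atoms, n_features) overcomplete dictionary, one atom per row.
    Returns S: (n_samples, n_atoms) codes.
    """
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    S = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (S @ D - X) @ D.T  # gradient of the squared-error term
        S = soft_threshold(S - grad / L, lam / L)
    return S


rng = np.random.RandomState(0)
D = rng.randn(500, 50)
D /= np.linalg.norm(D, axis=1, keepdims=True)  # hypothetical unit-norm atoms
X = rng.randn(100, 50)                         # placeholder "embeddings"
S = ista_sparse_code(X, D, lam=0.1)
```

Larger values of `lam` give sparser codes at the cost of a worse reconstruction of the original 50-d vectors.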