When is it preferable to have a low-dimensional and dense representation?

When is it preferable to have a high-dimensional and sparse representation?

asked Jun 23 '10 at 17:23 by Joseph Turian ♦♦


2 Answers:

High-dimensional and sparse representations are generally preferred when the learning algorithm scales in the number of active features (non-zeros). For example, logistic regression, supervised training of SVMs, and supervised training of neural networks all scale in the number of active features. For logistic regression and neural networks, naive implementations will have dense weight matrices, which scale in the total number of features.
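As a rough illustration (a minimal sketch in plain Python, not code from the answer), here is a logistic-regression SGD step whose cost is proportional to the number of non-zero features in the example rather than the total feature count; the sparse example is just a dict mapping feature index to value:

    import math

    def sgd_step(w, x_sparse, y, lr=0.1):
        # w        : dense weight list, one entry per feature
        # x_sparse : dict {feature_index: value}, only non-zeros stored
        # y        : label in {0, 1}
        # Cost is O(non-zeros in x_sparse), not O(len(w)).
        z = sum(w[i] * v for i, v in x_sparse.items())   # dot product touches only active features
        p = 1.0 / (1.0 + math.exp(-z))
        grad = p - y
        for i, v in x_sparse.items():                     # update touches only active features
            w[i] -= lr * grad * v
        return w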

Low-dimensional and dense representations are generally preferred when the learning algorithm scales in the total number of features. For example, training of (unsupervised) latent variable models like LDA and RBMs, as well as auto-associators, scales in the total number of features.

I believe there might be motivations from neuroscience for high-dimensional and sparse representations.

It is also possible to convert between dense and sparse representations.
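For instance, with SciPy (a minimal sketch; the matrices here are made up):

    import numpy as np
    from scipy import sparse

    dense = np.array([[0.0, 1.5, 0.0],
                      [2.0, 0.0, 0.0]])

    # Dense -> sparse: only the non-zero entries are stored.
    X_sparse = sparse.csr_matrix(dense)

    # Sparse -> dense: materializes every entry, zeros included.
    X_dense = X_sparse.toarray()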

Ranzato + Szummer (2008) induce dense low-dimensional and sparse high-dimensional representations for document categorization, and find that dense low-dimensional representations are superior in their experiments. YMMV.

answered Jun 23 '10 at 17:40 by Joseph Turian ♦♦

  • Sparse high-dimensional representations are nice for indexing and fast lookup/retrieval using Lucene or any other full-text indexer: you treat each dimension as a "term" frequency when computing a TF-IDF score for ranking the results. You can also perform efficient similarity queries using a strategy such as MoreLikeThisQuery from Lucene / Solr.

  • Dense very low-dimensional representations (2D or 3D) are nice for visualization (see t-SNE and variants) and for efficient KNN queries using KD-tree data structures and variants, as implemented in Geographic Information Systems (GIS).

  • Intermediate low-dimensional dense codes (64D or 32D, for instance) can be used as an intermediate step for fast KNN queries: search within a Hamming ball of radius 1 or 2 around the binary code obtained by thresholding the 64-float dense code. This strategy is sometimes referred to as "semantic hashing" (see the sketch below).
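A rough sketch of that last strategy (NumPy, with made-up random codes; a real system would index the binary codes rather than scan them linearly):

    import numpy as np

    rng = np.random.default_rng(0)
    codes = rng.normal(size=(1000, 64))   # dense 64-D codes for 1000 items
    query = rng.normal(size=64)           # dense code for the query

    # Threshold each float at 0 to obtain 64-bit binary codes.
    bits = (codes > 0).astype(np.uint8)
    q_bits = (query > 0).astype(np.uint8)

    # Hamming distance = number of differing bits; keep items within radius 2.
    hamming = np.count_nonzero(bits != q_bits, axis=1)
    neighbors = np.flatnonzero(hamming <= 2)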

answered Jun 23 '10 at 17:34 by ogrisel

edited Jun 23 '10 at 17:35

