When is it preferable to have a low-dimensional and dense representation? When is it preferable to have a high-dimensional and sparse representation?

High-dimensional and sparse representations are generally preferred when the learning algorithm scales in the number of active (non-zero) features. For example, logistic regression, as well as supervised training of SVMs and neural networks, all scale in the number of active features. (For logistic regression and neural networks, naive implementations will have dense weight matrices, which scale in the total number of features.) The sketch below makes this O(non-zeros) cost concrete. Low-dimensional and dense representations are generally preferred when the learning algorithm scales in the total number of features. For example, training of unsupervised latent-variable models like LDA and RBMs, as well as auto-associators, scales in the total number of features.

I believe there might also be motivations from neuroscience for high-dimensional and sparse representations.

It is also possible to convert between dense and sparse representations. Ranzato + Szummer (2008) induce both dense low-dimensional and sparse high-dimensional representations for document categorization, and find that the dense low-dimensional representations are superior in their experiments. YMMV.
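To illustrate the scaling argument, here is a minimal sketch in plain NumPy of one logistic-regression SGD step. The sizes, indices, and values are made up for illustration; the point is only that the forward pass and the update touch the active features, so the cost is O(number of non-zeros), not O(total features):

```python
import numpy as np

# Illustrative sizes: a huge feature space (e.g. a vocabulary) where
# each example activates only a handful of features.
n_features = 1_000_000
active_idx = np.array([12, 4031, 777215])  # non-zero feature indices (made up)
active_val = np.array([1.0, 2.0, 1.0])     # their values

w = np.zeros(n_features)  # logistic-regression weights
b = 0.0                   # bias
y = 1.0                   # binary label in {0, 1}
lr = 0.1                  # learning rate

# Forward pass: dot product over the non-zeros only -- O(nnz).
z = np.dot(w[active_idx], active_val) + b
p = 1.0 / (1.0 + np.exp(-z))  # sigmoid
g = p - y                     # gradient of the log loss w.r.t. z

# SGD update also touches only the active features -- O(nnz).
w[active_idx] -= lr * g * active_val
b -= lr * g
```

A naive dense implementation would instead compute `np.dot(w, x)` over a length-1,000,000 vector and update every weight, which is exactly the total-feature scaling that makes dense high-dimensional inputs expensive.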