Hello,

the Johnson-Lindenstrauss lemma shows that a small set of points in R^N can be projected into a much lower-dimensional space R^n while only slightly distorting the pairwise distances between the points. Is there any ML approach that implements this kind of mapping? I know that Kohonen's SOM can be implemented in n dimensions, but does it preserve distances in the way the JL lemma describes? Or are there other approaches?
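For reference, the statement I have in mind is (roughly): for any 0 < eps < 1 and any set of m points in R^N, there is a linear map f: R^N -> R^n with n = O(log(m) / eps^2) such that

    (1 - eps) * ||u - v||^2 <= ||f(u) - f(v)||^2 <= (1 + eps) * ||u - v||^2

for every pair of points u, v in the set.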

asked Apr 27 '11 at 05:01

Lucian Sasu

I don't believe Kohonen's SOM will preserve any notion of distance between points in the input space and the space it is mapped to. SOMs are meant to provide a relatively 'smooth' mapping from the input space to the discrete space it is mapped to, but there is no attempt to preserve relative distances between the two spaces.

(Apr 27 '11 at 13:44) crdrn

3 Answers:

Many ML approaches use the Johnson-Lindenstrauss lemma directly or indirectly. Off the top of my head:

  1. Approximate indexing by random projections. If you want to do, say, k-NN in a very high-dimensional space, it gets really hard to build an index, but you can easily project the space down to a small, reasonable number of dimensions and use a KD-tree or something similar in the reduced space (see the first sketch after this list).

  2. Feature hashing. When designing linear classifiers, instead of having one dimension per feature, you can use far fewer dimensions by having K independent hash functions and "turning on" all the h_i(x) coordinates of the vector. This implicitly uses random projections (as long as the hash functions are sufficiently random), which lets you prove nice error bounds with JL (see the second sketch after this list).
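A minimal sketch of item 1, assuming NumPy and SciPy are available; the data sizes, dimensions, and seed below are made up purely for illustration:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Made-up sizes: m points in R^N, projected down to R^n.
m, N, n = 1000, 10_000, 50
X = rng.standard_normal((m, N))

# Random Gaussian projection, scaled so squared distances are
# preserved in expectation (a JL-style projection).
R = rng.standard_normal((N, n)) / np.sqrt(n)
X_low = X @ R                                # shape (m, n)

# Index the reduced space and query nearest neighbours there.
tree = cKDTree(X_low)
q = rng.standard_normal(N)                   # a query point in R^N
dists, idx = tree.query(q @ R, k=5)          # 5 approximate nearest neighbours
```

And a rough sketch of item 2 (feature hashing); here Python's built-in hash, salted with a seed, stands in for the K independent hash functions, while a real implementation would use a stable hash such as MurmurHash:

```python
import numpy as np

def hashed_features(tokens, dim=2**18, num_hashes=2):
    # Each token is hashed with num_hashes "independent" hash functions
    # (simulated by salting Python's built-in hash); the corresponding
    # coordinates of a fixed-size vector are turned on.
    v = np.zeros(dim)
    for tok in tokens:
        for seed in range(num_hashes):
            v[hash((seed, tok)) % dim] += 1.0
    return v

x = hashed_features(["word=lemma", "bigram=johnson lindenstrauss"])
```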

Of course, apart from this, JL is still useful for speeding up all sorts of numeric computations that come up in machine learning, such as the SVD.
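For example, here is a rough sketch of a randomized truncated SVD (random projection followed by an exact SVD of the much smaller projected matrix); the function name, defaults, and sizes are mine, not from any particular library:

```python
import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    # Project A onto a random low-dimensional subspace, orthonormalize,
    # then run the exact SVD on the small projected matrix.
    rng = rng if rng is not None else np.random.default_rng(0)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)                 # basis for the sketched range
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

A = np.random.default_rng(1).standard_normal((2000, 500))
U, s, Vt = randomized_svd(A, k=20)
```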

answered Apr 27 '11 at 05:40

Alexandre Passos ♦

Apart from Alexandre's answers, another example that comes to mind is the "Kernels as Features" paper by Balcan and Blum, where they show that, as opposed to the traditional view of kernels as providing implicit high-dimensional mappings, kernels can also be seen as providing explicit LOW-dimensional mappings. Such mappings can be useful if, say, you have an algorithm that you can't kernelize (i.e., examples don't appear only as dot products) but you still want to capture the properties of the high-dimensional kernel-induced feature space. You can then just run your algorithm on these low-dimensional features.
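A minimal sketch of that idea, assuming an RBF kernel and landmark points drawn at random from the data; the function names, kernel width, and sizes are illustrative only:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_features(X, landmarks, kernel=rbf_kernel):
    # One explicit feature per landmark: F[i, j] = K(x_i, z_j).
    return np.array([[kernel(x, z) for z in landmarks] for x in X])

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))                    # made-up data
Z = X[rng.choice(len(X), size=20, replace=False)]     # 20 random landmark points
F = kernel_features(X, Z)                             # shape (200, 20)
# F can now be fed to any learner that can't be kernelized directly.
```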

answered Apr 27 '11 at 11:54

spinxl39

It strikes me that PCA should do a good job here because the number of data points is relatively small. Note that the data needs to be centered before running PCA. To obtain the low-dimensional embedding, zero out the bottom N - n singular values. This gives the embedding that minimizes the L2 distance between the original and embedded points, IIUC. That's not quite the pairwise-distance guarantee J-L talks about, but approximately retaining positions will approximately retain distances.
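A minimal sketch of this via a truncated SVD of the centered data; the sizes and the choice of n below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))     # made-up: 100 points in R^1000

Xc = X - X.mean(axis=0)                  # center first
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

n = 20                                   # keep the top n singular directions
X_embed = U[:, :n] * s[:n]               # n-dimensional coordinates of each point
X_approx = X_embed @ Vt[:n] + X.mean(axis=0)   # best rank-n L2 reconstruction
```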

I'd also consider spectral methods, such as the one in "On Spectral Clustering: Analysis and an Algorithm".

answered Apr 27 '11 at 18:40

jrennie
