|
Hello, the Johnson-Lindenstrauss lemma shows that a small set of points in R^N can be projected into R^n without changing the distances between pairs of points too much. Is there any ML approach that implements this mapping? I know that Kohonen's SOM can be implemented in n dimensions, but does it preserve distances in the sense of the JL lemma? Or are there other approaches?
|
Many ML approaches use the Johnson-Lindenstrauss lemma directly or indirectly; the most direct is random projection, where the JL map itself is used as the dimensionality-reduction step.
Of course, apart from this, JL is still useful in speeding up all sorts of numeric computations that come up in machine learning, such as the SVD.
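To make the random-projection idea concrete, here is a minimal NumPy sketch (the data and dimensions are made up for illustration): project points through a scaled random Gaussian matrix and check that pairwise distances are roughly preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 points in R^10000 (synthetic data, just for illustration)
N, n, m = 10_000, 500, 100
X = rng.standard_normal((m, N))

# JL-style map: a random Gaussian matrix scaled by 1/sqrt(n)
R = rng.standard_normal((N, n)) / np.sqrt(n)
Y = X @ R

# Compare a few pairwise distances before and after projection
def dist(A, i, j):
    return np.linalg.norm(A[i] - A[j])

ratios = [dist(Y, i, j) / dist(X, i, j) for i, j in [(0, 1), (2, 3), (4, 5)]]
print(ratios)  # each ratio should be close to 1
```

The point is that the projection is data-independent: n depends only on the number of points and the tolerated distortion, not on N.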
|
Apart from Alexandre's answers, another example that comes to mind is the Kernels as Features paper by Balcan and Blum, where they show that, as opposed to the traditional view of kernels as providing implicit high-dimensional mappings, kernels can also be seen as providing explicit LOW-dimensional mappings. Such mappings can be useful if, say, you have an algorithm that you can't kernelize (i.e., examples don't appear as dot products) but still want to capture the properties of the high-dimensional kernel-induced feature space. You can then just run your algorithm on these low-dimensional features.
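A rough sketch of that idea (not the paper's exact construction, which adds a random-projection step on top): map each example to its kernel values against a handful of landmark points, giving explicit low-dimensional features any linear learner can use. The data, kernel bandwidth, and number of landmarks below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs in R^2 (hypothetical)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])

def rbf(a, b, gamma=0.5):
    """RBF kernel matrix between rows of a and rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Explicit low-dimensional features: kernel values against d landmark points
d = 10
landmarks = X[rng.choice(len(X), d, replace=False)]
F = rbf(X, landmarks)  # shape (100, d) -- feed this to any non-kernelized algorithm

print(F.shape)
```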
|
It strikes me that PCA should do a good job here because the number of data points is relatively small. Note that the data will need to be centered before running PCA. To obtain the low-dimensional embedding, zero out the bottom N-n singular values. This gives an embedding that minimizes the L2 distance between the original and embedded points, IIUC. That's not quite the distance guarantee J-L is talking about, but approximately retaining positions will approximately retain distances. I'd also consider spectral clustering algorithms, such as On Spectral Clustering: Analysis and an algorithm.
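A minimal sketch of that recipe with plain NumPy (the synthetic data here is nearly rank-n, so the truncation loses almost nothing; real data won't behave this cleanly): center, take the SVD, and keep only the top n components.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 points in R^50 that lie near a 5-dimensional subspace (synthetic)
m, N, k = 200, 50, 5
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, N))
X += 0.01 * rng.standard_normal((m, N))  # small noise off the subspace

# Center the data, then truncate the SVD to the top n singular values
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n = 5
Y = U[:, :n] * s[:n]  # the n-dimensional PCA embedding

# Pairwise distances are nearly preserved because the data is nearly rank-n
orig = np.linalg.norm(Xc[0] - Xc[1])
emb = np.linalg.norm(Y[0] - Y[1])
print(orig, emb)  # the two values should be very close
```

Unlike a JL random projection, this map is data-dependent: it only preserves distances well when most of the variance really does sit in the top n directions.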
I don't believe Kohonen's SOM will preserve any notion of distance between points in the input space and the space it's mapped to. SOMs are meant to provide a relatively 'smooth' mapping from the input space to the discrete space it's mapped to, but there's no attempt to preserve relative distances between the two spaces.