|
I want to compute the pairwise cosine similarity of items in a very high-dimensional vector space. My input matrix is very sparse, but the number of nonzero elements per item follows a very skewed distribution (roughly a power law: very few items have many nonzero features, and most have only a few). Intuitively, comparing items with very different numbers of features doesn't seem desirable, but the only idea I have to mitigate this is to partition the input matrix into "bands" of items with similar numbers of features, and it is not obvious how to choose those bands given the very skewed distribution. |
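
To make the banding idea concrete, here is a minimal sketch of what I have in mind. It assumes each item is stored as a `{feature_id: weight}` dict holding only its nonzero entries, and it uses power-of-two bands (1, 2-3, 4-7, 8-15, ... nonzero features), which is just one possible way to cope with the skew. The names `cosine`, `band_of`, and `banded_similarities` are made up for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def band_of(vec):
    """Assumed banding rule: floor(log2(nnz)), so band widths double each time.

    Assumes the dict stores only nonzero entries; an empty vector gets band -1.
    """
    return len(vec).bit_length() - 1


def banded_similarities(items):
    """Compute cosine similarity only between items that fall in the same band."""
    bands = defaultdict(list)
    for name, vec in items.items():
        bands[band_of(vec)].append(name)

    sims = {}
    for names in bands.values():
        for a, b in combinations(sorted(names), 2):
            sims[(a, b)] = cosine(items[a], items[b])
    return sims


items = {
    "a": {0: 1.0, 1: 1.0},                # 2 nonzero features -> band 1
    "b": {1: 1.0, 2: 1.0, 3: 1.0},        # 3 nonzero features -> band 1
    "c": {i: 1.0 for i in range(8)},      # 8 nonzero features -> band 3
}
sims = banded_similarities(items)  # only the pair ("a", "b") is compared
```

With geometric bands, two items in the same band differ in feature count by less than a factor of two, but items just on either side of a band boundary are never compared, which is part of why I am unsure this is the right approach.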