I want to compute the pairwise cosine similarity of items in a very high-dimensional vector space.

My input matrix is very sparse, but the number of nonzero elements per item follows a very skewed distribution (roughly a power law: a few items have many nonzero features, while most items have very few).
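
To make the setup concrete, here is a minimal sketch of the computation in question. It assumes each item is stored as a sparse dict mapping feature index to value; that representation, the function names, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: value} dicts."""
    # Iterate over the smaller dict so the dot product costs O(min(nnz)).
    if len(u) > len(v):
        u, v = v, u
    dot = sum(val * v[f] for f, val in u.items() if f in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

def pairwise_cosine(items):
    """Return {(i, j): similarity} for all i < j; items is a list of sparse dicts."""
    return {(i, j): cosine(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}

# Hypothetical toy data: item 0 has many features, items 1 and 2 have few.
items = [
    {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0},
    {0: 1.0},
    {0: 1.0, 5: 1.0},
]
sims = pairwise_cosine(items)
```

In this toy case, item 1 shares its only feature with both other items, yet it scores about 0.71 against the two-feature item and only 0.5 against the four-feature item, because the larger norm of the feature-rich item pulls the similarity down.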

Intuitively, comparing items with very different numbers of features doesn't seem desirable, but the only mitigation I have come up with is to partition the input matrix into "bands" of items with similar numbers of features, and it is not obvious how to do that given the very skewed distribution.
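
One possible way to form such bands, sketched under assumptions: group items by the floor of the base-2 logarithm of their nonzero count, so that feature counts within a band differ by less than a factor of two. The log-scale choice, the function name `band_by_log_nnz`, and the dict-based item representation are assumptions made for illustration, not an established answer to the question.

```python
from collections import defaultdict

def band_by_log_nnz(items):
    """Group item indices by floor(log2(nnz)); items is a list of sparse dicts.

    Items with no nonzero features land in band -1.
    """
    bands = defaultdict(list)
    for idx, vec in enumerate(items):
        nnz = len(vec)
        # For nnz >= 1, bit_length() - 1 equals floor(log2(nnz)) exactly,
        # which avoids floating-point rounding in a log call.
        bands[nnz.bit_length() - 1].append(idx)
    return dict(bands)

# Hypothetical skewed feature counts: mostly small, one very large.
counts = [1, 2, 3, 4, 7, 8, 100]
items = [dict.fromkeys(range(n), 1.0) for n in counts]
bands = band_by_log_nnz(items)
```

Pairwise similarities would then be computed only among the items of a single band. Two caveats apply: band populations stay as skewed as the underlying distribution, and items just on either side of a boundary (3 and 4 features here) end up in different bands despite being close.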

asked Apr 25 '14 at 12:04

Christian Jauvin

edited Apr 25 '14 at 13:05

