Hi all, WIkipedia in its article http://en.wikipedia.org/wiki/Locality_Sensitive_Hashing; enumerates a bunch of steps to do after having constructed the hashing functions that compose LSH. In essence i don't understand those steps well, and i don't understand wether the algorithm gives you one NN solution or all the solutions that satisfy that they are a distance lesser than cR.

asked Jan 03 '11 at 08:16

fbru02's gravatar image

fbru02
6112


One Answer:

Not sure if i remember correctly, but when I used it, it was something like that :

  • With the hashing functions, build L hashtables referring to your original data, of length K.
  • Make the buckets (2^K for each of the L representations), containing the list of the indexes of the original vectors that have the same signature.

For instance if element 14 and element 75 have a hash==10100111, they will be in the same bucket in hashtable.

So if you want to obtain the NN of element 14, you look up its hash and you will know from the bucket hashtable[10100111] that element 75 (and probably some others) may be in its neighborhood. Iterating through the L representations, you will quickly build a list of at most L*(n/(2^K)) interesting candidates with a calculable risk of missing a candidate. You can then use this list as the first step for sorting if you want N neighbours.

answered Jan 03 '11 at 13:18

Guillaume%20Pitel's gravatar image

Guillaume Pitel
176148

"wether the algorithm gives you one NN solution or all the solutions that satisfy that they are a distance lesser than cR"

-> it depends of what you need

  • if you want any candidate within a given distance, you can stop searching as soon as you've found it (that's approximate nearest neighbor)
  • if you want K nearest neighbors, you do a partial sort as usual, but starting with the candidates that LSH gives you
(Jan 04 '11 at 03:50) Guillaume Pitel

Nice answer!

(Sep 27 '11 at 13:28) Mathieu Blondel
Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.