Is the hashing trick used in Vowpal Wabbit the same as Locality-sensitive hashing

asked Nov 17 '11 at 11:16

ZXZ's gravatar image

ZXZ
46346


One Answer:

No.

In LSH, you hash items so that nearby items are probably are close when hashed. LSH is a way of doing approximate nearest neighbors quickly. LSH uses a special kind of hash function.

With VW, you hash features so that you can reasonably support an unknown a priori and changing feature set, like you'd have with natural language when using words as features. In batch style learning, the typical way to handle a bag of words representation is to build a structure that maps words features into indicies. But it has two disadvantages compared to the hashing trick, you need to keep that string -> integer index mapping around which takes up significant amounts of space, and you can't really add new features online, since you tend to have a fixed sized feature vector, with one weight per feature.

The key insight with the hashing trick is ignore the perfect word -> index mapping, your learning algorithm will optimize around collisions introduced by the hashing function, especially when you have high dimensional and redundant training instances. The hashing trick uses a standard hashing function, and is mostly about having the bravado (and some combination of proofs and practical experience) to be okay with allowing some random set of features to share a common weight.

This answer is marked "community wiki".

answered Nov 17 '11 at 12:00

Rob%20Renaud's gravatar image

Rob Renaud
41551321

edited Nov 17 '11 at 12:14

Thanks!

Does the performance of the learning algorithm affected by the choice of hashing functions?

(Nov 17 '11 at 12:12) ZXZ

Any reasonable hash function should be fine. A far important consideration is the size of the weight vector.

(Nov 17 '11 at 12:28) Rob Renaud

The bigger the size, the better? To experiment, does one need to test on varying sizes and compare the performance? Also, does multi-hash help in terms of giving good performance with shorter vector?

Another naive question, before trying any learning algorithms, can I just measure the degree of collusion to select the size of the vector?

(Nov 17 '11 at 12:49) ZXZ

I don't think you can do a good job a priori just based on how many collisions the hash introduces. There is also an important, problem dependent amount of redundancy/discrimative power in your feature vectors. Consider that if you duplicated every feature and kept the weight vector the same size, measuring collisions would say you are doing much worse, but since there is no more real information in the param vector, and so keeping the weight vector size constant shouldn't change anything.

(Nov 17 '11 at 14:02) Rob Renaud
Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.