I've got some high-dimensional feature vectors whose entries take values in {0,1,2}, and I would like to use the hashing trick to reduce the dimensionality of this data and potentially to look at cross-product features, as Vowpal Wabbit does.

However, with only 3 levels in each feature, hashing will give me just a 3-dimensional feature vector, at best, which is far too small. Are there ways around this problem?

asked Jun 29 '13 at 03:10


digdug


One Answer:

What do you mean by high-dimensional in this case? Regardless of how many values a feature can take, hashing works on your feature vector (which is high-dimensional if you have many features or each feature can take many values) by mapping each coordinate of the feature vector into another coordinate in a smaller feature space, often implicitly (that is, without directly storing the feature vector in memory).
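A minimal sketch of this coordinate mapping, under stated assumptions: the names `hash_index` and `hash_vector` are hypothetical, `zlib.crc32` stands in for whatever hash function a real system uses, and colliding coordinates are simply summed. It is meant only to show that the hash is applied to the position of each coordinate, not to its value.

```python
import zlib

def hash_index(i, n_buckets):
    # Deterministic hash of the coordinate index (not of the value stored there).
    return zlib.crc32(str(i).encode("utf-8")) % n_buckets

def hash_vector(x, n_buckets):
    # Map coordinate i of x to bucket hash_index(i); colliding coordinates are summed.
    out = [0] * n_buckets
    for i, value in enumerate(x):
        if value != 0:  # zero entries contribute nothing and can be skipped
            out[hash_index(i, n_buckets)] += value
    return out

x = [0, 0, 1, 1, 2]
hashed = hash_vector(x, n_buckets=4)
```

The output dimensionality is set by `n_buckets`, independently of how many distinct values a feature can take.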

How you deal with features with multiple values has nothing to do with hashing, and you can use the same techniques you'd use in other circumstances; for example you can represent a categorical feature as many mutually exclusive binary features, each of which gets mapped to a unique coordinate in your feature vector.

answered Jun 30 '13 at 15:44


Alexandre Passos ♦

I'm probably misunderstanding something, but hashing is deterministic, and if your vector is, say, (0, 0, 1, 1, 2), then the first two 0s will hash to the same value, the next two 1s to another value, and so on, so you only end up with three possible hash values. So the total number of non-zero hash coordinates will be three.

(Jul 01 '13 at 09:21) digdug

To use the hashing trick in your scenario you would need to do something like define your vector (0, 0, 1, 1, 2) as the set {(0,0),(1,0),(2,1),(3,1),(4,2)} where each tuple specifies (dimension,value) and then hash each element (tuple) in the set.

(Jul 01 '13 at 12:20) alto
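The (dimension, value) encoding from the comment above could be sketched roughly as follows. This is an illustration under assumptions, not any particular library's implementation: the helper names `bucket`, `hash_pairs`, and `hash_crosses` are hypothetical, `zlib.crc32` is an arbitrary choice of deterministic hash, and the cross-product part is only one simple way to form pairwise features (it is not claimed to match what Vowpal Wabbit does internally).

```python
import zlib
from itertools import combinations

def bucket(key, n_buckets):
    # Deterministic hash of a string key into one of n_buckets coordinates.
    return zlib.crc32(key.encode("utf-8")) % n_buckets

def hash_pairs(x, n_buckets):
    # Treat each (dimension, value) tuple as one binary feature and hash it,
    # so the same value at different positions generally lands in different buckets.
    out = [0] * n_buckets
    for dimension, value in enumerate(x):
        out[bucket(f"{dimension}:{value}", n_buckets)] += 1
    return out

def hash_crosses(x, n_buckets):
    # Pairwise cross-product features: one hashed feature per pair of
    # (dimension, value) tuples with i < j.
    out = [0] * n_buckets
    for (i, vi), (j, vj) in combinations(enumerate(x), 2):
        out[bucket(f"{i}:{vi}x{j}:{vj}", n_buckets)] += 1
    return out

x = [0, 0, 1, 1, 2]
singles = hash_pairs(x, n_buckets=16)
crosses = hash_crosses(x, n_buckets=64)
```

Because the key includes the dimension, the two 0s in `x` are distinct features before hashing; any overlap between them comes only from hash collisions, which become less likely as `n_buckets` grows.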

In other words, you don't hash the feature values, you hash the feature indices.

(Jul 01 '13 at 14:53) Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.