My problem is to use an SVM on a large set of classes. Each individual class is not going to be too big, at most a few hundred elements, but I will be dealing with tens of thousands of classes. The usual SVM multi-class strategies, one-vs-all and all-vs-all, seem prohibitively inefficient for such a large number of classes. What is a good alternative for this problem? Preferably I'd like a solution that comes with an existing code implementation.

asked Jun 22 '12 at 15:30

mugetsu


3 Answers:

Why do you need an SVM? If you use some form of multi-class logistic regression, there should be no problem.

If the weight matrix is too large to store (n inputs and m classes, so that the full n x m matrix W doesn't fit), you can factor it as W = UV, where U is n x d and V is d x m with d < min(n, m). A sketch of this factorization follows below.
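A minimal numpy sketch of the factored approach; the softmax layer, the sizes and the rank d below are illustrative assumptions, not part of the answer:

    import numpy as np

    # Illustrative sizes: the full n x m weight matrix W would need n * m
    # parameters; the factored form stores only (n + m) * d of them.
    n, m, d = 10_000, 5_000, 64               # inputs, classes, rank (d << min(n, m))
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.01, size=(n, d))   # n x d factor
    V = rng.normal(scale=0.01, size=(d, m))   # d x m factor

    def predict_proba(x):
        """Softmax class probabilities; W = U @ V is never materialized."""
        logits = (x @ U) @ V                  # (d,) then (m,): two small products
        logits -= logits.max()                # for numerical stability
        e = np.exp(logits)
        return e / e.sum()

    probs = predict_proba(rng.normal(size=n))   # probs.shape == (5000,)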

This answer is marked "community wiki".

answered Jun 22 '12 at 23:15

gdahl ♦

edited Jun 24 '12 at 01:10

+1. Look up softmax regression.

(Jun 23 '12 at 04:26) rm9

You would still need to store n_features x n_classes parameters. If n_features is already on the order of 1e6 (e.g. bag of words on a large text corpus), that approach will not be tractable.

(Jun 23 '12 at 12:29) ogrisel

I'm just following the normal procedure for object classification, where each object is represented by a bag-of-words histogram. I thought multiclass SVMs were the standard procedure for this.

(Jun 25 '12 at 14:35) mugetsu

It is standard, but if both your number of classes and your number of input features (the number of distinct possible words occurring in your bags) are high, then the total size of the one-vs-all global classifier will be large: n_classes * n_features parameters. Assuming one double-precision float per parameter, that means n_classes * n_features * 8 / (1024 ** 2) MB, which might not fit in memory on a single machine. Hence the hashing trick.

(Jun 27 '12 at 08:36) ogrisel
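A quick check of that arithmetic with illustrative numbers (say 20,000 classes and 10^6 bag-of-words features, both assumed purely for the sake of the example):

    n_classes, n_features = 20_000, 1_000_000
    mb = n_classes * n_features * 8 / (1024 ** 2)
    print(round(mb))   # ~152588 MB, i.e. roughly 149 GB for a dense weight matrix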

I agree with ogrisel. This approach doesn't solve the problem if the number of classes is large.

(Jul 05 '12 at 04:28) Mathieu Blondel

You should have a look at the (label x feature)-hashing trick for multiclass, multilabel and/or multitask learning:

Transform your data by applying the following:

Choose an arbitrary target dimension d (for instance d = 10^6).

For each sample i with feature vector x^(i) and label vector y^(i) (labels are expected to be zero-one encoded as an n_classes-dimensional binary vector, possibly using a sparse representation):

  1. For each non-zero label y^(i)_k: compute a positive hashed sample z^(i) such that the l-th component z^(i)_l of the vector is computed by summing the values x^(i)_j for which l == hash(k, j) % d.

    Furthermore, for multitask learning, add x^(i)_j such that l == hash(j) % d to the previous vector, to encode the part shared across tasks.

  2. For each zero label y^(i)_k (or a random subset of them): compute a negative hashed sample z^(i)' in the same way.

This way you can fit a single linear binary classifier (preferably an online one, so that you can generate the hashed samples on the fly) with a bounded number of parameters: d (+1 for the intercept / bias). The hashing trick also preserves the sparsity of the original representation, making memory-efficient implementations possible. A minimal sketch of the scheme is given below.
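Here is a minimal Python sketch of the scheme; the built-in hash(), the perceptron-style updates and the one-random-negative sampling strategy are illustrative choices, not prescriptions from the answer (an optimized implementation would use a proper hash function such as MurmurHash):

    import random
    from collections import defaultdict

    d = 2 ** 20   # target dimension, chosen arbitrarily

    def hashed_sample(x, k):
        """Hash the sparse features x (dict: j -> value) for class label k into R^d."""
        z = defaultdict(float)
        for j, v in x.items():
            z[hash((k, j)) % d] += v          # l == hash(k, j) % d, values summed
        return z

    w = defaultdict(float)                    # single shared weight vector: d parameters

    def train_step(x, y, classes, lr=0.1):
        """One online update: a positive hashed sample for the true label y,
        a negative one for a single randomly drawn wrong label."""
        k_neg = random.choice([k for k in classes if k != y])
        for k, target in ((y, 1.0), (k_neg, -1.0)):
            z = hashed_sample(x, k)
            score = sum(w[l] * v for l, v in z.items())
            if target * score <= 0:           # perceptron rule: update on mistakes
                for l, v in z.items():
                    w[l] += lr * target * v

    def predict(x, classes):
        """Predict by scoring every candidate class against the shared vector."""
        return max(classes, key=lambda k: sum(
            w[l] * v for l, v in hashed_sample(x, k).items()))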

Weinberger et al.'s paper, Feature Hashing for Large Scale Multitask Learning, is a very nice read that provides both theoretical insights and experimental results at scale.

If you want an optimized implementation of this scheme, try Vowpal Wabbit, by the same authors.

answered Jun 23 '12 at 12:56

ogrisel

edited Jun 24 '12 at 20:04

I would consider error-correcting output codes, as they allow you to reduce the original multi-class problem to m binary classification problems, where m is chosen by you. In particular, if you set m to a number smaller than the number of classes, you are compressing the problem; if you set it to a larger number, you are adding redundancy.

See the multiclass module in scikit-learn for a brief description and references; a usage sketch follows below.
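For instance, a minimal sketch with scikit-learn's OutputCodeClassifier (the toy dataset and the code_size value are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.multiclass import OutputCodeClassifier
    from sklearn.svm import LinearSVC

    # Toy data standing in for the real problem.
    X, y = make_classification(n_samples=1000, n_features=50,
                               n_informative=30, n_classes=20)

    # code_size is m expressed as a fraction of n_classes: with 0.5, only
    # 10 binary problems are trained instead of the 20 of one-vs-rest.
    clf = OutputCodeClassifier(LinearSVC(), code_size=0.5, random_state=0)
    clf.fit(X, y)
    pred = clf.predict(X)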

answered Jul 05 '12 at 04:36

Mathieu Blondel

edited Jul 05 '12 at 04:45
