
I have a simple task: given a set of features for an item, predict the probability that the item belongs to a particular class. The training set is binary: an item is either in the particular class or not. The number of features is fairly small - 15 to 30.

A more complex version would predict which of a number of input items is most likely to belong to the class (i.e., only one of these items is what we're looking for; predict which it is).

I expect the classification will be simple - I'm guessing a perceptron would work just fine.

I have a fair bit of experience with neural nets and have dabbled with a few other techniques, but wanted to get the pulse of what techniques people are using these days before jumping in.

I'm looking for simplicity, accuracy, and classification speed. What do you recommend?

[Update] Great responses, exactly what I was hoping for. So it seems SVMs are the new hotness, and I've been meaning to play with them for some years anyway, so maybe I'll give it a try. My target language is Python, and one particular consideration is having a pure-Python classification (as opposed to training) implementation, to allow for running on, for example, Google App Engine. I'll do some digging and see if I can find a suitable implementation.
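For what it's worth, the prediction side of a linear model is small enough to write in pure Python; a minimal sketch, assuming the weights and bias have been exported from a model trained offline (e.g. with liblinear or scikit-learn; the values below are hypothetical):

    import math

    def predict_proba(weights, bias, features):
        """Probability that an item belongs to the class, from a linear model.

        Uses plain lists of floats and the math module only, so it runs on a
        restricted runtime such as Google App Engine.  The weights and bias
        are assumed to come from a model trained offline.
        """
        score = bias + sum(w * x for w, x in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-score))

    # Hypothetical exported parameters and a feature vector for one item.
    weights = [0.8, -1.2, 0.05]
    bias = -0.1
    print(predict_proba(weights, bias, [1.0, 0.3, 4.0]))

For the more complex version (pick the most likely of several candidate items), score each candidate the same way and take the one with the highest probability.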

asked Jul 17 '10 at 12:38 by Dogma, edited Jul 18 '10 at 10:50

I wouldn't recommend using a perceptron. Its training speed is usually very low. If you can, use an SVM instead: while the perceptron finds just some separating hyperplane, an SVM finds the best (in some sense) separating hyperplane.

(Jul 17 '10 at 14:46) bijey

Just curious, what did you end up using on App Engine for the pure-Python implementation? Because sklearn uses liblinear/scipy (both of which obviously won't work on App Engine).

(May 21 '13 at 14:23) Ricky Wong

3 Answers:

I agree with Alexandre Passos: an SVM or logistic regression (with L1 or L2 regularization) will both improve the generalization error over a perceptron (thanks to the regularization term) and provide you with probability estimates (which is obvious for LR but less so for SVMs).

Also, don't implement it yourself: liblinear and libsvm are available under a BSD license, with bindings for a variety of languages and even a Java port, for instance.

If you prefer Python, scikits.learn has wrappers for libsvm and liblinear that let you predict probabilities with a simple and consistent API across models; for instance, see this example.
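To illustrate (a minimal sketch, not from the original answer; the import path follows the current scikit-learn package layout rather than the 2010 scikits.learn one, and the data is made up):

    from sklearn.linear_model import LogisticRegression

    # Toy data: one row of features per item, label 1 if the item is in the
    # class of interest, 0 otherwise.
    X = [[0.2, 1.5, 3.0],
         [1.1, 0.3, 0.5],
         [0.9, 1.2, 2.8],
         [1.4, 0.1, 0.2]]
    y = [1, 0, 1, 0]

    clf = LogisticRegression(C=1.0)  # C is the inverse regularization strength
    clf.fit(X, y)

    # predict_proba returns [P(class 0), P(class 1)] for each item.
    print(clf.predict_proba([[1.0, 1.0, 1.0]]))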

answered Jul 18 '10 at 07:09 by ogrisel, edited Jul 18 '10 at 11:11

Simplicity and speed: logistic regression.

If you want more accuracy than that, you could try using a neural network with sigmoid/softmax outputs or an SVM modified for probability elicitation (libsvm supports it).
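For the SVM option, libsvm's probability outputs (Platt scaling) are also exposed through scikit-learn's wrapper; a minimal sketch with made-up data:

    from sklearn.svm import SVC

    X = [[0.2, 1.5], [1.1, 0.3], [0.9, 1.2],
         [1.4, 0.1], [0.5, 1.8], [1.6, 0.4]]
    y = [1, 0, 1, 0, 1, 0]

    # probability=True tells the libsvm backend to fit a sigmoid on the
    # decision values (Platt scaling) so that predict_proba is available.
    clf = SVC(kernel='linear', probability=True)
    clf.fit(X, y)
    print(clf.predict_proba([[1.0, 1.0]]))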

answered Jul 17 '10 at 13:36 by Alexandre Passos ♦

I think that using a simple linear SVM instead of logistic regression wouldn't be a bad idea. The complexity order is the same as for LR, but the number of points that actually enter the loss is greatly reduced, because a linear SVM leaves out a lot of irrelevant points. Check out this basic implementation: http://www.kyb.tuebingen.mpg.de/bs/people/chapelle/primal/ . There the machine goes from optimizing a loss over 3000 points to fewer than 1000.

answered Jul 18 '10 at 08:48 by kpx, edited Jul 18 '10 at 08:51

If the SVM is linear, the complexity of inference is exactly the same as with logistic regression (i.e., a biased dot product). The linear SVM solution would just not depend on the correctly classified points, but regularized logistic regression also has some margin effects that guarantee generalization. There is no "number of points" as part of a linear SVM (but there is as part of a kernelized SVM).

(Jul 18 '10 at 09:04) Alexandre Passos ♦

Hey Alexandre, I wanted to discuss this a little further.

I was reading a paper on training SVMs in the primal, where both the paper and the MATLAB implementation only consider the loss for the points where y(w·x) < 1 (even for linear SVMs), i.e. the support vectors, which is what I meant by the "number of points". Please correct me if I'm wrong here again, or if you think I'm reading the MATLAB implementation incorrectly, but I feel pretty sure that's what is done there.

Please see the implementation corresponding to the paper "Training a Support Vector Machine in the Primal"; the code in question is on lines 168 through 172 and lines 101 and 102.

(Jul 18 '10 at 14:06) kpx

Yes, I really like that "training a support vector machine in the primal" paper. Another, similar, faster method is Pegasos: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.161.9629&rep=rep1&type=pdf and http://astro.temple.edu/~tua63862/ICML10_BPegasos.pdf .

Yes, it's true that the support vector machine loss only depends on the margin errors (the points for which y(w·x+b) < 1), but this doesn't necessarily make it more stable or faster in the linear case, since you still have to run the algorithm on all points (to classify them and see whether they are margin errors), and your final feature vector has a fixed dimension. Most of the gain in generalization performance obtained by using an SVM is due to the strong regularization built into the method. Unregularized kernel classifiers would be too unreliable to work, for example, but in an SVM kernels are very practical and have decent generalization performance.

So, yes, the algorithm will focus on the margin errors and optimize around that, but it still has to check all points every once in a while (to see if they became margin errors).
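A toy sketch of that point (not part of the original comment; NumPy, hypothetical data): every point gets scored on each pass, but only the margin violators contribute to the data term of the subgradient.

    import numpy as np

    def hinge_subgradient_step(w, b, X, y, lam=0.01, lr=0.1):
        """One subgradient step on the L2-regularized hinge loss.

        X is an (n, d) feature matrix, y holds labels in {-1, +1}.
        All n points are scored, but only those with y * (w.x + b) < 1
        (the margin errors) enter the gradient of the loss term.
        """
        margins = y * (X.dot(w) + b)
        violators = margins < 1                    # boolean mask of margin errors
        grad_w = lam * w - X[violators].T.dot(y[violators]) / len(y)
        grad_b = -y[violators].sum() / len(y)
        return w - lr * grad_w, b - lr * grad_b

This is essentially what Pegasos does, with a particular learning-rate schedule and optional subsampling of the data at each step.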

(Jul 18 '10 at 14:30) Alexandre Passos ♦