I know the question is broad without aiming at an actual project, so here is my concrete setting: I am doing remote sensing image classification with an object-oriented method. First I segment the image into different regions, then I extract features from each region such as color, shape and texture. A region may have about 30 features in all, there are commonly about 2000 regions, and I will choose 5 classes with 15 samples for every class. So in sum: sample data 1530; test data 197530. Now I face the problem of how to choose a proper classifier for the classification. Given the 3 classifiers ANN, SVM and KNN, which should I choose for better classification?

asked Sep 06 '11 at 03:57

lpvoid


2 Answers:

I agree with Olivier in characterizing the methods, though I'd start with KNN. KNN is more or less trivial to tune and just works. It might not give as good results as an SVM with Gaussian kernel but I feel it gives a good baseline/starting point in judging your data and your features.
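A minimal sketch of such a KNN baseline, using scikit-learn's modern API (the thread predates it) on synthetic stand-in data shaped like the question: 30 features, 5 classes, 15 labeled samples per class. KNN has essentially one knob, the number of neighbors k.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(75, 30))   # 5 classes x 15 samples, 30 features
y = np.repeat(np.arange(5), 15)
X += 2 * y[:, None]             # shift each class to make them separable

# Scale the features, then try a few values of k and cross-validate.
for k in (1, 3, 5):
    clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.2f}")
```

The cross-validated accuracy this gives is exactly the kind of quick baseline number to compare an SVM against later.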

This answer is marked "community wiki".

answered Sep 06 '11 at 05:20


Andreas Mueller

Yes, KNN is easy to build and to use, and its classification results may be good after tuning. A follow-up question: during the parameter tuning process, which classifier is more reliable?

(Sep 06 '11 at 05:29) lpvoid

True: KNN is simple and will give a good measure of the regularity of your dataset. However KNN is not trying to do any kind of generalization / summarization: the model is a copy of the training set. So IMHO you should use it as a baseline sanity check for more powerful models such as SVM.

(Sep 06 '11 at 05:47) ogrisel

So in your opinion KNN can serve as a test to check that the sample data and features are proper? If so, I can go on to implement the classification with another model such as SVM, but with SVM I suppose I must also find the best kernel and parameters.

(Sep 06 '11 at 07:09) lpvoid

Yes. As for the kernel: if the data is very high dimensional (many features) and you have fewer samples than features, the linear kernel is the best choice (the data is likely to be linearly separable). Otherwise (for non-linear data) the Gaussian kernel is probably the best general-purpose kernel (in my experience polynomial kernels can fail without giving much intuition on why they fail).

(Sep 06 '11 at 08:11) ogrisel

In my project, features : samples = 30 : 15 = 2, so the dimension is not that big. I thought the Gaussian kernel was specially for high dimensions, so why did you say the linear kernel is the best?

(Sep 06 '11 at 08:51) lpvoid

Try both using cross validation as explained in the libsvm guide.
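A hedged sketch of that comparison: cross-validate a linear and a Gaussian (RBF) kernel side by side, here with scikit-learn's `SVC` wrapper around libsvm and synthetic stand-in data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(75, 30))   # 5 classes x 15 samples, 30 features
y = np.repeat(np.arange(5), 15)
X += 2 * y[:, None]             # shift each class to make them separable

# Compare both kernels with the same 5-fold cross-validation split.
results = {}
for kernel in ("linear", "rbf"):
    results[kernel] = cross_val_score(SVC(kernel=kernel, C=1.0), X, y, cv=5).mean()
    print(kernel, round(results[kernel], 2))
```

Whichever kernel scores better under cross-validation is the one to tune further.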

(Sep 06 '11 at 09:37) ogrisel

You should first explode the categorical values (color, shape, …) into boolean features as explained in the answer to a question on multivariate clustering. Then scale the feature values so that they all have the same variance.

Then any of ANN, KNN and SVM should work (unless your data is just noise). SVMs are probably easier to tune (only 2 simple parameters) than ANNs, and better able to generalize than KNN (which is just a "database" with similarity queries).

Hence try an SVM with a Gaussian kernel and do a grid search over the parameters C and gamma as explained in the libsvm guide.
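A sketch of that C/gamma grid search, via scikit-learn's `GridSearchCV` (parameter names are scikit-learn's, not libsvm's) on synthetic stand-in data; the libsvm guide recommends an exponentially spaced grid, which is what the powers of 2 below imitate.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(75, 30))   # 5 classes x 15 samples, 30 features
y = np.repeat(np.arange(5), 15)
X += 2 * y[:, None]             # shift each class to make them separable

# Exponential grid over C and gamma, cross-validated, as the guide suggests.
param_grid = {"C": [2.0**e for e in (-1, 1, 3, 5)],
              "gamma": [2.0**e for e in (-7, -5, -3)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 2))
```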

answered Sep 06 '11 at 04:52

ogrisel


Why explode the data into boolean features? Just 1 or -1? Or do you mean normalizing the features to (-1, 1)? In fact, the data pretreatment is not difficult; my focus is choosing the classifier.

(Sep 06 '11 at 05:18) lpvoid

You should only explode categorical features as sets of boolean features. Numerical features can be scaled directly.

Most machine learning models work by computing pairwise distances between sample vectors: for instance this is the case for both SVM with Gaussian kernel and KNN.

Now assume that you use integers to encode your categories, for instance for the color feature: 1 for red, 2 for blue, 3 for yellow, 4 for green and so on.

With this coding the machine learning algorithm will "think" that red and blue samples are much more similar than red and green, since ||1 - 2|| = 1 while ||1 - 4|| = 3. This is completely misleading, as the integer coding of the categories was completely arbitrary. If instead you code with sets of boolean features as explained earlier, the distance between 2 samples with 2 different colors is always the same constant: you tell the algorithm that what matters is whether the samples have the same color, not the numeric value used to encode the color.
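The distance argument can be checked numerically. A small illustration: with integer codes, "red vs green" looks three times farther apart than "red vs blue"; with one-hot (boolean) coding, every pair of distinct colors is equally far apart.

```python
import numpy as np

colors = ["red", "blue", "yellow", "green"]
integer = {c: i + 1 for i, c in enumerate(colors)}           # red=1 ... green=4
onehot = {c: np.eye(len(colors))[i] for i, c in enumerate(colors)}

print(abs(integer["red"] - integer["blue"]))     # 1
print(abs(integer["red"] - integer["green"]))    # 3 -- misleadingly "farther"
print(np.linalg.norm(onehot["red"] - onehot["blue"]))   # sqrt(2)
print(np.linalg.norm(onehot["red"] - onehot["green"]))  # sqrt(2) -- same constant
```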

Using integer coding for categories also has a bad interaction with the regularization term of SVMs and the weight decay mechanism of ANNs, for similar reasons.

You say: "the data pretreatment is not difficult; my focus is the classifier choosing" => it is probably the opposite: data preparation is probably the most important step. If you get it wrong you will get poor results whatever the classifier. The choice of classifier is probably less important for result quality (as all the classifiers you mention are able to deal with non-linearly-separable data).

(Sep 06 '11 at 05:39) ogrisel

As for the variance scaling of the features, just follow the instructions of the libsvm guide: it is very intuitive and gives you tools to do it.
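A sketch of that scaling step with scikit-learn's `StandardScaler` (zero mean, unit variance per feature; the libsvm guide's own tool alternatively scales each feature to a fixed range like [-1, 1]). The key point: fit the scaling statistics on the training data only, then reuse them on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in data with large, uninformative feature magnitudes.
X_train = rng.normal(loc=5.0, scale=100.0, size=(75, 30))
X_test = rng.normal(loc=5.0, scale=100.0, size=(25, 30))

scaler = StandardScaler().fit(X_train)   # statistics from training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # reuse the same statistics

print(X_train_s.mean(axis=0).round(6))   # ~0 per feature
print(X_train_s.std(axis=0).round(6))    # ~1 per feature
```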

(Sep 06 '11 at 05:40) ogrisel

Actually I misread the question and thought you had categorical features for shape and color, while you probably already have a meaningful numerical coding for those (it depends on your feature extraction layer). If that is the case you don't have to do the boolean exploding step.

(Sep 06 '11 at 06:00) ogrisel

But you wrote: "SVMs are probably easier to tune (only 2 simple parameters) than ANNs, and better able to generalize than KNN (which is just a 'database' with similarity queries)." Is the difference between SVM and ANN just the tuning?

(Sep 06 '11 at 07:04) lpvoid

Feed-forward perceptrons with one hidden layer, a non-linear link function and backpropagation training are universal function approximators, as are SVMs with a Gaussian kernel. However, in practice one must be very careful when implementing them, and there are more parameters to tune: the number of nodes in the hidden layer, the shape of the link function on the hidden and output layers, the learning rate, the strength of the weight decay, the momentum...
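To make the "more parameters to tune" point concrete, here is a hedged sketch with scikit-learn's `MLPClassifier` (a modern stand-in, not what the thread used): each keyword below maps onto one of the knobs just listed.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(75, 30))   # 5 classes x 15 samples, 30 features
y = np.repeat(np.arange(5), 15)
X += 2 * y[:, None]             # shift each class to make them separable
X = StandardScaler().fit_transform(X)

mlp = MLPClassifier(
    hidden_layer_sizes=(50,),   # number of nodes in the hidden layer
    activation="tanh",          # shape of the link function
    solver="sgd",               # plain backpropagation-style gradient descent
    learning_rate_init=0.01,    # learning rate
    alpha=1e-4,                 # strength of the weight decay (L2 penalty)
    momentum=0.9,               # the "moment" term
    max_iter=1000,
    random_state=0,
)
mlp.fit(X, y)
print("training accuracy:", round(mlp.score(X, y), 2))
```

Compare this to an RBF SVM, where only C and gamma need tuning.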

Efficient BackProp by Y. LeCun, L. Bottou, G. Orr and K.-R. Müller is probably the best practical guide to implementing backpropagation for multi-layer perceptrons correctly.

There also exists a new breed of neural network architectures that are able to do semi-supervised learning and outperform SVMs with gaussian kernel on many tasks but they are even more complicated to get right. If you are interested, you should read the literature on Deep Learning (Deep Belief Networks and Stacked Denoising Autoencoders).

Also for image recognition tasks convolutional neural networks might work great too.

A good place to start for learning about all of those beasts is deeplearning.net.

(Sep 06 '11 at 08:25) ogrisel

An additional point that I think is important: neural networks require non-convex optimization, which is sensitive to many parameters, and in a sense one never quite knows how good a solution one will get. SVMs use convex optimization, which is much more robust and is guaranteed to find the (approximately) optimal solution.

On the other hand, recall speed in ANNs depends only on the model size and is usually much faster than in (kernelized) SVMs, where recall speed scales with the number of support vectors, which itself grows with the number of training examples.

As your dataset is not too big, I'd go with an off-the-shelf SVM package like LibSVM, or even better use Python and scikits-learn :)

(Sep 06 '11 at 09:35) Andreas Mueller

Yes, I will use LibSVM to implement my classification.

(Sep 07 '11 at 23:27) lpvoid


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.