Hi all,

So I'm taking Andrew Ng's ML class on Coursera. At the beginning of the third week, in the lecture on Classification (https://class.coursera.org/ml-004/lecture/33), he begins with a binary classification scenario.

[figure_1: binary classification data with a linear fit and a 0.5 threshold]

Now, looking at figure_1, it is logical to set the threshold at 0.5. But then he introduces an extreme point on the right in figure_2, which tilts the linear regression line rightwards. My question is: why are we still keeping the threshold at 0.5? Shouldn't we instead choose a threshold of around 0.3 now? Keeping the threshold/cutoff at 0.5 really doesn't make any sense to me.

[figure_2: the same data plus an extreme point on the right, tilting the regression line]

For example, consider this extreme case: clearly the green line is the classifier, and the blue line shows the threshold, which in this case should be chosen at approximately 0.2 to classify properly.

[figure_3: extreme case, green classifier line and blue threshold line]
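To make this concrete, here is a minimal numpy sketch (with made-up data points, not the lecture's numbers) that fits a least-squares line to 0/1 labels and prints where the fitted line crosses 0.5, before and after adding one extreme positive point:

    import numpy as np

    def cutoff(x, y):
        # Least-squares fit y ~ w*x + b, then solve w*x + b = 0.5 for x
        w, b = np.polyfit(x, y, 1)
        return (0.5 - b) / w

    # Made-up 1-D data: three negative (0) and three positive (1) examples
    x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
    y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
    print("0.5 crossing without outlier:", cutoff(x, y))   # ~4.5, the class gap

    # One extreme positive point tilts the regression line
    print("0.5 crossing with outlier:   ",
          cutoff(np.append(x, 30.0), np.append(y, 1.0)))   # slides to the right

The 0.5 crossing slides to the right once the outlier is added, which is exactly why a cutoff fixed at 0.5 starts misclassifying the original positives.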

asked Feb 03 '14 at 16:57

Rahul SIngh


One Answer:

Because you never get to see data that nice and pretty. This is a toy example; in practice you would usually have at least 3 or 4 dimensions, and you usually do not know how many examples of one class you have versus the other, which makes it really difficult to set the classifier's threshold a priori.

Also, using linear regression as a classifier is usually a very bad idea; instead you use logistic regression. So do not worry much about this, because odds are you are never going to use it.

P.S. That second plot is probably not going to look like that; the line is going to be much more horizontal, since your linear regression will treat those 3 points as outliers.
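To illustrate the difference (a rough sketch on made-up 1-D data with scikit-learn, not the lecture's example): linear regression's 0.5 crossing gets dragged toward the extreme point, while logistic regression keeps its boundary near the gap between the classes.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Made-up data, with the extreme positive point included
    x = np.array([1, 2, 3, 6, 7, 8, 30], dtype=float).reshape(-1, 1)
    y = np.array([0, 0, 0, 1, 1, 1, 1])

    lin = LinearRegression().fit(x, y)
    log = LogisticRegression().fit(x, y)

    grid = np.linspace(0.0, 31.0, 3101).reshape(-1, 1)
    lin_cut = grid[np.argmax(lin.predict(grid) >= 0.5), 0]
    log_cut = grid[np.argmax(log.predict_proba(grid)[:, 1] >= 0.5), 0]
    print("linear regression crosses 0.5 at:", lin_cut)  # dragged right by the outlier
    print("logistic boundary is at:         ", log_cut)  # stays near the class gap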

answered Feb 03 '14 at 18:32

Leon Palafox ♦

It is not a bad idea in finance. OLS will give you results identical to logistic regression most of the time (in terms of the decision that comes out of such an analysis).
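A quick sketch of that claim on made-up overlapping Gaussian data (not financial data), thresholding OLS at 0.5 and comparing its decisions with logistic regression's:

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    # Two overlapping made-up Gaussian classes in 2-D
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                   rng.normal(1.5, 1.0, (200, 2))])
    y = np.repeat([0, 1], 200)

    ols_label = (LinearRegression().fit(X, y).predict(X) >= 0.5).astype(int)
    log_label = LogisticRegression().fit(X, y).predict(X)
    # Fraction of points where both models make the same decision
    print("agreement:", (ols_label == log_label).mean())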

(Feb 03 '14 at 22:58) Jeremiah M

Linear regression for classification is often just fine, and on separable datasets it can sometimes behave better than logistic regression, since separability makes logistic regression unstable without regularization (the maximum-likelihood weights diverge).
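For instance, a small sketch on made-up, perfectly separable data: as scikit-learn's regularization is weakened (larger C), the fitted logistic weight keeps growing.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Made-up, perfectly separable 1-D data
    x = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
    y = np.array([0, 0, 0, 1, 1, 1])

    for C in (1.0, 100.0, 10000.0):      # weaker and weaker regularization
        w = LogisticRegression(C=C, max_iter=10000).fit(x, y).coef_[0, 0]
        print(f"C={C:>8}: weight={w:.2f}")  # the weight keeps growing

With no regularization at all the weight would grow without bound, which is the instability in question.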

The main advantage of logistic regression is that it is less sensitive to outliers than linear regression: the log loss looks like a hinge and grows as O(|error|), while the squared loss grows as O(error^2). The log loss also tends to be better at discriminating extreme probabilities (close to 0 or 1, but not quite), which are quite difficult for the squared loss.
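A quick numeric sketch of those growth rates, written in terms of the raw model score f for a positive example (y = 1):

    import numpy as np

    # Compare losses as the raw score f gets more and more wrong (more negative)
    for f in (-1.0, -5.0, -25.0):
        log_loss = np.log1p(np.exp(-f))    # log(1 + e^-f): grows ~|f|, like a hinge
        sq_loss = (1.0 - f) ** 2           # squared loss on the score: grows ~f^2
        print(f"f={f:>6}: log loss={log_loss:8.2f}   squared loss={sq_loss:8.2f}")

At f = -25 the log loss is about 25 while the squared loss is 676, so a single badly misclassified point dominates the squared-loss fit far more.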

(Feb 04 '14 at 00:54) Alexandre Passos ♦