Hi all, I'm taking Andrew Ng's ML class on Coursera. At the beginning of the third week, in the lecture on Classification (https://class.coursera.org/ml-004/lecture/33), he starts with a binary classification scenario.
Looking at figure_1, it seems logical to set the threshold at 0.5. But then he introduces an extreme point on the far right in figure_2, which causes the linear regression line to tilt rightwards. My question is: why are we still keeping the threshold at 0.5? Shouldn't we choose a threshold of around 0.3 instead? Keeping the cutoff at 0.5 really doesn't make sense to me.
For example, consider this extreme case: [figure]. Clearly the green line is the classifier, and the blue line shows the threshold, which in this case should be chosen at roughly 0.2 to classify properly.
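To make the effect in the question concrete, here is a minimal sketch (hypothetical 1-D data, not the actual points from the lecture's figures) of how a single far-right positive example tilts the least-squares line and moves the point where the fitted value crosses 0.5:

```python
import numpy as np

# Hypothetical 1-D data: 0 = negative class, 1 = positive class.
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

def crossing_at_half(x, y):
    """Fit y ~ a*x + b by least squares and return the x where the line hits 0.5."""
    a, b = np.polyfit(x, y, deg=1)
    return (0.5 - b) / a

print("boundary without the extreme point:", crossing_at_half(x, y))    # ~5.0, between the classes

# Add one extreme positive example far to the right (the figure_2 situation).
x2 = np.append(x, 30.0)
y2 = np.append(y, 1.0)
print("boundary with the extreme point:   ", crossing_at_half(x2, y2))  # ~6.2: the line flattens,
# so the positive example at x = 6 now falls just below the 0.5 cutoff
```

This is exactly why the fixed 0.5 cutoff starts misclassifying once the regression line tilts: the cutoff itself is a reasonable probability-like threshold, but the least-squares fit is what gets dragged around by the extreme point.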
Because you never get to see data that neat and pretty; this is a toy example. In practice you would have at least 3 or 4 dimensions, and you usually do not know how many cases of one class you have versus the other, which makes it a really difficult task to set the classifier up a priori. Also, using linear regression as a classifier is usually a very bad idea; you would use logistic regression instead, so do not bother much with this, because odds are you are never going to use it. P.S. That second plot is probably not going to look like that: the line is going to be far more horizontal, since your linear regression will treat those three points as outliers. It is not a bad idea in finance, though; OLS will give you results identical to logistic regression most of the time (in terms of the decision that comes from such an analysis).
(Feb 03 '14 at 22:58)
Jeremiah M
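A quick sketch of the last claim in the comment above, that OLS and logistic regression usually lead to the same decisions: fit both on the same (hypothetical) labelled points, threshold the OLS fitted values at 0.5, and compare the resulting 0/1 labels with logistic regression's predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical 1-D toy data without extreme outliers.
rng = np.random.RandomState(0)
x = np.concatenate([rng.normal(2.0, 1.0, 50), rng.normal(6.0, 1.0, 50)]).reshape(-1, 1)
y = np.concatenate([np.zeros(50), np.ones(50)])

# "OLS as a classifier": regress the 0/1 labels on x, predict 1 where the fitted value >= 0.5.
ols = LinearRegression().fit(x, y)
ols_labels = (ols.predict(x) >= 0.5).astype(int)

# Logistic regression: predict 1 where the estimated P(y=1|x) >= 0.5 (sklearn's default rule).
logit = LogisticRegression().fit(x, y)
logit_labels = logit.predict(x).astype(int)

# On well-behaved data like this, the two decision rules almost always coincide.
print("fraction of identical decisions:", np.mean(ols_labels == logit_labels))
```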
1
Linear regression for classification is often just fine, and it can sometimes behave better than logistic regression on separable datasets, since separability makes logistic regression unstable without regularization. The main advantage of logistic regression is that it is less sensitive to outliers than linear regression: the log loss looks like a hinge and grows as O(error), while the squared loss grows as O(error^2). The log loss also tends to be better at discriminating extreme probabilities (close to 0 or 1 but not quite), which are quite difficult for the squared loss.
(Feb 04 '14 at 00:54)
Alexandre Passos ♦
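To illustrate the growth rates mentioned in the comment above, here is a minimal numerical sketch. It assumes a raw score f for a positive example (y = 1), compares the squared loss (1 - f)^2 against the logistic log loss log(1 + exp(-f)), and shows the quadratic versus roughly linear growth as the prediction gets more wrong:

```python
import numpy as np

# For a positive example (y = 1), make the raw score f increasingly wrong (more negative).
for f in [-1.0, -5.0, -10.0, -20.0]:
    squared = (1.0 - f) ** 2           # squared loss grows like O(error^2)
    logloss = np.log1p(np.exp(-f))     # logistic log loss grows roughly linearly in |f|
    print(f"score {f:6.1f}: squared loss {squared:7.1f}, log loss {logloss:7.2f}")
```

For f = -20 the squared loss is 441 while the log loss is only about 20, which is the hinge-like, O(error) behaviour the comment describes.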


