I don't understand the notion of Logistic Regression, e.g., as presented in Charles Elkan's lecture. As I understand it, Logistic Regression is a log-linear model, a generalization of linear models. But why is it defined as a conditional probability distribution?

Here are some details:

  1. The Logistic Regression model is a well-defined probability distribution; that is not the problem. But why say "it gives well-calibrated probabilities"? You can define any probability distribution satisfying some constraints, so how can you say this particular one is reasonable?

  2. If it is a reasonable probability distribution, what is the sample space of this probability measure?

  3. Why is it a conditional probability distribution? Here we observe some training data (x_i, y_i). If you say the y_i are conditionally independent given the x_i, then maybe we can use such a conditional probability distribution for convenience. But if the training pairs (x_i, y_i) are i.i.d., then it might also be viewed as a joint distribution (I'm not really sure); see the sketch after this list.
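
For concreteness, here is the model as I understand it, written out (the weight vector w is my notation, not necessarily the lecture's):

    P(y = 1 \mid x; w) = \frac{1}{1 + e^{-w^{\top} x}}

    L(w) = \prod_{i=1}^{n} P(y_i \mid x_i; w)

So question 3 is really asking why we maximize this conditional likelihood L(w) rather than a joint likelihood over the pairs (x_i, y_i).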

Thanks.

asked Nov 06 '12 at 09:59

Huijia Wu


2 Answers:

1) I think he is distinguishing logistic regression from classifiers which do not have a probabilistic interpretation, e.g. the perceptron. You could also presumably just use linear regression and a threshold, but that wouldn't give you a probability distribution (you might get negative numbers or numbers greater than 1), even though it still works fine as a classifier; see the sketch below.

2) The sample space is {0, 1} (for fixed x in R^n).

3) That is what we want our output to be: the probability of the binary outcome y_i = 1 given the real data x_i.
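
Here is a minimal numpy sketch of point 1; the toy data and the plain gradient-ascent fit are made up for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 1-d binary labels (made up): y tends to be 1 for positive x
    # and 0 for negative x, with some label noise.
    x = np.linspace(-3, 3, 100)
    X = np.column_stack([np.ones_like(x), x])  # add an intercept column
    y = (x + 0.5 * rng.standard_normal(100) > 0).astype(float)

    # Linear regression plus a threshold classifies fine, but its raw
    # scores are not probabilities: they fall outside [0, 1].
    w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
    scores = X @ w_ls
    print("linear scores range:", scores.min(), scores.max())

    # Logistic regression pushes the same linear score through a
    # sigmoid, so every output is a valid probability in (0, 1).
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = np.zeros(2)
    for _ in range(500):        # plain gradient ascent on the
        p = sigmoid(X @ w)      # conditional log-likelihood
        w += 0.1 * X.T @ (y - p) / len(y)

    probs = sigmoid(X @ w)
    print("logistic outputs range:", probs.min(), probs.max())

Both classify this toy data about equally well, but only the sigmoid outputs can be read as probabilities.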

answered Nov 12 '12 at 11:50

SeanV

  1. "Calibrated" probabilities means that if you take a set of points and the average probability predicted on them for logistic regression is p, then if you look at the actual events they should occur with frequency p. That is, if the classifier predicts 80% probability of the points being in the positive class then 80% of them should be in the positive class. Note that things like naive bayes do not produce calibrated probabilities.
  2. For binary classification it assigns probabilities over the label set {-1, 1} (that is, there is a probability for the event y = -1 and another for the event y = 1, and they sum to 1). I'm not sure I understood this question.
  3. Because the predicted probability depends on (is conditioned on) some features of the data. That is, it predicts probabilities for y_i conditioned on some value x_i, namely P(Y = y_i | X = x_i), which is a conditional probability distribution.
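
A minimal numpy sketch of what point 1 means; the data are synthetic, generated so that the true P(y = 1 | x) is exactly logistic, and I plug in the true weight instead of fitting, just to isolate the calibration property:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Synthetic data: the true P(y = 1 | x) really is logistic in x.
    x = rng.standard_normal(10000)
    p_true = sigmoid(2.0 * x)
    y = (rng.uniform(size=10000) < p_true).astype(float)

    # Assume the fit recovered the true weight (2.0), so that any
    # miscalibration would come from the model, not estimation error.
    p_hat = sigmoid(2.0 * x)

    # Calibration check: among points whose predicted probability
    # falls in a bin, positives should occur at about that rate.
    for lo in np.arange(0.0, 1.0, 0.2):
        mask = (p_hat >= lo) & (p_hat < lo + 0.2)
        print("predicted ~%.2f, observed %.2f"
              % (p_hat[mask].mean(), y[mask].mean()))

Each bin's observed frequency should roughly match its average predicted probability; a miscalibrated classifier such as naive Bayes would show systematic gaps between the two columns.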

answered Nov 06 '12 at 10:47

Alexandre Passos ♦

Thanks, but what I asked is why Logistic Regression can provide "calibrated" probabilities through the functional form of a conditional probability distribution. I already know how Logistic Regression works; what I want to know is why it works, from the perspective of probability.

(Nov 07 '12 at 23:18) Huijia Wu