Is there a benefit to training a logistic regression classifier on data whose labels are already in the form of probabilities?

I know logistic regression was intended for classification, but I feel that cases where P_c1 = 51% and P_c2 = 49% should be taken into account during training.

Also, if I were to train it in such a manner, would I need to use a different training procedure from iteratively reweighted least squares?

asked Oct 18 '11 at 23:37

crdrn

edited Oct 18 '11 at 23:50


2 Answers:

I don't see a reason why you shouldn't use this information if you have it. Your likelihood function would probably look somewhat different, and so would its gradient. You could probably still optimize it with iteratively reweighted least squares, but you usually don't want to do that anyway, since methods like L-BFGS tend to work better.
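As a minimal sketch of this suggestion (not code from the original answer), the snippet below fits a logistic regression to probability-valued targets by minimizing the cross-entropy with L-BFGS. It assumes NumPy and SciPy are available; the function names `soft_label_loss_and_grad` and `fit_soft_logistic` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable logistic sigmoid


def soft_label_loss_and_grad(w, X, p):
    """Cross-entropy between soft targets p and predictions sigmoid(X @ w)."""
    z = X @ w
    # log(sigmoid(z)) = -logaddexp(0, -z) and log(1 - sigmoid(z)) = -logaddexp(0, z)
    loss = np.sum(p * np.logaddexp(0.0, -z) + (1.0 - p) * np.logaddexp(0.0, z))
    # Same form as the hard-label gradient, with p in place of the 0/1 label
    grad = X.T @ (expit(z) - p)
    return loss, grad


def fit_soft_logistic(X, p):
    """Fit weights with L-BFGS. X must already contain a bias column if one is wanted."""
    w0 = np.zeros(X.shape[1])
    result = minimize(
        soft_label_loss_and_grad, w0, args=(X, p), jac=True, method="L-BFGS-B"
    )
    return result.x


# Synthetic data (assumption): targets are probabilities generated from known weights
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_w = np.array([0.5, -1.0, 2.0])
p = expit(X @ true_w)

w_hat = fit_soft_logistic(X, p)
print(w_hat)
```

Because the targets here are generated exactly from a logistic model, the fitted weights should land close to `true_w`; with real data the soft labels would be noisier and some regularization would likely be worth adding.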

answered Oct 19 '11 at 11:35

Alexandre Passos ♦

So my likelihood would no longer be the binomial distribution? Is there any standard pdf when your labels are probability values?

If I treat the likelihood as a normal distribution, it seems like my model will just become a neural network with no hidden layer.

(Oct 19 '11 at 22:27) crdrn

Logistic regression is already one kind of neural network with no hidden layer. You can use a nicer loss function, such as minimizing the KL divergence between the empirical distribution and your predicted distribution. Up to a term that does not depend on the model, this is the cross-entropy: - sum_x sum_y observed-P(y|x) * log(predicted-P(y|x))

(Oct 19 '11 at 22:34) Alexandre Passos ♦
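For the two-class case, the loss in the comment above can be written out as follows. This is a sketch under assumed notation that does not appear in the thread: p_i is the observed probability of the positive class for example i, q_i is the model's prediction, and w is the weight vector.

```latex
% Assumed notation: p_i = observed soft label, q_i = \sigma(w^\top x_i) = predicted probability
\begin{align*}
\mathrm{KL}(p_i \,\|\, q_i)
  &= -\bigl[p_i \log q_i + (1 - p_i)\log(1 - q_i)\bigr]
     + \bigl[p_i \log p_i + (1 - p_i)\log(1 - p_i)\bigr] \\
L(w) &= -\sum_i \bigl[p_i \log q_i + (1 - p_i)\log(1 - q_i)\bigr] \\
\nabla_w L(w) &= \sum_i (q_i - p_i)\, x_i
\end{align*}
```

The second bracket in the first line is the negative entropy of the observed label and does not depend on w, so minimizing the KL divergence and minimizing the cross-entropy L(w) give the same weights. The gradient has the same form as in ordinary logistic regression, with the soft label p_i in place of the 0/1 label.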

Can I interpret the probabilities I get from logistic regression outputs as the probability of the class Y of a given datapoint X assuming the pdf of y given x follows a logistic distribution?

(Oct 25 '11 at 23:52) crdrn

If you have that information then you should use it, especially if your goal is to get a good estimate of the conditional probability, rather than just getting accurate classification. Most implementations of logistic regression will let you weight examples differently. You can use this feature to encode an example as having probability p of being positive. Just split it into two examples, one with weight p and a positive label and the other with weight (1-p) and a negative label.

This is an example of using "Rao-Blackwellized" or "distributional" examples, in which some of the attributes of an example are observed precisely and others are specified only up to a probability distribution.
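The example-splitting trick described in this answer can be sketched with scikit-learn, whose `LogisticRegression.fit` accepts a `sample_weight` argument. This is an illustrative sketch rather than code from the answer: the synthetic data, the variable names, and the choice of a large `C` (to make the default L2 penalty nearly negligible) are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption): p holds the probability of the positive class per row
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
p = 1.0 / (1.0 + np.exp(-(X @ np.array([-1.0, 2.0]) + 0.5)))

# Split every example in two: a positive copy with weight p
# and a negative copy with weight 1 - p
X_dup = np.vstack([X, X])
y_dup = np.concatenate([np.ones(len(X), dtype=int), np.zeros(len(X), dtype=int)])
w_dup = np.concatenate([p, 1.0 - p])

# A large C makes the default L2 regularization nearly negligible
clf = LogisticRegression(C=1e6, max_iter=1000)
clf.fit(X_dup, y_dup, sample_weight=w_dup)

# Column 1 of predict_proba corresponds to class 1 (the positive class)
p_hat = clf.predict_proba(X)[:, 1]
print(np.max(np.abs(p_hat - p)))
```

The weighted log-likelihood of the duplicated dataset is the same quantity as the cross-entropy against the soft labels, so this approach needs no custom loss, at the cost of doubling the number of rows.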

answered Oct 21 '11 at 21:11

Ian Goodfellow


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.