
I'm using softmax regression for a multi-class classification problem. The prior probabilities of the classes are unequal.

I know from logistic regression (softmax regression with 2 classes) that the prior probabilities of the classes are implicitly absorbed into the bias (specifically, the term log(p0/p1) is added to it).

What I usually do is manually remove this term from the bias after training.

My question is: what is the corresponding term in the bias of softmax regression?

Thanks.

asked Feb 01 at 11:35

rm9

For each class i, you can add log(p_i) to the weight that class i assigns to the bias feature, where p_i is that class's prior probability. This will somewhat correct for the sampling bias. Is this your question?

(Feb 01 at 11:43) Alexandre Passos ♦

Yes, I think so. Could you point me to the mathematical reasoning behind this? Thank you very much!

(Feb 01 at 11:45) rm9

Assuming your training set was sampled uniformly across classes and your test set will have these class probabilities, this is the same thing as multiplying your predicted probabilities for each test point by the class distribution of the test set and renormalizing. Another way to think of this is that you're implicitly weighting every training example by the inverse of the probability of it ending up in the training set (large for rare classes, small for popular ones); this weight will be reflected mostly in the bias (assuming no regularization; otherwise things can differ slightly), so adding the log of these odds to the bias corrects for it.

(Feb 01 at 11:48) Alexandre Passos ♦

Thanks, I think I got it.

(Feb 01 at 11:53) rm9

One Answer:

For each class i, you can add log(p_i) to the weight that class i assigns to the bias feature, where p_i is that class's prior probability. This will somewhat correct for the sampling bias. Assuming your training set was sampled uniformly across classes and your test set will have these class probabilities, this is the same thing as multiplying your predicted probabilities for each test point by the class distribution of the test set and renormalizing. Another way to think of this is that you're implicitly weighting every training example by the inverse of the probability of it ending up in the training set (large for rare classes, small for popular ones); this weight will be reflected mostly in the bias (assuming no regularization; otherwise things can differ slightly), so adding the log of these odds to the bias corrects for it.

answered Feb 01 at 11:59

Alexandre Passos ♦
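
The bias shift described in the answer can be sketched in a few lines of NumPy. This is only an illustration, not code from the thread: the helper `adjust_bias`, the random parameters `W` and `b`, and the prior `test_prior` are all made up for the example, and the optional `train_prior` argument is an added assumption covering the case where the training set is not class-balanced.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, subtracting the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adjust_bias(b, test_prior, train_prior=None):
    """Shift per-class biases toward the test-time class prior (hypothetical helper).

    With train_prior=None the training set is assumed class-balanced,
    so only log(test_prior) is added, as in the answer above.
    Otherwise log(train_prior) is also subtracted (an added assumption).
    """
    shift = np.log(np.asarray(test_prior, dtype=float))
    if train_prior is not None:
        shift = shift - np.log(np.asarray(train_prior, dtype=float))
    return np.asarray(b, dtype=float) + shift

# Made-up "fitted" parameters: 2 features, 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = rng.normal(size=3)
X = rng.normal(size=(5, 2))
test_prior = np.array([0.7, 0.2, 0.1])

# Route 1: add log(p_i) to the bias of class i.
p_shifted = softmax(X @ W + adjust_bias(b, test_prior))

# Route 2: multiply the original predictions by the prior and renormalize.
p_reweighted = softmax(X @ W + b) * test_prior
p_reweighted /= p_reweighted.sum(axis=1, keepdims=True)

# Both routes give the same probabilities.
assert np.allclose(p_shifted, p_reweighted)
```

Passing the same vector as both `test_prior` and `train_prior` leaves the bias unchanged, which is a quick sanity check on the sign convention.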

By the way, why is it add and not subtract?

(Feb 02 at 07:03) rm9

I could be wrong. Work out the math and see what leaves the expectations unchanged.

(Feb 02 at 07:48) Alexandre Passos ♦
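
One way to work out the math, as a sketch under assumptions not stated in the thread: write q_i for the class frequencies in the training set and \pi_i for the class frequencies expected at test time, and assume the class-conditional densities p(x | y=i) are the same in both. Bayes' rule then gives:

```latex
\begin{align*}
p_{\text{train}}(y=i \mid x) &\propto p(x \mid y=i)\, q_i, \\
p_{\text{test}}(y=i \mid x) &\propto p(x \mid y=i)\, \pi_i
  \;\propto\; p_{\text{train}}(y=i \mid x)\, \frac{\pi_i}{q_i}, \\
p_{\text{train}}(y=i \mid x) &\propto \exp\!\left(w_i^\top x + b_i\right)
  \;\Longrightarrow\;
  p_{\text{test}}(y=i \mid x) \propto \exp\!\left(w_i^\top x + b_i + \log \pi_i - \log q_i\right).
\end{align*}
```

Under these assumptions the corrected bias is b_i + log \pi_i - log q_i. If the training set is balanced (q_i constant), the log q_i term cancels in the normalization and only log \pi_i is added. If instead the goal is to remove the training prior from the model (\pi_i constant), log q_i is subtracted. With two classes, the difference of the two biases shifts by a log-odds term, which matches the logistic regression term in the question up to the convention for which class is the reference.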
User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.