I'm using softmax regression for a multi-class classification problem in which the prior probabilities of the classes are unequal. I know from logistic regression (softmax regression with 2 classes) that the class priors are implicitly absorbed into the bias (specifically, the term log(p0/p1) is added). What I usually do is manually remove this term from the bias after training. My question is: what is the corresponding term in the softmax regression bias? Thanks.
|
You can add log(p_i), for each class i, to the bias weight of the i-th class. This will somewhat correct for the sampling bias. Assuming your training set was sampled uniformly over classes and your test set will have these probabilities, this is the same thing as multiplying your predicted probabilities for each test point by the class distribution of the test set and renormalizing. Another way to think of this is that you're implicitly weighting every training example by the inverse of its probability of ending up in the training set (which is big for rare classes and small for popular classes); this weight will be reflected mostly in the bias (assuming no regularization, otherwise things can get slightly different), so adding the log of these class probabilities to the bias corrects it.
btw: why is it add and not subtract?
(Feb 02 at 07:03)
rm9
I could be wrong. Work out the math and see what leaves the expectations unchanged.
(Feb 02 at 07:48)
Alexandre Passos ♦
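Working out the math as suggested, here is a sketch under the usual assumption that only the class priors differ between training and deployment while p(x | y) stays the same, and that the fitted softmax matches the true posterior. The symbols \pi_i (training prior), q_i (target prior), w_i and b_i (softmax weights and biases) are notation introduced for this sketch, not taken from the posts above.

```latex
\begin{align*}
P_{\pi}(y=i \mid x) &= \frac{\exp(w_i^\top x + b_i)}{\sum_j \exp(w_j^\top x + b_j)}
  \;\propto\; p(x \mid y=i)\,\pi_i \\
P_{q}(y=i \mid x) &\propto p(x \mid y=i)\,q_i
  \;\propto\; P_{\pi}(y=i \mid x)\,\frac{q_i}{\pi_i} \\
\Rightarrow\quad b_i' &= b_i + \log q_i - \log \pi_i
\end{align*}
```

Under this sketch the sign depends on the direction of the correction: a model trained on balanced data and deployed on priors q_i gets +log q_i, while a model trained with priors \pi_i that should behave as if the classes were balanced gets -log \pi_i. With two classes, the second case lowers b_0 - b_1 by log(\pi_0/\pi_1), which matches the term in the question. Adding the same constant to every b_i changes nothing, so the correction is only defined up to a shared offset.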
|
You can add log(p_i), for each class i, to the bias weight of the i-th class. This will somewhat correct for the sampling bias. Is this your question?
Yes, I think so. Can you maybe refer me to the mathematical reason behind this? Thank you very much!
Assuming your training set was sampled uniformly over classes and your test set will have these probabilities, this is the same thing as multiplying your predicted probabilities for each test point by the class distribution of the test set and renormalizing. Another way to think of this is that you're implicitly weighting every training example by the inverse of its probability of ending up in the training set (which is big for rare classes and small for popular classes); this weight will be reflected mostly in the bias (assuming no regularization, otherwise things can get slightly different), so adding the log of these class probabilities to the bias corrects it.
Thanks, I think I got it.
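For anyone who wants to check the "multiply by the class distribution and renormalize" equivalence numerically, here is a minimal sketch in plain Python. All numbers and the helper names (`softmax`, `shift_biases`, `reweight`) are made up for illustration, and it assumes that only the class priors change between training and test.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shift_biases(biases, train_prior, target_prior):
    """Move each bias by log(target prior) - log(training prior)."""
    return [b + math.log(q) - math.log(p)
            for b, p, q in zip(biases, train_prior, target_prior)]

def reweight(probs, train_prior, target_prior):
    """Rescale predicted probabilities by target/train prior, then renormalize."""
    weighted = [pr * q / p for pr, p, q in zip(probs, train_prior, target_prior)]
    total = sum(weighted)
    return [v / total for v in weighted]

# Hypothetical values for one test point and a 3-class model.
scores = [1.2, -0.3, 0.5]        # w_i . x for each class, bias excluded
biases = [0.8, -0.1, -0.7]       # biases learned on the imbalanced training set
train_prior = [0.7, 0.2, 0.1]    # class frequencies in the training set
uniform = [1 / 3, 1 / 3, 1 / 3]  # target: behave as if classes were balanced

p_trained = softmax([s + b for s, b in zip(scores, biases)])

# Route 1: correct the biases, then predict.
new_biases = shift_biases(biases, train_prior, uniform)
p_shifted = softmax([s + b for s, b in zip(scores, new_biases)])

# Route 2: predict with the original biases, then reweight and renormalize.
p_reweighted = reweight(p_trained, train_prior, uniform)

print(p_shifted)
print(p_reweighted)
```

Both routes print the same distribution up to floating-point error. With a uniform target the log(q_i) term is the same for every class and cancels in the softmax, so this case amounts to subtracting log of the training prior from each bias.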