Hi all, I am using a multiclass softmax for predicting among multiple classes. The distribution of classes is not uniform. During learning I update the weights and the bias, and I am updating the bias as:

b := b + learnRate * (1 - P)
Here P is the predicted probability. I arrived at this relation by taking the first derivative of the objective function with respect to the bias, and I am using stochastic gradient ascent to learn the parameters. For an uncommon class the value of P is very small, so the bias is raised in proportion to the learning rate, which ultimately results in a large value, which is not expected. What can be the reason for this? Do we really need to regularize the bias? What other ways are there for updating the bias?
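A toy sketch of the divergence being described (the learning rate and probability are made-up numbers, and P is held fixed for simplicity, so this is only an illustration of the update rule, not the asker's actual code): because (1 - P) is always positive, the bias can only grow.

```python
# Toy illustration (made-up numbers, P held fixed): if the bias is bumped by
# learn_rate * (1 - P) on every step, the increment is always positive, so
# the bias grows without bound for a class whose predicted P stays small.
learn_rate = 0.1
P_rare = 0.01        # probability assigned to an uncommon class
bias = 0.0
for step in range(1000):
    bias += learn_rate * (1 - P_rare)
print(bias)          # about 99 after 1000 steps, and still growing
```
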
The gradient of the multi-class cross-entropy loss with respect to the bias is not (1 - P), if P is the vector of class predictions, but (P - y). Here y is a vector that is zero everywhere except for the element corresponding to the correct label, which is 1. Minimizing this error (i.e. following the negative gradient) leads to the update b := b + learnRate * (y - P). As you can see, this increases the bias element that corresponds to the correct label and decreases the values of the others, in proportion to how wrong they were.
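A minimal NumPy sketch of one such stochastic gradient step (the sizes, data, and learning rate below are illustrative, not taken from the question):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class scores.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Illustrative sizes, data, and learning rate (not from the question).
K, D = 5, 10                      # number of classes, number of features
learn_rate = 0.1
rng = np.random.default_rng(0)
W = np.zeros((K, D))              # weight matrix
b = np.zeros(K)                   # bias vector

x = rng.normal(size=D)            # one training example
label = 2                         # its correct class

# One stochastic gradient step on the softmax cross-entropy loss.
P = softmax(W @ x + b)            # predicted class probabilities
y = np.zeros(K)
y[label] = 1.0                    # one-hot target vector

# Gradient of the loss w.r.t. the bias is (P - y); following the negative
# gradient gives b := b + learn_rate * (y - P): the bias of the correct
# class goes up, the biases of all other classes go down.
b += learn_rate * (y - P)
W += learn_rate * np.outer(y - P, x)
```

Because the wrong-class terms of (y - P) are negative, the biases of incorrect classes are pushed down rather than growing without bound, which is exactly the behavior missing from a (1 - P) update.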