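I'm trying to derive the formulas for multinomial logistic regression. My problem with the derivation on Wikipedia is that
$$\ln\frac{P(Y_i=k)}{P(Y_i=K)} = \beta_k\cdot X_i,\qquad k=1,\dots,K-1,$$
should be
$$\ln\frac{P(Y_i=k\mid X_i,\beta_k)}{P(Y_i=K\mid X_i,\beta_k)} = \beta_k\cdot X_i,\qquad k=1,\dots,K-1.$$
This means that the denominators are all different. For instance,
$$P(Y_i=K\mid X_i,\beta_1) \neq P(Y_i=K\mid X_i,\beta_2),$$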
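and, therefore, none of the subsequent steps make sense. How, then, can I find the expressions for multinomial logistic regression? Let me elaborate a bit. Dropping the $i$ index, we have
$$P(Y=1\mid X) = P(Y=K\mid X)\,e^{\beta_1\cdot X}.$$
Why? It must be because
$$P(Y=1\mid X) = \frac{e^{\beta_1\cdot X}}{1+e^{\beta_1\cdot X}}$$
and
$$P(Y=K\mid X) = \frac{1}{1+e^{\beta_1\cdot X}}.$$
As the article says, we're "running K-1 independent binary logistic regression models", so the one above is just a single logistic regression model and Y is binary (Y=1 or Y=K). Then we have
$$P(Y=2\mid X) = P(Y=K\mid X)\,e^{\beta_2\cdot X}.$$
Here we have
$$P(Y=2\mid X) = \frac{e^{\beta_2\cdot X}}{1+e^{\beta_2\cdot X}},\qquad P(Y=K\mid X) = \frac{1}{1+e^{\beta_2\cdot X}}.$$
As you can see, we have two different values of $P(Y=K\mid X)$. That's why we have to write $P(Y=K\mid X,\beta_1)$ and $P(Y=K\mid X,\beta_2)$ to tell them apart. Maybe now it's clearer what I mean.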
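To make the complaint concrete, here is a minimal numpy sketch, assuming each pair (Y=k vs Y=K) is fitted as its own binary logistic model, as in the reading above. The helper `binary_logistic_pair`, the feature vector `x` and the parameter values `beta1`, `beta2` are hypothetical, chosen only for illustration.

```python
import numpy as np

def binary_logistic_pair(beta, x):
    """Treat (Y=k vs Y=K) as a standalone binary logistic model with parameter beta.

    Returns (P(Y=k | x, beta), P(Y=K | x, beta)).
    """
    z = float(np.dot(beta, x))
    p_k = 1.0 / (1.0 + np.exp(-z))  # sigmoid(beta . x)
    return p_k, 1.0 - p_k           # P(Y=K | x, beta) = 1 / (1 + exp(beta . x))

# Hypothetical example values, for illustration only.
x = np.array([1.0, 2.0])
beta1 = np.array([0.5, -0.25])
beta2 = np.array([-1.0, 1.0])

_, pK_from_beta1 = binary_logistic_pair(beta1, x)
_, pK_from_beta2 = binary_logistic_pair(beta2, x)
print(pK_from_beta1, pK_from_beta2)
```

With these numbers $\beta_1\cdot x = 0$ and $\beta_2\cdot x = 1$, so the two separate binary models assign $P(Y=K\mid x)$ the values 0.5 and $1/(1+e)\approx 0.269$: two different "pivot" probabilities for the same input.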
I think you might be getting hung up on notation. Each possible outcome $Y=i$, $i=1,\dots,K$, has a separate parameter vector $\beta_i$. So it doesn't make much sense to talk about $P(Y=K\mid X,\beta_1)$, since you would really need $P(Y=K\mid X,\beta_K)$. Furthermore, one doesn't actually need to model $P(Y=K\mid X)$, since we already know that $P(Y=K\mid X) = 1 - [P(Y=1\mid X) + \dots + P(Y=K-1\mid X)]$. If this still doesn't help, I personally find the alternative derivation "As a log-linear model", right below it on the Wikipedia page you linked, more intuitive.

Hmm... I don't understand what you mean. I've updated my question, so maybe it's clearer now what I mean.
(May 29 '13 at 12:53)
Kiuhnm
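The answer's point that $P(Y=K\mid X)$ is determined by the other K-1 probabilities can be checked numerically. Below is a minimal numpy sketch of the pivot formulation, assuming outcome K is the reference class and that `betas` stacks the K-1 parameter vectors row by row; the function name `multinomial_pivot_probs` and the example values are hypothetical. Unlike separate binary fits, it produces one shared $P(Y=K\mid x)$.

```python
import numpy as np

def multinomial_pivot_probs(betas, x):
    """Multinomial logistic probabilities with outcome K as the pivot.

    betas: array of shape (K-1, d), one parameter vector per non-pivot outcome.
    x:     feature vector of shape (d,).
    Returns [P(Y=1|x), ..., P(Y=K-1|x), P(Y=K|x)] as an array of length K.
    """
    scores = betas @ x                          # beta_k . x for k = 1..K-1
    p_K = 1.0 / (1.0 + np.sum(np.exp(scores)))  # one shared pivot probability
    p_rest = p_K * np.exp(scores)               # P(Y=k|x) = P(Y=K|x) * exp(beta_k . x)
    return np.append(p_rest, p_K)

# Hypothetical example: K = 3 outcomes, d = 2 features.
x = np.array([1.0, 2.0])
betas = np.array([[0.5, -0.25],
                  [-1.0, 1.0]])
p = multinomial_pivot_probs(betas, x)
print(p, p.sum())
```

Here the probabilities sum to one, the last entry equals one minus the others, and $\ln[P(Y=k\mid x)/P(Y=K\mid x)]$ recovers $\beta_k\cdot x$ exactly. Exponentiating raw scores can overflow when some $\beta_k\cdot x$ is large; a more robust variant would give the pivot a score of 0 and subtract the maximum of all K scores before exponentiating, which leaves the probabilities unchanged.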
OK, I think I understand what's wrong. The article says that we're running K-1 binary logistic regression models, but that's not entirely true. If it were, things would be as I describe in my question above. The point is that we impose $\ln[P(Y=1\mid X)/P(Y=K\mid X)] = \beta_1\cdot X$ (it's an assumption, not something we prove). These $P(Y=1\mid X)$ and $P(Y=K\mid X)$ have nothing to do with the $P(Y=1\mid X)$ and $P(Y=0\mid X)$ of binary logistic regression. In fact, once we solve for them, we can see that they are different. I took some assertions too literally.
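For reference, imposing the K-1 log-odds equations together with the constraint that the K probabilities sum to one gives the standard pivot form:

$$\ln\frac{P(Y=k\mid X)}{P(Y=K\mid X)} = \beta_k\cdot X \;\Longrightarrow\; P(Y=k\mid X) = P(Y=K\mid X)\,e^{\beta_k\cdot X},\qquad k=1,\dots,K-1$$

$$\sum_{k=1}^{K} P(Y=k\mid X) = 1 \;\Longrightarrow\; P(Y=K\mid X)\Big(1+\sum_{j=1}^{K-1} e^{\beta_j\cdot X}\Big) = 1$$

$$P(Y=K\mid X) = \frac{1}{1+\sum_{j=1}^{K-1} e^{\beta_j\cdot X}},\qquad P(Y=k\mid X) = \frac{e^{\beta_k\cdot X}}{1+\sum_{j=1}^{K-1} e^{\beta_j\cdot X}}$$

For K=2 this reduces to the binary model. For K>2 the extra terms $e^{\beta_j\cdot X}$, $j\neq k$, in the denominator make $P(Y=k\mid X)$ strictly smaller than the binary $e^{\beta_k\cdot X}/(1+e^{\beta_k\cdot X})$, which is why the two sets of probabilities differ.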