When the objective function is the log-likelihood, the inverse of the negative expected Hessian, divided by n, gives the approximate covariance matrix of the ML estimator. Are there similar results for other objective functions?

asked Aug 19 '10 at 16:19

Yaroslav Bulatov

edited Aug 19 '10 at 16:29
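
A minimal sketch of the computation described in the question, assuming a normal model N(mu, sigma^2) with both parameters unknown. The model, the sample size and the `numerical_hessian` finite-difference helper are illustrative choices, not anything from this thread. The sketch plugs in the observed Hessian of the average log-likelihood at the MLE as an estimate of the expected Hessian, then inverts its negative and divides by n.

```python
import numpy as np

def avg_loglik(theta, x):
    # Average per-observation log-likelihood of N(mu, s2), with theta = (mu, s2)
    mu, s2 = theta
    return np.mean(-0.5 * np.log(2 * np.pi * s2) - (x - mu) ** 2 / (2 * s2))

def numerical_hessian(f, theta, eps=1e-4):
    # Central finite-difference Hessian of a scalar function f at theta (hypothetical helper)
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ei[i] = eps
            ej = np.zeros(k)
            ej[j] = eps
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * eps ** 2)
    return H

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(loc=1.0, scale=2.0, size=n)   # true mu = 1, sigma^2 = 4

theta_hat = np.array([x.mean(), x.var()])    # closed-form MLE (variance with ddof=0)
H = numerical_hessian(lambda t: avg_loglik(t, x), theta_hat)
cov = np.linalg.inv(-H) / n                  # inverse of the negative Hessian, divided by n

print(theta_hat)
print(cov)
# For comparison, theory at the truth: diag(sigma^2/n, 2*sigma^4/n) = diag(0.0008, 0.0064)
```

For this model the estimate should agree with diag(s2/n, 2*s2^2/n) evaluated at the MLE, where s2 is the estimated variance.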

Isn't the covariance estimate just the inverse of the negative expected Hessian?

(Aug 19 '10 at 16:27) spinxl39

Also, since it is the expected Hessian, it doesn't need to be divided by n.

(Aug 19 '10 at 20:36) spinxl39

What is the expectation computed over?

(Aug 19 '10 at 21:42) Yaroslav Bulatov

Over samples.

(Aug 19 '10 at 21:51) spinxl39

Suppose you are trying to learn p(x,t), where the true generating distribution is q(x)=p(x,t0). Then your MLE's asymptotic variance is -1/(nH), where H is the expected Hessian of the log-likelihood function evaluated at t0 and the expectation is taken with respect to q (H is negative, so the variance is positive). Sometimes people take the expectation with respect to q^n, the distribution over sequences of n IID points drawn from q; the expected Hessian is then nH, so you don't need the 1/n factor.

(Aug 19 '10 at 22:56) Yaroslav Bulatov
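
A Monte Carlo check of the comment above, assuming an exponential model p(x; lam) = lam * exp(-lam * x), chosen for illustration because its second derivative does not depend on x. It computes the asymptotic variance both ways (expectation over q with the 1/n factor, and over q^n without it) and compares them with the spread of the MLE across simulated datasets. Note the sign: H is negative, so the variance is -1/(nH).

```python
import numpy as np

rng = np.random.default_rng(1)
lam0, n, reps = 2.0, 200, 20000

# Exponential model: log p(x; lam) = log(lam) - lam * x.
# Its second derivative in lam is -1/lam^2 for every x, so the expected
# per-observation Hessian under q = p(., lam0) is H = -1/lam0^2.
H = -1.0 / lam0 ** 2

# Expectation over q (one observation): divide by n
var_per_obs = -1.0 / (n * H)
# Expectation over q^n: the Hessian of the sum of n log-densities is n * H, so no 1/n
H_n = n * H
var_sequence = -1.0 / H_n

# Monte Carlo: sampling variance of the MLE lam_hat = 1 / mean(x)
samples = rng.exponential(scale=1.0 / lam0, size=(reps, n))  # numpy uses scale = 1/rate
lam_hat = 1.0 / samples.mean(axis=1)
print(var_per_obs, var_sequence, lam_hat.var())
```

The two analytic values coincide; the simulated variance sits slightly above them at finite n, since -1/(nH) is only the asymptotic value.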

One Answer:

I think it's just because that's how the covariance (which is basically a second moment) is defined. The second moment of the score is just the Fisher information, which equals the negative of the expected Hessian of the log-likelihood, and the covariance estimate is its inverse. So I think this form of the estimate is specific to MLE.

answered Aug 19 '10 at 16:40

spinxl39
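
The identity the answer relies on (the second moment of the score and the negative expected Hessian are both the Fisher information) can be checked numerically. This is a sketch assuming a Poisson model with mean mu, picked for illustration: the score is x/mu - 1 and the second derivative is -x/mu^2, so both quantities should come out near 1/mu.

```python
import numpy as np

rng = np.random.default_rng(2)
mu0, m = 3.0, 1_000_000

x = rng.poisson(lam=mu0, size=m)
score = x / mu0 - 1.0        # d/dmu log p(x; mu) evaluated at mu0
hess = -x / mu0 ** 2         # d^2/dmu^2 log p(x; mu) evaluated at mu0

fisher_from_score = np.mean(score ** 2)   # second moment of the score
fisher_from_hessian = -np.mean(hess)      # negative expected Hessian
print(fisher_from_score, fisher_from_hessian, 1 / mu0)
```

Inverting either estimate and dividing by n gives the approximate variance of the MLE of mu from n observations.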

The Wikipedia derivation of the asymptotic normality of the MLE gets the result by relying on some regularity properties of the likelihood function, so it seems the same result should apply to other objective functions with those properties (for instance, the objective function must decompose over training examples in order for the central limit theorem to apply).

(Aug 19 '10 at 19:23) Yaroslav Bulatov
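
A minimal sketch of where that Taylor-expansion argument leads for a non-likelihood objective, under the usual regularity conditions. For a per-example objective f, the asymptotic covariance comes out as A^{-1} B A^{-1} / n, where A is the expected Hessian of f and B is the expected outer product of its gradient; for a correctly specified log-likelihood B = -A, which recovers the inverse negative Hessian divided by n. The squared-error objective, the normal data and the sizes below are illustrative assumptions; the point is that the simulated variance tracks the sandwich value, not the Hessian-only one.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n, reps = 3.0, 100, 20000

# Per-example objective (maximized): f(theta; x) = -(x - theta)^2, so theta_hat = mean(x).
# f' = 2 (x - theta), f'' = -2, hence A = E[f''] = -2 and B = E[f'^2] = 4 sigma^2.
A = -2.0
B = 4.0 * sigma ** 2

naive = -1.0 / (n * A)        # inverse negative Hessian / n, ignores B
sandwich = B / (n * A ** 2)   # A^{-1} B A^{-1} / n in the scalar case

x = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
theta_hat = x.mean(axis=1)
print(naive, sandwich, theta_hat.var())
```

Here the Hessian alone does not carry the noise scale sigma, which is why the gradient-variance term B is needed once the objective is not a log-likelihood.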

According to the Cramér–Rao bound, the variance of any unbiased estimator hat{theta} of theta is lower-bounded by the inverse of the Fisher information (the Fisher information being the negative of the expected Hessian). MLE is asymptotically efficient, which means that its asymptotic variance equals the inverse Fisher information, the best possible variance.

I think any objective function which leads to an asymptotically efficient unbiased estimator will lead to a similar result (and it may require the regularity conditions for this to hold).

(Aug 19 '10 at 20:26) spinxl39
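
The bound and the efficiency claim can be seen in a small simulation, assuming normal data with known variance (a setup chosen for illustration). The sample mean, which is the MLE here, attains the Cramér–Rao bound sigma^2/n; the sample median is also unbiased for symmetric noise but has a variance roughly pi/2 times larger for large n.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n, reps = 1.0, 101, 20000

# N(theta, sigma^2) with sigma known: per-observation Fisher information is 1/sigma^2,
# so the Cramer-Rao bound for an unbiased estimator from n points is sigma^2 / n.
crlb = sigma ** 2 / n

x = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
mean_est = x.mean(axis=1)          # MLE; attains the bound
median_est = np.median(x, axis=1)  # unbiased here, but less efficient

print(mean_est.var() / crlb)
print(median_est.var() / crlb)
```
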

Last updated: Aug 19 '10 at 22:59

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.