If we know
I suppose a reasonable objective function might be to maximize
, the sum of the entropy of X and the log-likelihood. But I don't know how to justify it.
Another problem is that it is hard to optimize for
I want to use a (stochastic) gradient descent algorithm, so I tried to differentiate
A compromise is to drop the entropy term in the
The standard solution is to maximize entropy subject to relaxed moment constraints. The result is typically L1-regularized maximum likelihood; see section 2.3.5 in Miroslav Dudik's thesis. If you really want to use entropy as the penalty term, you could do MCMC sampling there as well -- just note that entropy is the negative expected value of the logarithm of probability, and switch the order of the derivative and expectation operators.
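To illustrate the last point, here is a small sketch (my own toy example, not from the thesis) of the derivative/expectation swap. Writing $H(\theta) = -\mathbb{E}_{x\sim p_\theta}[\log p_\theta(x)]$ and differentiating under the expectation (the score-function trick, using $\mathbb{E}[\nabla_\theta \log p_\theta] = 0$) gives $\nabla_\theta H = -\mathbb{E}[\log p_\theta(x)\,\nabla_\theta \log p_\theta(x)]$, which you can estimate from samples. For a Gaussian $N(0,\sigma^2)$ the true answer is known ($\partial H/\partial\sigma = 1/\sigma$), so we can check the Monte Carlo estimate against it:

```python
import numpy as np

# Toy check of the identity  dH/dtheta = -E[ log p(x) * d(log p)/dtheta ],
# obtained by swapping the derivative and expectation operators.
# Model: x ~ N(0, sigma^2), parameter theta = sigma.
# True entropy H = 0.5*log(2*pi*e*sigma^2), so dH/dsigma = 1/sigma.

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=200_000)  # samples from p_sigma

log_p = -0.5 * np.log(2 * np.pi) - np.log(sigma) - x**2 / (2 * sigma**2)
score = -1.0 / sigma + x**2 / sigma**3     # d(log p)/d(sigma)

grad_est = -np.mean(log_p * score)         # Monte Carlo estimate of dH/dsigma
grad_true = 1.0 / sigma

print(grad_est, grad_true)                 # should agree to ~2 decimal places
```

In your setting the samples would come from an MCMC chain rather than a direct sampler, but the estimator has the same form, so the entropy term's gradient can be plugged into stochastic gradient descent alongside the log-likelihood's.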