|
Hello everyone, here's a problem that has been bothering me for quite some time. In regression, one can place a Laplace prior on the weights W in order to encourage sparsity. However, working directly with the Laplace prior is awkward, because MAP estimation of the weights W becomes non-linear. A common solution to this problem (e.g. Figueiredo: Adaptive sparseness for supervised learning) is to adopt a hierarchical view of the Laplace prior:
Now we can integrate out the variance Ti and obtain a Laplace prior on Wi:
The S above is meant to denote integration from zero to infinity. MAP estimation can now be performed using an EM algorithm. My question: how is this integral worked out? I've tried to follow the references, up to the papers (very challenging for me) that introduced this representation based on Gaussian scale mixtures. Could somebody help? Thanks in advance! N. |
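One way the integral can be worked out is sketched below. The exact role of k in the hierarchy is an assumption here: the sketch takes W_i | T_i ~ N(0, T_i) and an exponential prior p(T_i) = (k/2) exp(-k T_i / 2) on the variance. It then uses the substitution T_i = s^2 and the standard integral \int_0^\infty exp(-a s^2 - b/s^2) ds = (1/2) sqrt(pi/a) exp(-2 sqrt(ab)), valid for a > 0 and b >= 0.

```latex
% Assumed hierarchy (the placement of k is an assumption):
%   p(W_i | T_i) = N(W_i | 0, T_i),   p(T_i) = (k/2) exp(-k T_i / 2)
\begin{align*}
p(W_i) &= \int_0^\infty \frac{1}{\sqrt{2\pi T_i}}
          \exp\!\Big(-\frac{W_i^2}{2T_i}\Big)\,
          \frac{k}{2}\exp\!\Big(-\frac{k T_i}{2}\Big)\, dT_i \\
       &= \frac{k}{\sqrt{2\pi}} \int_0^\infty
          \exp\!\Big(-\frac{k}{2}s^2 - \frac{W_i^2}{2 s^2}\Big)\, ds
          \qquad (T_i = s^2,\ dT_i = 2s\,ds) \\
       &= \frac{k}{\sqrt{2\pi}} \cdot \frac{1}{2}\sqrt{\frac{2\pi}{k}}\,
          \exp\!\big(-\sqrt{k}\,|W_i|\big)
          \qquad (a = k/2,\ b = W_i^2/2) \\
       &= \frac{\sqrt{k}}{2}\exp\!\big(-\sqrt{k}\,|W_i|\big).
\end{align*}
```

Under these assumptions the marginal is a Laplace density with scale 1/sqrt(k). Placing k differently in the exponential prior would change only the constants, not the structure of the argument.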
|
This paper seems to explain it (as a special case of a more general technique), but I haven't worked through the math myself.
Hi Kevin, I know this paper, but I have been wondering whether there is another reference that is a bit less technical. Thanks, N.
(Jun 02 '11 at 06:48)
Nikos G
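For a less technical sanity check, the scale-mixture identity can also be verified numerically. The sketch below assumes the parameterization W_i | T_i ~ N(0, T_i) with p(T_i) = (k/2) exp(-k T_i / 2), under which the marginal should be the Laplace density (sqrt(k)/2) exp(-sqrt(k) |W_i|). The function names are hypothetical and only the standard library is used.

```python
import math


def mixture_density(w, k, n=20000):
    """Integrate N(w | 0, T) * (k/2) * exp(-k*T/2) over T in (0, inf).

    Assumed parameterization (not confirmed by the thread): exponential
    prior on the variance T with rate k/2. The substitution T = s**2
    removes the 1/sqrt(T) singularity; a midpoint rule on [0, s_max]
    is then enough, since the tail beyond s_max is negligible.
    """
    s_max = 12.0 / math.sqrt(k)
    h = s_max / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += math.exp(-0.5 * k * s * s - 0.5 * w * w / (s * s))
    return k / math.sqrt(2.0 * math.pi) * total * h


def laplace_density(w, k):
    """Laplace density with scale 1/sqrt(k), the claimed marginal."""
    return 0.5 * math.sqrt(k) * math.exp(-math.sqrt(k) * abs(w))


if __name__ == "__main__":
    k = 2.0
    for w in (-1.5, 0.3, 2.0):
        # The two columns should agree to many decimal places.
        print(w, mixture_density(w, k), laplace_density(w, k))
```

If the two printed columns agree, the Gaussian-with-exponential-variance hierarchy and the Laplace prior describe the same marginal for that value of k.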
|
I'm not sure, but you might try looking into the conjugate gamma and Gaussian distributions, which is what they seem to be using there.
Hi Leon, thanks for your comment.
There are two reasons why I would like to use this particular prior:
1) I've had good success with it in the past and
2) there is only one parameter, k above, that needs to be set, while in the Gaussian-gamma method there are two (the parameters of the gamma prior), and I've never had much success in setting them to good values. Perhaps there are some good "recipes" here that I am unaware of; if so, I would be grateful to hear them.
Thanks, Nikos
Yeah, that's what I meant: your prior looks like a gamma distribution, and if that's so, then the product of the two might be another normal (not sure about that, though) because of conjugacy.
Hello Leon, thanks for your interest.
It is true that the exponential is just a special case of the gamma distribution. The gamma is the conjugate prior for the precision parameter of a Gaussian. However, in this hierarchical Laplace prior, the exponential prior is placed on the variance, and therefore the conjugacy property is lost.
Thanks again, N.
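The loss of conjugacy can be seen by writing out the conditional posterior of the variance. As a sketch, assume W_i | T_i ~ N(0, T_i) with an exponential prior p(T_i) = (k/2) exp(-k T_i / 2) (this placement of k is an assumption), and compare it with a Gamma(a, b) prior on the precision beta_i = 1/T_i, where a and b are illustrative symbols:

```latex
% Exponential prior on the variance T_i (assumed parameterization):
\begin{align*}
p(T_i \mid W_i) &\propto T_i^{-1/2}
   \exp\!\Big(-\frac{W_i^2}{2T_i} - \frac{k T_i}{2}\Big).
\end{align*}
% Gamma(a, b) prior on the precision \beta_i = 1/T_i, for comparison:
\begin{align*}
p(\beta_i \mid W_i) &\propto \beta_i^{a - 1/2}
   \exp\!\Big(-\big(b + \tfrac{W_i^2}{2}\big)\beta_i\Big),
\end{align*}
% which is again a gamma density, Gamma(a + 1/2, b + W_i^2 / 2).
```

In the first expression the 1/T_i term in the exponent takes the posterior outside the gamma family (it has the form of a generalized inverse Gaussian density), whereas the precision-based version stays gamma.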