I'm playing with a hierarchical Bayesian setup in which theta is a vector of model parameters with a Gaussian prior distribution. S is the covariance of that Gaussian (let the mean be zero for now) and has an inverse-Wishart prior. I cannot sample theta directly and am using the Metropolis algorithm to do so. How would I obtain samples of S that are conditioned on the data when I only have a single vector theta available after each Metropolis step? Does one normally use all the vectors that have been sampled already to obtain samples for S? Is it possible at all?

Implementations that assume a Gaussian prior with spherical covariance I*sigma^2 seem to use the variance within the vector theta itself to compute the conditional inverse-gamma distribution. In my situation this is obviously not possible as I cannot compute the covariance matrix over a single vector.
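
To make the spherical case concrete, here is a minimal sketch of that conditional draw, assuming a zero-mean prior theta ~ N(0, sigma^2 I) and sigma^2 ~ InvGamma(a0, b0). The function name and the hyperparameters a0 and b0 are illustrative and not taken from any particular implementation. Each element of theta acts as one scalar observation of sigma^2, which is why a single vector is enough there.

```python
import numpy as np

def sample_sigma2_given_theta(theta, a0, b0, rng):
    """Draw sigma^2 | theta for theta ~ N(0, sigma^2 I), sigma^2 ~ InvGamma(a0, b0)."""
    theta = np.asarray(theta, dtype=float)
    shape = a0 + 0.5 * theta.size        # each element of theta adds 1/2 to the shape
    rate = b0 + 0.5 * theta @ theta      # sum of squares adds to the rate
    # An inverse-gamma draw is the reciprocal of a gamma draw with the same shape and rate.
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```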

EDIT: I realize that my question was a bit vague, and I think I understand things a bit better after some thinking, so I will give a bit more detail. I'm using Metropolis sampling to sample vectors of parameters theta conditioned on the data (p(theta|X)) using a Gaussian prior. So the quantity I evaluate at every sampling step is the likelihood times the prior, p(X|theta)p(theta|mu, S) (actually I'm not 100% sure I'm doing the right thing by including the prior in the sampling like that...). Now I want to place a prior on S as well, in the form of an inverse-Wishart distribution. Based on other setups I've seen, it should be possible to use Gibbs sampling for this: given some value of theta, I sample a value of S from the inverse-Wishart and use this to find a new value of theta using Metropolis. The problem is of course that the covariance matrix cannot be determined from just theta but requires p(theta|X) itself.
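
Including the prior in the quantity that Metropolis evaluates is the standard construction, since p(theta|X, mu, S) is proportional to p(X|theta)p(theta|mu, S). A minimal sketch, assuming a user-supplied log-likelihood function (the hypothetical log_lik below) and a symmetric Gaussian random-walk proposal with a hand-picked step size:

```python
import numpy as np

def log_posterior(theta, log_lik, mu, S):
    """Unnormalised log p(theta | X, mu, S) = log p(X | theta) + log N(theta | mu, S) + const."""
    diff = theta - mu
    return log_lik(theta) - 0.5 * diff @ np.linalg.solve(S, diff)

def metropolis_step(theta, log_target, step, rng):
    """One random-walk Metropolis update with a symmetric Gaussian proposal."""
    proposal = theta + step * rng.standard_normal(theta.shape)
    # Accept with probability min(1, target(proposal) / target(theta)).
    if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
        return proposal
    return theta
```

The normalising constant of the Gaussian prior is dropped because it cancels in the acceptance ratio as long as S is held fixed during the step.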

These are the things I'm not sure about (in order of personal importance...):

- Does it make sense to run Metropolis until convergence, compute the sample covariance, sample S from the IW distribution and run Metropolis again using this new value of S? This would be some sort of Metropolis-within-Gibbs with extremely sparse Gibbs updates.

- Is there a more efficient alternative to the above?

This one is more general:

- I know S can be integrated out, replacing the Gaussian/inverse-Wishart combination by a multivariate Student's t distribution, but I haven't considered doing this yet because I'm actually interested in the distribution over S as well. Is this the reason why Gibbs sampling on priors is used in general, or is it only done when it is not possible to integrate their parameters out?
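
The marginal prior mentioned in the last point can be written down explicitly. The sketch below assumes a zero-mean prior theta | S ~ N(0, S) with S ~ IW(nu0, Psi0), where nu0 and Psi0 are placeholder names for the inverse-Wishart degrees of freedom and scale matrix. Integrating S out gives a multivariate Student's t with nu0 - d + 1 degrees of freedom and scale matrix Psi0 / (nu0 - d + 1), where d is the dimension of theta.

```python
import math
import numpy as np

def log_marginal_prior(theta, nu0, Psi0):
    """log p(theta) after integrating S ~ IW(nu0, Psi0) out of theta | S ~ N(0, S)."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    _, logdet = np.linalg.slogdet(Psi0)
    quad = theta @ np.linalg.solve(Psi0, theta)   # theta^T Psi0^{-1} theta
    return (math.lgamma(0.5 * (nu0 + 1)) - math.lgamma(0.5 * (nu0 + 1 - d))
            - 0.5 * d * math.log(math.pi) - 0.5 * logdet
            - 0.5 * (nu0 + 1) * math.log1p(quad))
```

Replacing the Gaussian log-prior with this function in the Metropolis target samples theta with S collapsed out; draws of S are then no longer produced as part of the chain.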

asked Dec 26 '10 at 14:32

Philemon Brakel

edited Dec 28 '10 at 04:25

I'm not sure I understand. Shouldn't you sample S only after you have samples for all thetas? If your model only has a single theta, does it even matter what the value of S is?

(Dec 27 '10 at 22:09) Alexandre Passos ♦

I just edited my question because I realized it was a bit unclear. Theta is actually the set of parameters I'm trying to sample, so at each Metropolis transition I have only one vector theta available, but after sampling I should have an approximation of p(theta|X). The prior is over this distribution and not over the elements of a single theta vector.

(Dec 28 '10 at 04:22) Philemon Brakel

One Answer:

I suspect your model would be more easily estimated with a collapsed Gibbs Sampler.

http://www.jstor.org/pss/2290921

http://www.umiacs.umd.edu/~resnik/pubs/gibbs.pdf

answered Dec 27 '10 at 16:33

zaxtax

Thanks for the suggestion. Would this entail integrating out S to obtain a Student's t distribution? I'm actually interested in the distribution over S as well, but perhaps this is not the reason you bring up collapsed Gibbs sampling.

(Dec 28 '10 at 04:37) Philemon Brakel

The idea is that you can usually estimate S by taking the covariance of your samples. Gibbs sampling lets you get great estimates of your distribution without having to approximate parameters you don't care about. I think once you have good estimates of theta, you can use those to calculate S.

(Dec 28 '10 at 04:52) zaxtax

That sounds interesting. So you are saying I might be able to compute the distribution over S given the prior and the samples of theta in closed form? Or do I still have to re-estimate theta after sampling a new S and repeat that scheme? The latter is what I currently consider the most likely solution, where a whole series of Metropolis updates functions as a single Gibbs sampling step that samples theta given the data and S.
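
One way to sketch the alternating scheme described here, assuming the zero-mean model theta | S ~ N(0, S) with S ~ IW(nu0, Psi0). Under that model S depends on the data only through theta, and by conjugacy S | theta ~ IW(nu0 + 1, Psi0 + theta theta^T), so the draw for S is well defined even with a single current theta. The names log_lik, nu0, Psi0 and the tuning parameters are hypothetical, and the numpy-only inverse-Wishart sampler assumes integer degrees of freedom (scipy.stats.invwishart is an alternative).

```python
import numpy as np

def sample_inv_wishart(df, scale, rng):
    """Draw S ~ IW(df, scale) for integer df >= dim, using numpy only."""
    dim = scale.shape[0]
    # If W ~ Wishart(df, scale^{-1}) then W^{-1} ~ IW(df, scale).
    chol = np.linalg.cholesky(np.linalg.inv(scale))
    g = rng.standard_normal((df, dim)) @ chol.T   # rows are N(0, scale^{-1}) draws
    return np.linalg.inv(g.T @ g)

def log_gauss_prior(theta, S):
    """log N(theta | 0, S) up to a constant independent of theta and S."""
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * logdet - 0.5 * theta @ np.linalg.solve(S, theta)

def metropolis_within_gibbs(log_lik, theta0, nu0, Psi0, n_sweeps, n_mh, step, rng):
    theta = np.asarray(theta0, dtype=float)
    S = sample_inv_wishart(nu0, Psi0, rng)        # initialise S from its prior
    thetas, covs = [], []
    for _ in range(n_sweeps):
        # (1) theta | S, X: a few random-walk Metropolis steps.
        for _ in range(n_mh):
            prop = theta + step * rng.standard_normal(theta.size)
            log_ratio = (log_lik(prop) + log_gauss_prior(prop, S)
                         - log_lik(theta) - log_gauss_prior(theta, S))
            if np.log(rng.uniform()) < log_ratio:
                theta = prop
        # (2) S | theta: conjugate draw; the data X drops out given theta.
        S = sample_inv_wishart(nu0 + 1, Psi0 + np.outer(theta, theta), rng)
        thetas.append(theta.copy())
        covs.append(S)
    return np.array(thetas), np.array(covs)
```

After burn-in, the collected covs are draws from p(S | X) under these assumptions. The chain is valid with n_mh = 1, so the inner Metropolis run does not need to reach convergence before each update of S.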

(Dec 28 '10 at 05:06) Philemon Brakel

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.