0
1

Sampling in the Indian Buffet Process is usually described in terms of Gibbs sampling the individual feature assignments z_nk conditioned on all other feature assignments. For this, one imagines the row 'n' to be the last customer and then samples z_nk based on the popularity of feature k -- if k is an already instantiated feature. For the number of new features, one additionally samples from a Poisson distribution that depends both on the concentration parameter and the number of previous customers.

Now, imagine that one wants to use blocked Gibbs sampling. One selects a subset of customers and wants to simultaneously sample new values for a subset of their feature assignments. Say, select customer i and costumer j and sample feature assignments z_ik and z_jl as well as their number of new features z_i+ and z_j+. For this, one needs to define the joint probability p(z_ik, z_jl, z_i+, z_j+ | z_rest) conditioned on the rest of the variables.

Does anyone know of a paper that makes use of this kind of blocked Gibbs sampling?

asked Jan 18 at 06:13

daj's gravatar image

daj
21136

edited Jan 18 at 16:09

One issue with exactly the kind of sampling you're talking about (Gibbs, as opposed to the Metropolis-Hastings approach in the reference David provided) is that for actual Gibbs sampling you need to enumerate all possible states, compute their unnormalized log-likelihoods, normalize, and sample. If you have n users in your subset and k features, as each feature can be on or off for each user, and each user can have all configurations, you have something like (2^k)^n different states, which is a number that gets really big really fast.

(Jan 19 at 07:36) Alexandre Passos ♦

That's true. However, in my case the data likelihood p(X|Z) is defined in a such a way, that the total number of '1's in Z must stay the same (think of it as a combinatorial problem of finding the right one-to-one associations of data points to the features in Z, where several data points can also be clustered together and be a single 'customer' in the IBP). So only one of the 'z' considered in the blocked Gibbs sampling step can be one (because otherwise p(X|Z)p(Z) would be zero, though the prior p(Z) would of course allow any number of '1's). The complexity therefore is linear in the number of the few 'z' considered in the blocked Gibbs sampling step. The problem is defining the probability over several z in this case, which is kind of non-standard. Probably, it would be much easier in a stick-breaking representation, because then the Bernoulli probability of each column is explicitly represented, making all the z_nk independent.

Anyway, for my particular problem I found a way to sidestep this issue.

(Jan 19 at 08:09) daj

One Answer:

This paper on an IBP-based model describes something similar (using Metropolis-Hastings proposals for block moves)

Modeling Dyadic Data with Binary Latent Factors - Edward Meeds, Zoubin Ghahramani, Radford Neal, Sam Roweis

(see section 3.3 - Faster mixing transition proposals)

answered Jan 18 at 10:50

David%20Andrzejewski's gravatar image

David Andrzejewski
17635

Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.