We know that, using the stick-breaking construction, we can obtain a random draw G from a Dirichlet process, G ~ DP(alpha, H).

A single random draw G is defined by:

G=sum(pi_k*delta_theta_k) from k=1 to infinity

Where:

pi_k=beta_k*prod(1-beta_l) from l=1 to k-1

and

beta_k are independent random draws from a beta distribution, Beta(1, alpha).

delta_theta_k are point masses (atoms) located at theta_k, the random draws from the base distribution H.

I have some questions:

  1. In the very first draw (k = 1), it seems the value of pi_k will be 0: the product runs up to k-1, and with k equal to one that would render the product equal to 0. Or is the first pi_k simply equal to the first beta_k?

  2. Suppose we sample from the base distribution (let's say a Gaussian with some mean) and our first draw, theta_1, lands at 0.5; we then have a point there of magnitude pi_1. Since the draw is defined as a summation, after a number of k's, let's say five, do we sum over those 5 values to obtain a single G? Do I then have one draw, that is, a single real value?

Thanks a lot

asked Jan 10 '11 at 23:00

Leon Palafox

One Answer:
  1. For k = 1 the product has no terms, and an empty product is equal to 1, not 0. So pi_1 is equal to beta_1.

  2. It's hard to understand exactly what you're asking, but G is not a single real value; it is a discrete probability distribution with an infinite number of atoms. You continue the stick-breaking process to define the pi_k values out to infinity, and for each one, you choose a location from the base distribution H. Then, G is the discrete probability distribution you obtain from "overlaying" the infinite collection of atoms. I think you might be confused because G is defined as a summation, but you don't actually sum up the real-valued locations of the atoms, you add them together as functions (technically, measures).
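
To make both points concrete, here is a minimal Python sketch of the construction. It is only an illustration under some assumptions: the infinite sum is truncated at a finite `num_atoms`, the base distribution H is taken to be a standard Gaussian, alpha = 2.0 is an arbitrary choice, and the names `stick_breaking` and `mass` are made up for this example. The result is a list of weights and a list of atom locations, not a single number.

```python
import random

def stick_breaking(alpha, base_sampler, num_atoms, rng):
    """Return (weights, atoms) for a truncated stick-breaking draw of G.

    Truncating at num_atoms is an approximation; the true G has
    infinitely many atoms.
    """
    weights, atoms = [], []
    remaining = 1.0  # empty product for k = 1: the whole stick is intact
    for _ in range(num_atoms):
        beta_k = rng.betavariate(1.0, alpha)  # beta_k ~ Beta(1, alpha)
        weights.append(beta_k * remaining)    # pi_k = beta_k * prod_{l<k} (1 - beta_l)
        atoms.append(base_sampler(rng))       # theta_k ~ H
        remaining *= 1.0 - beta_k             # length of stick left to break
    return weights, atoms

def mass(weights, atoms, low, high):
    """G([low, high]): total weight of the atoms located in the interval."""
    return sum(w for w, theta in zip(weights, atoms) if low <= theta <= high)

rng = random.Random(0)
# Assumed base distribution H: a standard Gaussian.
weights, atoms = stick_breaking(
    alpha=2.0,
    base_sampler=lambda r: r.gauss(0.0, 1.0),
    num_atoms=1000,
    rng=rng,
)

print(sum(weights))                      # close to 1 (truncation loses a tiny remainder)
print(mass(weights, atoms, -1.0, 1.0))   # probability G assigns to [-1, 1]
```

Note that the first weight is exactly the first beta draw, and that the weights are never added to the locations: G is queried as a distribution, for example by asking how much mass it puts on an interval.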

answered Jan 10 '11 at 23:22

Kevin Canini
Ohhhhh, now I get it: that is the reason for the location variable (delta). Otherwise it would be an ordinary summation and the result would be a single number (right?).

But since each weight is attached to a location given by a random draw, the weights never actually add up at one point, unless we had two identical draws from the base distribution?

(Jan 11 '11 at 00:36) Leon Palafox

Yes, that's exactly right. If you can find a tutorial which shows G pictorially, that would be a really helpful learning aid. I usually have to build a mental picture when thinking about Dirichlet processes.

(Jan 11 '11 at 00:56) Kevin Canini

I did find a PPT tutorial with some graphical representations: nlp.stanford.edu/~grenager/papers/dp_2005_02_24.ppt

But in slide 15, where he illustrates the stick-breaking, he states that the product equals 0 in the first iteration, hence my confusion.

Other than that, the slides are quite nice.

(Jan 11 '11 at 01:04) Leon Palafox

Yeah, I see that (on slide 13). That's definitely a typo.

The red spikes on the bottom of that same slide are the atoms that characterize the discrete distribution G. If you were to plot the probability mass function of G, it would look exactly like that.

(Jan 11 '11 at 01:22) Kevin Canini

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.