This is a simple problem, but I don't have a satisfactory answer yet. Here is the problem:

D is an unknown probability density function over 1-dimensional real values.
X_1, X_2, ... X_n are i.i.d. samples from D, and I'm interested in the following quantity:

P( E[X] < 0 | X_1, X_2, ... X_n )
Where E[X] is the expected value of D.

I have the answers in some particular cases, but I'm interested in the general case.

  1. If D is a distribution over {-1,0,1}, I use the Dirichlet distribution and have a non-parametric solution.
  2. In the case where the normal distribution is a valid assumption, I use the cumulative distribution function of Student's t-distribution.
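As an illustration of case 1, here is a minimal sketch of how P( E[X] < 0 | X_1, ... X_n ) could be estimated by Monte Carlo under a Dirichlet posterior. The function name, the uniform prior (1, 1, 1) and the number of draws are assumptions made for this example, not details given in the question. It relies on the fact that for support {-1,0,1} the mean is p(+1) - p(-1).

```python
import random


def prob_mean_negative_dirichlet(samples, prior=(1.0, 1.0, 1.0),
                                 draws=20000, seed=0):
    """Monte Carlo estimate of P(E[X] < 0 | data) for X in {-1, 0, 1}.

    Assumes a Dirichlet prior over (p(-1), p(0), p(+1)); the default
    uniform prior is an illustrative choice.
    """
    support = (-1, 0, 1)
    if any(x not in support for x in samples):
        raise ValueError("samples must take values in {-1, 0, 1}")

    # Posterior is Dirichlet(prior + counts).
    counts = [sum(1 for x in samples if x == v) for v in support]
    alphas = [a + c for a, c in zip(prior, counts)]

    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # A Dirichlet draw is a vector of Gamma variates divided by their sum.
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        # E[X] = p(+1) - p(-1); its sign does not depend on the normalization.
        if g[2] - g[0] < 0:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    data = [-1] * 30 + [0] * 10 + [1] * 10
    print(prob_mean_negative_dirichlet(data))
```

The estimate is only as good as the number of draws; increasing `draws` reduces the Monte Carlo error.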

I figured that the general case would use a Bayesian nonparametric approach. However, I don't know much about this topic. So here are my questions:

  1. Is Bayesian nonparametrics the right approach for this problem?
  2. Which method should I use? A Dirichlet process mixture of Gaussians?
  3. Is there code (preferably in C, C++ or Python) that I could use to compute this efficiently?

asked Nov 25 '11 at 15:15

Alexandre Lacoste

edited Nov 25 '11 at 15:18


2 Answers:

First of all, regarding your solution for particular case #1:

If D is a distribution over {-1,0,1}, I use the Dirichlet distribution and have a non-parametric solution.

Fitting a Dirichlet distribution to an observed dataset is not nonparametric. In this case, the Dirichlet distribution would have three parameters, so it is a parametric solution. I think you're confusing the Dirichlet distribution with the Dirichlet process, which is nonparametric.

Regarding your question, the first thing to note is that it is going to be extremely difficult to come to any definite conclusions about E[X] without making any distributional assumptions. The expected value of any distribution can be made arbitrarily large (or arbitrarily negative) by simply adding some probability mass at a sufficiently extreme value, while keeping that mass small enough that it is unlikely to be observed in a finite sample of n datapoints.
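To make this concrete, here is a small sketch with hypothetical numbers (the values of `eps`, `outlier`, `typical` and `n` are illustrative choices, not taken from the question). It describes a two-point distribution whose mean is strongly negative even though a sample of 100 points will almost always consist of positive values only.

```python
def rare_outlier_example(eps=1e-4, outlier=-1e6, typical=1.0, n=100):
    """Two-point distribution: mass eps at `outlier`, mass 1 - eps at `typical`.

    Returns the true mean and the probability that n i.i.d. samples
    contain no outlier at all.
    """
    true_mean = eps * outlier + (1.0 - eps) * typical
    p_unseen = (1.0 - eps) ** n
    return true_mean, p_unseen


if __name__ == "__main__":
    mean, p_unseen = rare_outlier_example()
    print("true mean:", mean)
    print("P(no outlier in the sample):", p_unseen)
```

With these numbers, a sample made only of the value 1 is by far the most likely outcome, yet the true mean is far below zero. No method that looks only at the data can detect this.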

Also, I would first point out that, by far, the most commonly accepted practice for solving this problem is the t-test (which is not a Bayesian method, however).

Of course, there are multiple nonparametric Bayesian methods for solving this problem. If your data are discrete-valued (not continuous), then a Dirichlet process (or a Pitman-Yor process) is the most popular way of performing nonparametric Bayesian density estimation.

If your data are continuous, and if you assume that D is a mixture distribution (e.g., a mixture of Gaussians), then a Dirichlet process mixture model is an appropriate choice.

If neither of these assumptions holds (your data are continuous and D is not necessarily a mixture of parametric distributions), then a Gaussian process-based solution seems like the natural choice. A quick search revealed this paper, which introduces a density estimation model based on the Gaussian process. I haven't read it myself, but it seems like it would provide a solution to your problem. http://arxiv.org/pdf/0912.4896v1

answered Nov 26 '11 at 00:07

Kevin Canini

edited Nov 26 '11 at 00:10

First of all, many thanks for your answer :)

About the usage of "non-parametric" (off-topic)

To be clear, the difference between the Dirichlet distribution and the Dirichlet process does not confuse me. However, the usage of the term "non-parametric" does confuse me, and it also seems to confuse Wikipedia, which attaches two different meanings to it. In the case where we know that the support of D is {-1,0,1} (i.e. it is not an assumption), I thought that using the Dirichlet distribution was a non-parametric situation, since no assumptions need to be made about the model. Of course, there are parameters governing the prior, but isn't that also the case for the Gaussian process?

Gaussian Process and GPDS

I've just read the first 5 chapters of Rasmussen's book, and wow! Gaussian processes are amazing, and so is the quality of this book. However, the book does not describe how to model probability densities (which have non-conjugate priors). To address this problem, Ryan Prescott Adams proposes GPDS [NIPS 2008, arXiv 2009], an MCMC approach to sample from the expected distribution (if I understood correctly).

It doesn't solve my problem :(

Unfortunately, sampling from the expected distribution doesn't solve my problem. It would only allow me to answer the question of whether E_{D'}[X] < 0, where the expectation is taken over D' and D' is the expected distribution. But I'm interested in Pr( E_{D}[X] < 0 | X_1, X_2, ... X_n ). Here the expectation is taken over D, and D is sampled from the Bayesian posterior.
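One technique that targets this quantity directly is the Bayesian bootstrap (Rubin, 1981). It is not mentioned elsewhere in this thread and is offered here only as a sketch. It corresponds to the posterior of a Dirichlet process prior in the limit where the concentration parameter goes to zero: each posterior draw of D puts Dirichlet(1, ..., 1) weights on the observed points, so every draw has its own mean. The function name and defaults below are assumptions made for this example.

```python
import random


def prob_mean_negative_bayes_boot(samples, draws=20000, seed=0):
    """Bayesian-bootstrap estimate of P(E_D[X] < 0 | data).

    Each draw of D is a reweighting of the observed points with
    Dirichlet(1, ..., 1) weights (illustrative, assumption-laden sketch).
    """
    if not samples:
        raise ValueError("need at least one sample")
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Dirichlet(1, ..., 1) weights are Exp(1) variates divided by their sum.
        w = [rng.expovariate(1.0) for _ in samples]
        # The weights are positive, so the sign of the weighted mean equals
        # the sign of the unnormalized weighted sum.
        if sum(wi * xi for wi, xi in zip(w, samples)) < 0:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    data = [-0.3, 0.8, -1.1, 0.2, -0.6, -0.4, 0.1]
    print(prob_mean_negative_bayes_boot(data))
```

Note that every draw of D is supported on the observed values only, so this approach cannot account for probability mass at values that were never observed.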

At this point, I think I'm better off with Student's t-distribution, which provides a Bayesian posterior over the values of E[X]. Can anybody enlighten me on the worst cases of this approach? Of course, when n is small it might give a wrong answer. But what would be the worst distribution?
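For reference, here is a minimal sketch of the Student's t computation, assuming normally distributed data and the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2, under which (mu - mean) / (s / sqrt(n)) follows a t-distribution with n - 1 degrees of freedom a posteriori. To stay within the Python standard library, the t CDF is estimated by Monte Carlo. With SciPy available, `scipy.stats.t.cdf(threshold, df)` would give the same quantity in closed form. The function name and defaults are assumptions made for this example.

```python
import math
import random
import statistics


def prob_mean_negative_t(samples, draws=50000, seed=0):
    """Estimate P(mu < 0 | data) under a normal model with a flat prior.

    The posterior of mu is mean + (s / sqrt(n)) * t_{n-1}, so the answer
    is the t CDF evaluated at -mean / (s / sqrt(n)).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xbar = statistics.fmean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("sample variance is zero")
    df = n - 1
    threshold = -xbar / (s / math.sqrt(n))

    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # t_{df} variate: standard normal over sqrt(chi-square / df);
        # chi-square(df) is Gamma(shape=df/2, scale=2).
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        if z / math.sqrt(chi2 / df) < threshold:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    data = [-0.3, 0.8, -1.1, 0.2, -0.6, -0.4, 0.1]
    print(prob_mean_negative_t(data))
```

This number is only meaningful to the extent that the normality assumption holds. A sample drawn from a distribution with a rare extreme value will typically produce a confident answer with the wrong sign.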

answered Dec 01 '11 at 15:51

Alexandre Lacoste


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.