|
This is a simple problem, but I don't have a satisfactory answer yet. Here is the problem: D is an unknown probability density function over 1-dimensional real values, and X_1, X_2, ..., X_n are samples drawn from D. I want to compute P( E[X] < 0 | X_1, X_2, ..., X_n ). I have the answers in some particular cases, but I'm interested in the general case.
I figured that the general case would use a Bayesian nonparametric approach. However, I don't know much about this topic. So here are my questions:
|
|
First of all, regarding your solution for particular case #1:
Fitting a Dirichlet distribution to an observed dataset is not nonparametric. In this case, the Dirichlet distribution would have three parameters, so it is a parametric solution. I think you're confusing the Dirichlet distribution with the Dirichlet process, which is nonparametric.

Regarding your question, the first thing to note is that it is going to be extremely difficult to reach any definite conclusion about E[X] without making distributional assumptions. The expected value of any distribution can be made arbitrarily large (or arbitrarily negative) by adding some probability mass at a sufficiently extreme value, while keeping that mass small enough that it is unlikely to be observed in a finite sample of n data points.

I would also point out that, by far, the most commonly accepted practice for this problem is the t-test (which is not a Bayesian method, however).

That said, there are multiple nonparametric Bayesian methods for this problem. If your data are discrete-valued (not continuous), then a Dirichlet process (or a Pitman-Yor process) is the most popular way of performing nonparametric Bayesian density estimation. If your data are continuous, and if you assume that D is a mixture distribution (e.g., a mixture of Gaussians), then a Dirichlet process mixture model is an appropriate choice. If neither of these assumptions holds (your data are continuous and D is not necessarily a mixture of parametric distributions), then a Gaussian process-based solution seems like the natural choice.

A quick search revealed this paper, which introduces a density estimation model based on the Gaussian process. I haven't read it myself, but it seems like it would provide a solution to your problem. http://arxiv.org/pdf/0912.4896v1 |
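To make the point about hidden probability mass concrete, here is a minimal sketch. The mixture it uses (a unit-variance Gaussian bulk plus one rare spike) and all the names in it are illustrative assumptions, not something taken from the question.

```python
import random

def true_mean(eps, spike, bulk_mean=1.0):
    """Mean of the assumed mixture: bulk with prob. 1 - eps, spike with prob. eps."""
    return (1.0 - eps) * bulk_mean + eps * spike

def prob_spike_unseen(eps, n):
    """Probability that n i.i.d. draws contain no spike at all."""
    return (1.0 - eps) ** n

def draw_sample(n, eps, spike, bulk_mean=1.0, seed=0):
    """Draw n points: N(bulk_mean, 1) most of the time, `spike` with prob. eps."""
    rng = random.Random(seed)
    return [spike if rng.random() < eps else rng.gauss(bulk_mean, 1.0)
            for _ in range(n)]

if __name__ == "__main__":
    eps, spike, n = 1e-4, -1e5, 100   # hypothetical values chosen for illustration
    sample = draw_sample(n, eps, spike)
    print("true mean:", true_mean(eps, spike))
    print("sample mean:", sum(sample) / n)
    print("P(spike never observed):", prob_spike_unseen(eps, n))
```

With these values the true mean is strongly negative, yet about 99% of samples of size 100 contain no spike, so the sample looks as if it came from a distribution with mean near 1. Any method that looks only at the observed data will be confidently wrong in that case unless its prior or model rules such spikes out.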
|
First of all, many thanks for your answer :)

About the usage of "non-parametric" (off-topic)

To be precise, the difference between the Dirichlet distribution and the Dirichlet process does not confuse me. However, I must say that the usage of the term "non-parametric" does confuse me, and it also seems to confuse Wikipedia, which attaches two different meanings to it. In the case where we know that the support of D is {-1, 0, 1} (i.e., it is not an assumption), I thought that using the Dirichlet distribution was a non-parametric situation, since no assumptions are made about the model. Of course, there are parameters governing the prior, but isn't that also the case for the Gaussian process?

Gaussian Process and GPDS

I've just read the first 5 chapters of Rasmussen's book. And wow! Gaussian processes are amazing, and so is the quality of this book. However, the book does not describe how to model probability densities (which have non-conjugate priors). To address this problem, Ryan Prescott Adams et al. propose GPDS [NIPS 2008, arXiv 2009], an MCMC approach to sample from the expected distribution (if I understood correctly).

It doesn't solve my problem :(

Unfortunately, sampling from the expected distribution doesn't solve my problem. It would only allow me to answer the following question: is E_{D'}[X] < 0, where the expectation is taken over D' and D' is the expected distribution? But I'm interested in Pr( E_{D}[X] < 0 | X_1, X_2, ..., X_n ). This time the expectation is taken over D, and D is sampled from the Bayesian posterior.

At this point, I think I'm better off with Student's t-distribution, which provides a Bayesian posterior over the values of E[X]. Can anybody enlighten me on the worst cases of this approach? Of course, when n is small it might give a wrong answer. But what would be the worst distribution? |
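The difference between the two quantities can be sketched for the particular case with known support {-1, 0, 1}. This assumes a uniform Dirichlet(1, 1, 1) prior, which is an illustrative choice, and the function names are hypothetical. The first function samples D from the posterior and checks the sign of E_D[X] for each draw; the second only computes the mean of the single expected distribution D'.

```python
import random

SUPPORT = (-1, 0, 1)

def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def posterior_alphas(data, prior=(1.0, 1.0, 1.0)):
    """Conjugate update: prior pseudo-counts plus observed counts per support value."""
    return [a + data.count(v) for a, v in zip(prior, SUPPORT)]

def prob_mean_negative(data, prior=(1.0, 1.0, 1.0), n_draws=20000, seed=0):
    """Monte Carlo estimate of Pr( E_D[X] < 0 | data ), with D drawn from the posterior."""
    rng = random.Random(seed)
    post = posterior_alphas(data, prior)
    hits = 0
    for _ in range(n_draws):
        p = sample_dirichlet(post, rng)
        if sum(v * pv for v, pv in zip(SUPPORT, p)) < 0:
            hits += 1
    return hits / n_draws

def mean_of_expected_distribution(data, prior=(1.0, 1.0, 1.0)):
    """E_{D'}[X], where D' is the posterior mean of D: a single number, not a probability."""
    post = posterior_alphas(data, prior)
    total = sum(post)
    return sum(v * a / total for v, a in zip(SUPPORT, post))

if __name__ == "__main__":
    data = [-1] * 8 + [0] * 5 + [1] * 2   # made-up observations
    print("Pr(E_D[X] < 0 | data) ~", prob_mean_negative(data))
    print("E_{D'}[X] =", mean_of_expected_distribution(data))
```

The second function can only say whether the expected distribution has a negative mean (yes or no), while the first gives the posterior probability that was asked for. The same recipe would apply to any model from which whole distributions D can be sampled from the posterior, provided E_D[X] can be computed for each draw.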