I would like to compute the expected value of a function f(x1,x2): R^n x R^n -> [0,1] for a fixed x1, where f has the form f(x1,x2) = |f2(x1) - f2(x2)| for some f2:R^n -> [0,1]. This will be done for thousands of different values of x1 in a context where speed is important. Three problems:

  1. f could be slow to compute
  2. n is relatively large (> 100)
  3. the input space is continuous

I can see a few approaches:

  1. Discretize the input space for x2 and exhaustively compute f (bad idea since f is slow and n is large)
  2. Perform dimensionality reduction and then compute the expected value on that (eh...)
  3. Learn the expected value through regression (this feels dirty)
  4. Use some form of sampling

Any other ideas? Could someone familiar with the sampling literature recommend papers that deal with this kind of problem?

asked May 02 '11 at 10:30

yeastwars's gravatar image


edited May 02 '11 at 10:37

One Answer:

I think the only reasonable approach is 4. 1, as you mentioned, would be too slow. For 2 and 3 you have no guarantee of convergence, or even that the approach is sensible. There are many sampling methods, and a good start to know some of their properties and when can you use them is the wikipedia page for Monte Carlo integration. Regardless of the specific method you should be able to bound the variance of the monte carlo estimate of the expected value you're after, and this is your main way of making sure that you have the right value.

Which approach to use depends a lot on what you know about the probability distribution over which you want to compute expectations. From strongest to weakest assumptions:

  1. If you can sample from it in a reasonable amount of time, you can just use the empirical mean of the function applied to the samples as your estimate, and use the known formulas for the variance of this estimator to bound things until your confidence interval is narrow enough.

  2. If you can't sample from the distribution but you can evaluate the (possibly unnormalized) likelihood of any given point and sample from a reasonably similar helper distribution of it you can use importance sampling, which is essentially the method I specified above but with the samples from the helper distribution weighted by the ratio of their likelihood according to the actual distribution and their likelihood in the sampling distribution. This has some failure modes, but precise variance bounds can be stablished that will help you know when to stop sampling.

  3. If you can only evaluate the (possibly unnormalized) likelihood of any given point but wouldn't dare claiming that any known distribution is a good approximation of the one you're after, you can use Markov chain Monte Carlo (MCMC) techniques to create a markov chain such that its stationary distribution is the one you're after; hence you can get samples from the distribution you want to sample from and use them to compute expectations. There are many methods to do this.

A good introductory reference on this subject is chapters 29 and 30 of David MacKay's Information Theory, Inference, and Learning Algorithms book, which you can find for free on the internet. You can download individual chapters here.

answered May 02 '11 at 11:18

Alexandre%20Passos's gravatar image

Alexandre Passos ♦

edited May 02 '11 at 11:24

Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.