I was reading Ch. 3, "Inversion within the Probabilistic Framework", of the book "Bayesian Approach to Inverse Problems". I was wondering why the direct conditional density is given as a function of the noise. That is,

p(y|x) = q(y - A(x)),

where the inverse problem we are aiming to solve is,

y = A(x) + b,

with y being the observed values, A the operator describing the system, and b the additive noise.

More importantly, I don't understand intuitively why the solution to the inverse problem is computed by employing the direct distribution (which becomes the likelihood when y is known, which is the case in an inverse problem). Using maximum likelihood estimation, the solution is computed as follows,

x* = argmax_x p(y|x).

asked Nov 18 '11 at 02:13

Pardis

edited Nov 19 '11 at 11:19

This looks like P(b|x,y), not P(y|x). I'm not familiar with this book, so I can't help more.

(Nov 18 '11 at 07:09) Alexandre Passos ♦

2 Answers:

OK, I had some trouble getting this right, but I think I can help.

First, setting the inverse problem aside, it is important to notice that if you have a function

y = A*x + b

and b ~ N(0, sigma^2) (this is the noise),

then you can reason as follows.

Rearranging gives b = y - Ax .......................(1)

Since b ~ N(0, sigma^2),

p(b) ∝ exp(-b^2/(2 sigma^2)) ............(2)

Substituting (1) into (2),

p(y - Ax) ∝ exp(-(y - Ax)^2/(2 sigma^2)) ............(3)

But by definition, expression (3) is also p(y|x; A) when y|x ~ N(Ax, sigma^2).

The same argument carries over to most noise distributions.
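As a quick sanity check of the step above, here is a minimal Python sketch for the scalar case. The names `noise_density` and `likelihood`, and the numbers used, are made up for illustration; it only assumes zero-mean Gaussian noise with standard deviation sigma and a scalar A.

```python
import math
from statistics import NormalDist

def noise_density(b, sigma):
    """Density q(b) of zero-mean Gaussian noise with standard deviation sigma."""
    return math.exp(-b**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(y, x, a, sigma):
    """p(y | x) for the scalar model y = a*x + b: the noise density at the residual."""
    return noise_density(y - a * x, sigma)

# Hypothetical values, chosen only for illustration.
y, x, a, sigma = 2.0, 1.5, 0.8, 0.5

via_noise = likelihood(y, x, a, sigma)           # q(y - a*x)
via_normal = NormalDist(a * x, sigma).pdf(y)     # density of N(a*x, sigma^2) at y
print(via_noise, via_normal)
```

Both numbers agree up to floating-point rounding, which is just the statement p(y|x) = q(y - Ax) for this particular noise model.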

Now for the inverse problem: you want to recover the true x (the input). Since the mapping A is in general not invertible, you cannot simply solve for x. Instead you use Bayes' rule, and for that you need the likelihood p(y|x) evaluated over all candidate inputs x.
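To make the non-invertible case concrete, here is a small sketch under assumptions that are not in the book or the question: A is the 1x2 matrix [1, 1], the noise is Gaussian with standard deviation sigma, and the prior on x is a zero-mean Gaussian with standard deviation tau in each component. All function names are hypothetical, and log densities are written up to additive constants.

```python
def log_likelihood(y, x, sigma):
    """log p(y | x) up to a constant, for y = x[0] + x[1] + b with Gaussian b."""
    residual = y - (x[0] + x[1])
    return -residual**2 / (2 * sigma**2)

def log_posterior(y, x, sigma, tau):
    """log p(x | y) up to a constant: log-likelihood plus Gaussian log-prior."""
    log_prior = -(x[0]**2 + x[1]**2) / (2 * tau**2)
    return log_likelihood(y, x, sigma) + log_prior

def map_estimate(y, sigma, tau):
    """Posterior mode for this toy model, from setting the gradient to zero."""
    xi = y / (2 + sigma**2 / tau**2)
    return (xi, xi)

y, sigma, tau = 2.0, 0.5, 1.0

# Every x with x[0] + x[1] == y has the same likelihood, so maximum
# likelihood alone cannot pick one of them.
print(log_likelihood(y, (2.0, 0.0), sigma), log_likelihood(y, (0.0, 2.0), sigma))

# The prior breaks the tie and the posterior has a single mode.
print(map_estimate(y, sigma, tau))
```

The likelihood is flat along the whole line x[0] + x[1] = y, and it is only after multiplying by the prior that one point stands out.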

I recommend reading the neural networks chapter of Bishop's book on machine learning, where he solves an inverse problem in a rather elegant way.

To grasp the concepts behind the equations, see Andrew's notes on linear regression from a probabilistic point of view; they give a good explanation of why p(y|x) takes that form (page 12).

And for a good example of the inverse problem, I used these slides.

answered Nov 19 '11 at 11:51

Leon Palafox

A frequentist model (WLOG - this case is simplest) generally specifies a structure and an error distribution. So you might see a model specified as

y = A(x) + b

b ~ q (for some distribution q)

But this says nothing about the distribution of y | x, A per se, and you will need this to estimate A via, say, maximum likelihood.

Let Y be the random variable y | x, A and note that conditioning on x and A means that they're just known constants. What you're doing here is just a straight transformation of random variables; you have b ~ q, and want to find the distribution of

Y = k + b

for k a constant. You might remember that if b has a density function, this is just

p(Y) = q(Y-k)|J|

where J is the Jacobian adjustment (for a pure shift such as adding a constant, this term is just 1). So,

p(y | x, A) = q(y - A(x))
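If you want to check the shift rule numerically for a non-Gaussian q, here is a rough sketch. It assumes Laplace noise purely as an example (nothing in the question says the noise is Laplace), and the function names are made up. It differentiates the CDF of Y = k + b by finite differences and compares the result with q(y - k).

```python
import math

def laplace_pdf(b, scale):
    """Density q(b) of zero-mean Laplace noise."""
    return math.exp(-abs(b) / scale) / (2 * scale)

def laplace_cdf(b, scale):
    """CDF of zero-mean Laplace noise."""
    if b < 0:
        return 0.5 * math.exp(b / scale)
    return 1.0 - 0.5 * math.exp(-b / scale)

def density_of_y_numeric(y, k, scale, h=1e-5):
    """Central-difference derivative of P(Y <= y) = P(b <= y - k)."""
    upper = laplace_cdf(y + h - k, scale)
    lower = laplace_cdf(y - h - k, scale)
    return (upper - lower) / (2 * h)

def density_of_y_formula(y, k, scale):
    """The shift rule: p(Y = y) = q(y - k), with Jacobian term equal to 1."""
    return laplace_pdf(y - k, scale)

# k plays the role of A(x), which is a known constant once x and A are fixed.
print(density_of_y_numeric(1.3, 1.0, 0.7), density_of_y_formula(1.3, 1.0, 0.7))
```

The two values should match to several decimal places away from the kink of the Laplace density at y = k, where the finite difference is less accurate.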

answered Nov 20 '11 at 01:54

Jared Tobin

that's a great way to look at it; this makes it more clear: http://en.wikipedia.org/wiki/Probability_density_function#Multiple_variables

(Nov 20 '11 at 02:30) Pardis

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.