|
I was reading Ch. 3, "Inversion within the Probabilistic Framework", of the book "Bayesian Approach to Inverse Problems". I was wondering why the direct conditional density is given as a function of the noise. That is, p(y|x) = q(y - A(x)), where the inverse problem we are aiming to solve is y = A(x) + b, with y being the observed values, A the operator describing the system, b the additive noise, and q the density of b. More importantly, I don't understand intuitively why the solution to the inverse problem is computed from the direct distribution (which becomes the likelihood once y is known, as is the case in an inverse problem). Using maximum likelihood estimation, the solution is computed as x* = argmax_x p(y|x). |
|
OK, I had some trouble getting this right myself, but I think I can help. First, setting the inverse problem aside, it is important to notice the following. Suppose you have y = A*x + b with b ~ N(0, sigma^2) (this is the noise). Then you can reason like this:

b = y - A*x ....................... (1)
b ~ N(0, sigma^2) means p(b) ~ exp(-b^2 / (2*sigma^2)) ............ (2)
Substituting (1) into (2): p(y - A*x) ~ exp(-(y - A*x)^2 / (2*sigma^2)) ............ (3)

But by definition, expression (3) is also p(y|x; A) when p(y|x) = N(y | A*x, sigma^2). The same argument generalizes to most noise distributions.

Now for the inverse problem: you want to recover your true x (the input), and to do that you need Bayes' rule, which requires the likelihood as a function of every candidate input x, since the mapping given by the matrix A is not invertible.

I recommend reading the Neural Networks chapter of Bishop's book on Machine Learning, where he solves the inverse problem in a rather elegant way. To grasp the concepts behind the equations, see Andrew's notes on linear regression from a probabilistic point of view; they give a good explanation of why p(y|x) takes that form (page 12). For a good example of the inverse problem, I used these slides. |
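As a rough illustration of steps (1) to (3), here is a minimal NumPy sketch. It assumes a linear forward model with i.i.d. Gaussian noise; the matrix `A`, the vector `x_true`, the value of `sigma`, and the helper names `log_q` and `log_likelihood` are all made up for the example and do not come from the book. It evaluates log p(y|x) as the noise log-density at the residual y - A x, and then uses the fact that, for Gaussian noise, maximizing this likelihood is the same as least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward model: A is a known 20x3 matrix, x_true the unknown input.
A = rng.normal(size=(20, 3))
x_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1  # assumed noise standard deviation

# Observation y = A x + b with b ~ N(0, sigma^2 I).
y = A @ x_true + sigma * rng.normal(size=20)

def log_q(b, sigma):
    """Log-density of i.i.d. zero-mean Gaussian noise, evaluated at b."""
    n = b.size
    return -0.5 * np.sum(b**2) / sigma**2 - n * np.log(sigma * np.sqrt(2 * np.pi))

def log_likelihood(x, y, A, sigma):
    """log p(y | x) = log q(y - A x): the noise density at the residual."""
    return log_q(y - A @ x, sigma)

# With Gaussian noise, argmax_x p(y | x) is the least-squares solution.
x_ml, *_ = np.linalg.lstsq(A, y, rcond=None)

print("x_ml =", x_ml)
print("log p(y | x_ml)   =", log_likelihood(x_ml, y, A, sigma))
print("log p(y | x_true) =", log_likelihood(x_true, y, A, sigma))
```

The estimate `x_ml` never has a lower likelihood than `x_true`, because it is fitted to this particular noisy y. With a different noise density q, only `log_q` would change, and the maximization would generally need a numerical optimizer instead of `lstsq`.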
|
A frequentist model (WLOG; this case is simplest) generally specifies a structure and an error distribution. So you might see a model specified as y = A(x) + b, with b ~ q.
But this says nothing about the distribution of y | x, A per se, and you will need this to estimate A via, say, maximum likelihood. Let Y be the random variable y | x, A and note that conditioning on x and A means that they're just known constants. What you're doing here is a straight transformation of random variables: you have b ~ q and want to find the distribution of Y = k + b, where k = A(x) is a constant. You might remember that if b has a density function, this is just p(Y) = q(Y - k)|J|, where |J| is the Jacobian adjustment (for a pure shift like adding a constant, this term is just 1). So p(y | x, A) = q(y - A(x)).
That's a great way to look at it; this makes it clearer: http://en.wikipedia.org/wiki/Probability_density_function#Multiple_variables
(Nov 20 '11 at 02:30)
Pardis
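The shift argument (Y = k + b with b ~ q gives p(Y) = q(Y - k)) can also be checked numerically. The sketch below is only an illustration: it picks Laplace noise to show that nothing depends on q being Gaussian, and the values of `k` and `scale` and the helper `laplace_cdf` are arbitrary choices for the example. It compares the empirical CDF of Y with the noise CDF evaluated at t - k, which is equivalent to comparing the densities.

```python
import numpy as np

rng = np.random.default_rng(1)

k = 3.0       # stands in for the constant A(x) once x and A are fixed
scale = 0.7   # hypothetical Laplace noise scale

def laplace_cdf(z, scale):
    """CDF of zero-mean Laplace noise b ~ q."""
    z = np.asarray(z, dtype=float)
    return np.where(z < 0, 0.5 * np.exp(z / scale), 1.0 - 0.5 * np.exp(-z / scale))

# Draw noise and shift it: Y = k + b.
b = rng.laplace(loc=0.0, scale=scale, size=200_000)
Y = k + b

# If p(Y) = q(Y - k), then P(Y <= t) = F_q(t - k) for every t.
t = np.linspace(k - 4.0, k + 4.0, 41)
empirical = np.array([(Y <= ti).mean() for ti in t])
predicted = laplace_cdf(t - k, scale)
max_gap = np.max(np.abs(empirical - predicted))

print("largest CDF gap:", max_gap)
```

The gap should be on the order of the sampling error, shrinking as the sample size grows.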
|
This looks like P(b|x,y), not P(y|x). I'm not familiar with this book, so I can't help more.