Hi all, I was wondering if anyone had tips on testing stochastic algorithms (e.g., MCMC, particle filters, etc.). I have an algorithm relying on both particle filters and MCMC, and to guarantee convergence it is critical that each part be correct. Is there any methodology I can use to guarantee correctness other than checking correctness in expectation (which always has a small chance of failure)?
You can (in every language I know about) seed the random number generator, so that it returns the same sequence of values every time. You can use this to repeat tests you know work, which removes the "small chance of failure".

If your code is modular, you may be able to feed in given inputs and check against the expected output. Using a neural network as an example, set up a network with specific weights and a specific input, then test that a single iteration produces the expected updates to the weights. This is usually a subroutine in a much larger training method, but it can be tested on its own.

Finally, there is also value in just seeing that it runs, and adding some assertions to your code (e.g., that the error rate is non-increasing after every iteration).
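For concreteness, a minimal Python sketch of both ideas: a seeded test that is exactly repeatable, and a single-iteration test of a toy one-layer "network" whose expected weight update is worked out by hand (the function and names here are invented for illustration, not from any particular library):

```python
import numpy as np

def one_gd_step(w, x, y, lr):
    # One gradient-descent step for a linear model with
    # squared-error loss L = 0.5 * (w.x - y)^2.
    grad = (w @ x - y) * x
    return w - lr * grad

def test_single_iteration_update():
    w = np.array([0.5, -0.25])
    x = np.array([1.0, 2.0])
    # By hand: prediction = 0.0, error = -1.0, grad = [-1, -2],
    # so the updated weights are [0.5, -0.25] - 0.1 * [-1, -2].
    expected = np.array([0.6, -0.05])
    assert np.allclose(one_gd_step(w, x, 1.0, 0.1), expected)

def test_seeded_run_is_repeatable():
    # Seeding removes the "small chance of failure": the same
    # seed yields the same draws, hence the same test outcome.
    runs = [np.random.default_rng(42).normal(size=5) for _ in range(2)]
    assert np.array_equal(runs[0], runs[1])
```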
Thanks Robert! Those are all ways I've used in one form or another, and I'll detail my experiences with each below. Setting the seed of your random number generator is indeed a way to ensure reproducible behavior, but I'd like to avoid it, as it also pins down the implementation and order of execution: if you change how your algorithm works, generating fewer or more random numbers, your tests will all need to be rewritten. My code is indeed modular (for example, each MCMC move selects a "move type" and then samples the next state using the corresponding proposal distribution), and I do test for that (see the sketch below these comments). This is a great start, but integration testing (e.g., the full MCMC sampler) becomes very expensive due to all the random components; I actually had an infuriating bug recently that only showed up when everything was combined. Finally, seeing that it runs is, unfortunately, almost the best way I know of as well. That's fine for the final result, but it's not very decomposable :(
(Sep 10 '11 at 21:01)
Daniel Duckwoth
No problem. Reading your response, I realize my answer was a bit basic; you never really know how skilled people are on question forums!
(Sep 11 '11 at 00:06)
Robert Layton
Mind, however, that when training a neural network, your learning step only has to be within 90 degrees of the 'best' search direction for the error to keep decreasing. Thus, a decreasing error (convergence) is not a sufficient criterion for an optimizer to be working correctly.
(Sep 12 '11 at 02:38)
Justin Bayer
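Picking up Daniel's point about testing each MCMC move in isolation: one way to pin a single move down is to fix the seed and assert the move is deterministic given it, independent of the rest of the sampler. A minimal sketch, where the Gaussian target and random-walk move are stand-ins rather than Daniel's actual model:

```python
import numpy as np

def log_target(x):
    return -0.5 * x ** 2  # standard normal, up to a constant

def random_walk_move(x, rng, step=0.5):
    # One Metropolis move: propose, then accept/reject.
    prop = x + step * rng.normal()
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        return prop
    return x

def test_move_is_deterministic_given_seed():
    # The move is exercised in isolation from the full sampler.
    a = random_walk_move(0.0, np.random.default_rng(0))
    b = random_walk_move(0.0, np.random.default_rng(0))
    assert a == b
```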
The best way I've found is to generate dummy data and make sure the MCMC infers the parameters of your dummy data. To give you a very basic example, if you're testing your LDA implementation, you can generate data according to the LDA generative story. Come up with a limited vocabulary and assign each word a fairly peaked topic distribution (e.g., 80% on one topic, with the remaining 20% spread over the others). Come up with document-topic distributions (e.g., 80% one topic, 20% another). Then generate documents using those probabilities. Your MCMC should learn the correct document-topic and word-topic distributions, assuming the dummy data you came up with is relatively simple. I've found that this method will ferret out most small mathematical errors, which can be pretty common when implementing an inference algorithm in a language that isn't particularly mathematically suited, like C or Java. (A code sketch of this recipe appears after the comments below.)

Could you elaborate a little more? MCMC would give you a distribution over parameters for a fixed observation set, so how would sampling one set of observations from your "generative story" let you know what the posterior over parameters is given these observations?
(Sep 10 '11 at 23:14)
Daniel Duckwoth
@Daniel: you want your posterior, as computed by MCMC, to assign relatively high density to the true parametrization. This is not completely ideal, however: with too little data it's perfectly normal for the MCMC posterior to be centered in the wrong place, and the same happens if you do as he suggested but pick weird-looking values for some parameters (that is, if the values you picked are unlikely under the prior, MCMC may well get lost).
(Sep 11 '11 at 09:06)
Alexandre Passos ♦
Normally you make a model and use both the priors and observations to infer the parameters. Basically what I'm saying is that you (on paper) determine what the parameters should be and generate data to fit them. Now you have observations and you know what parameters you should infer. Alexandre is right that this can sometimes go awry. On more than one occasion I generated data and the MCMC learned a different but valid set of parameters given the data. So you have to be careful when generating your dummy data. The Griffiths and Steyvers (2004) paper that Leon refers to has a good example.
(Sep 11 '11 at 11:46)
Kirk Roberts
Ah, I see! You're right -- as you increase the amount of data, your posterior should concentrate more and more around the true parameters. That makes perfect sense.
(Sep 12 '11 at 02:16)
Daniel Duckwoth
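Following Kirk's recipe, a minimal sketch of generating dummy LDA data. The topic count, vocabulary size, and peaked distributions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, vocab, n_docs, doc_len = 3, 9, 200, 50

# Each topic puts 80% of its mass on its own 3 words, with the
# remaining 20% spread evenly over the rest of the vocabulary.
topic_word = np.full((n_topics, vocab), 0.2 / (vocab - 3))
for k in range(n_topics):
    topic_word[k, 3 * k:3 * k + 3] = 0.8 / 3

# Peaked per-document topic distributions (small Dirichlet alpha).
doc_topic = rng.dirichlet(0.1 * np.ones(n_topics), size=n_docs)

docs = []
for d in range(n_docs):
    z = rng.choice(n_topics, size=doc_len, p=doc_topic[d])  # token topics
    docs.append([rng.choice(vocab, p=topic_word[k]) for k in z])

# Run the sampler on `docs`, then check the inferred document-topic
# and word-topic distributions against doc_topic and topic_word.
```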
I usually do two things. First, I usually know the density function, which means you can evaluate it at the points you are sampling. Sample the points, evaluate the density, and then, if the dimension is <= 3, you can just plot both and see how they compare; if not, it's a bit trickier, but you can take traces over different dimensions and look at the projections. The second is much in line with what Kirk said: for example, in the dummy LDA example by Griffiths, they generate a set of images as random combinations of 7 base images, and then infer the original base images using LDA. If you did things right, you should be able to recover those original images. And you can do this with almost any other distribution.

Thanks @Leon! In line with your first comment -- plotting the density vs. samples -- I've been automating this for discrete distributions by taking a boatload of samples and checking the L-infinity norm of the difference between the empirical and predicted distributions (sketched below these comments). Is there a simple way to extend this to continuous distributions? I could calculate expectations of some function, perhaps, but then what would be a good choice?
(Sep 14 '11 at 20:13)
Daniel Duckwoth
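For reference, the discrete check Daniel describes can look like this (a sketch; the three-outcome distribution, sample size, and tolerance are arbitrary):

```python
import numpy as np

def linf_gap(samples, predicted):
    # L-infinity distance between the empirical distribution of
    # integer-valued `samples` and the predicted probabilities.
    counts = np.bincount(samples, minlength=len(predicted))
    return np.max(np.abs(counts / len(samples) - predicted))

def test_sampler_matches_predicted_distribution():
    rng = np.random.default_rng(1)
    predicted = np.array([0.5, 0.3, 0.2])
    samples = rng.choice(3, size=200_000, p=predicted)  # stand-in sampler
    # The bound is ~9 standard errors at this sample size, so the
    # test passes reliably; tighten it by taking more samples.
    assert linf_gap(samples, predicted) < 0.01
```

For continuous distributions, one standard analogue of this sup-norm gap is the same comparison between the empirical and true CDFs, i.e., the Kolmogorov-Smirnov statistic.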
According to Hanna Wallach, it's a very good idea to first write a horribly slow version of the code that is a literal translation of the math and hence should work. Then you keep that around, fix the random seed, and write faster code while making sure that it acts exactly like the slow version (literally making the same jumps to the same states with the same computed complete-data likelihood). This should let you be pretty sure that your code is acting as it should. You can even write this down as unit tests (see the sketch after the comments below).

Of course! Write it correct, then make it fast at the big-O level, then micro-optimize if necessary. Unfortunately, some code (e.g., MCMC Data Association by Oh et al.) just has a lot of components, and even the math isn't particularly clean.
(Sep 14 '11 at 20:33)
Daniel Duckwoth
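A minimal sketch of the slow-reference discipline. The random-walk Metropolis sampler is a stand-in; the point is that both versions consume the same seeded draws and must visit identical states:

```python
import numpy as np

def log_target(x):
    return -0.5 * x ** 2  # standard normal, up to a constant

def chain_reference(n, seed):
    # Horribly-slow-but-obviously-correct version: a literal
    # translation of the Metropolis acceptance rule.
    rng, x, states = np.random.default_rng(seed), 0.0, []
    for _ in range(n):
        prop = x + rng.normal()
        accept = min(1.0, np.exp(log_target(prop) - log_target(x)))
        if rng.uniform() < accept:
            x = prop
        states.append(x)
    return states

def chain_fast(n, seed):
    # "Optimized" rewrite (works in log space); it must make the
    # same jumps to the same states given the same seed.
    rng, x, states = np.random.default_rng(seed), 0.0, []
    for _ in range(n):
        prop = x + rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        states.append(x)
    return states

def test_fast_matches_reference():
    assert chain_fast(1000, 3) == chain_reference(1000, 3)
```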