Writing unit tests is relatively straightforward for any code, even ML code. What I think I want is functional testing for ML code, concretely for semantic models (LSA, topic models, LDA, random indexing). Wikipedia says:
That is, to test an implementation, I'd like to see the model doing something it's known to do well. That involves replicating published results. I've seen some tests, for example in MDP, that use a solved example that is known to work: one constructs an artificial dataset that produces a known result and tests against that. In their case, they test PCA by generating data with known dimensions and seeing if the PCA algorithm recovers them.

However, for semantic models it is not easy to find such examples. The whole point of modeling language is that you won't get good results on a toy dataset. One could run tests with large datasets replicating known results, for example solving the TOEFL test using an encyclopedia. But then the test would involve creating a space, and that takes time (minutes in the most optimistic conditions, even days for LDA): not something to aim for in a test that must be run every time you change your code. Still, there are some published small examples; the Landauer and Dumais LSA paper has a small example in the appendix.

What's your take on testing and ML/NLP? Have you found small, worked-out examples for, say, LDA, topic models, random indexing, etc.?
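A minimal sketch of that kind of synthetic check, in the spirit of the MDP PCA test described above but not taken from it (the sizes, noise level and thresholds are made up for illustration):

```python
import numpy as np

def test_pca_recovers_known_subspace(n_samples=2000, n_features=10, rank=3, seed=0):
    """Generate data that lives (up to small noise) in a known rank-3 subspace
    and check that the top principal components span roughly that subspace."""
    rng = np.random.RandomState(seed)
    # Known loading matrix: maps 3 latent factors to 10 observed features.
    loadings = rng.randn(n_features, rank)
    latent = rng.randn(n_samples, rank)
    X = latent @ loadings.T + 0.01 * rng.randn(n_samples, n_features)

    # PCA via SVD of the centered data matrix.
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # 1) Almost all variance should be captured by the first `rank` components.
    explained = (s[:rank] ** 2).sum() / (s ** 2).sum()
    assert explained > 0.99

    # 2) The recovered components should span the same subspace as `loadings`:
    #    the cosines of the principal angles between the subspaces should be ~1.
    Q_true, _ = np.linalg.qr(loadings)
    Q_est = Vt[:rank].T
    angles = np.linalg.svd(Q_true.T @ Q_est, compute_uv=False)
    assert np.all(angles > 0.99)

if __name__ == "__main__":
    test_pca_recovers_known_subspace()
    print("PCA recovery test passed")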
Usually the best bet is to try to recreate the experiment from the paper in which the algorithm was introduced. If you get similar numbers on the dataset they used, you probably implemented it correctly. For algorithms that have already been implemented elsewhere, test both implementations against some standard dataset; they should largely agree.

But this is what I was afraid to hear. It involves:

1. Finding the exact same dataset. Sometimes that is not easy, the data being proprietary and buried behind forms, CD shipping and other fossils. Other times it involves web scraping.
2. Replicating the parsing. Papers normally don't go into detail about what was done there, and a simple decision such as what to do with punctuation can change the results (a toy illustration follows this comment).

Any of these things may fail before you even get to replicate the results. Plus, the replication itself can take hours of computing time. For a test you need to run often, this is too much.
(Jan 03 '11 at 04:55)
Jose Quesada
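To make the punctuation point concrete, here is a toy illustration (hypothetical sentence and tokenization choices, not from any particular paper) of how one preprocessing decision changes the tokens, and hence the counts, a semantic model is built from:

```python
import re

doc = "The model's results -- surprisingly -- depend on pre-processing."

# Choice A: strip punctuation entirely and keep only alphabetic runs.
tokens_a = re.findall(r"[a-z]+", doc.lower())

# Choice B: keep apostrophe and hyphen compounds as single tokens.
tokens_b = re.findall(r"[a-z]+(?:['-][a-z]+)*", doc.lower())

print(tokens_a)  # ['the', 'model', 's', 'results', 'surprisingly', 'depend', 'on', 'pre', 'processing']
print(tokens_b)  # ['the', "model's", 'results', 'surprisingly', 'depend', 'on', 'pre-processing']
```

The two choices produce different vocabularies and different co-occurrence counts, so the resulting spaces, and any scores computed from them, will differ.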
For lots of algorithms you can test the implementation (not the algorithm) on hand-crafted toy data. For testing an LSA implementation, for example, you could sample a low-rank decomposition, multiply the factors to get a matrix, perturb it with noise, and see if you recover the low-rank structure reasonably well. For most Bayesian models you can test your algorithm against small samples from the prior, and modulo identifiability issues you should always get something reasonable back.
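A minimal sketch of the low-rank-plus-noise check just described, assuming the LSA implementation under test boils down to a truncated SVD (stubbed here with numpy so the example is self-contained; sizes and thresholds are arbitrary):

```python
import numpy as np

def test_recovers_planted_low_rank(n_docs=200, n_terms=500, rank=5, noise=0.01, seed=0):
    """Plant a known low-rank structure, add noise, and check that the rank-k
    reconstruction is much closer to the planted matrix than the noise level."""
    rng = np.random.RandomState(seed)
    # Sample a low-rank decomposition and multiply to get the 'clean' matrix.
    A = rng.randn(n_docs, rank)
    B = rng.randn(rank, n_terms)
    clean = A @ B
    noisy = clean + noise * rng.randn(n_docs, n_terms)

    # Stand-in for the implementation under test: truncated SVD at the planted rank.
    U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
    recon = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]

    # The reconstruction should be within a few percent of the planted matrix.
    rel_err = np.linalg.norm(recon - clean) / np.linalg.norm(clean)
    assert rel_err < 0.05, rel_err

if __name__ == "__main__":
    test_recovers_planted_low_rank()
    print("low-rank recovery test passed")
```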
Software testing is a much-overlooked area in machine learning. If you can replicate the results in a machine learning paper, that is a good sign. However, it is often the case that the implementation used in the paper had bugs of its own. Additionally, the experimental setup is often described too tersely to give enough information to replicate the experiment exactly. Few machine learning papers include error bars, so it is hard to control for factors like differing random seeds in Monte Carlo algorithms.

I would recommend thoroughly testing sub-routines in the classic unit-testing way. You should also have a stupidly simple, but possibly slow, implementation of the algorithm which you can compare against the efficient implementation. Particular inference methods come with their own ways of being tested. For instance, in EM and variational methods you can check that the variational lower bound increases every iteration; if not, there is a bug. For MCMC methods, John Geweke has a collection of methods to test that they are implemented correctly. Derivatives can be checked with finite differences.
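For that last point, a generic finite-difference gradient check might look like the sketch below (the `loss_and_grad` interface and the quadratic example are made up for illustration):

```python
import numpy as np

def check_gradient(loss_and_grad, x, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient against central finite differences.
    `loss_and_grad(x)` is assumed to return (loss, gradient)."""
    _, grad = loss_and_grad(x)
    num_grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        f_plus, _ = loss_and_grad(x + e)
        f_minus, _ = loss_and_grad(x - e)
        num_grad[i] = (f_plus - f_minus) / (2 * eps)
    rel = np.linalg.norm(grad - num_grad) / (np.linalg.norm(grad) + np.linalg.norm(num_grad) + 1e-12)
    assert rel < tol, f"gradient check failed: relative error {rel:.2e}"

# Example: a quadratic loss whose gradient is known in closed form.
if __name__ == "__main__":
    A = np.random.randn(5, 5)
    A = A @ A.T  # symmetric, so the gradient of 0.5 * x'Ax is Ax
    f = lambda x: (0.5 * x @ A @ x, A @ x)
    check_gradient(f, np.random.randn(5))
    print("gradient check passed")
```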
You argue that functional testing takes a long time, but does it need to be done more than every month or two? As long as you unit test the components to ensure they haven't changed, and perhaps add some assert statements to ensure data is passed into functions correctly, you only need to check once that your implementation works, and then guard against it being altered later.
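In that spirit, a toy example of the kind of assert statements and unit test meant here (the sub-routine and its invariants are hypothetical, just to show the pattern):

```python
import numpy as np

def normalize_rows(counts):
    """Turn a document-term count matrix into row-stochastic probabilities.
    The asserts document (and enforce) what this sub-routine expects."""
    counts = np.asarray(counts, dtype=float)
    assert counts.ndim == 2, "expected a 2-D document-term matrix"
    assert (counts >= 0).all(), "counts must be non-negative"
    row_sums = counts.sum(axis=1, keepdims=True)
    assert (row_sums > 0).all(), "every document must contain at least one term"
    return counts / row_sums

def test_normalize_rows():
    probs = normalize_rows([[1, 1, 2], [0, 3, 1]])
    # Rows must sum to 1 and preserve relative proportions.
    assert np.allclose(probs.sum(axis=1), 1.0)
    assert np.allclose(probs[0], [0.25, 0.25, 0.5])

if __name__ == "__main__":
    test_normalize_rows()
    print("unit test passed")
```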