|
I am trying to implement the "Feature-Weighted Linear Stacking" technique described by the second price winner of the Netflix Grand Challenge in this paper. The solution provided there relies on solving a system (A^T * A + lamda * I) * v = A^T * y (too bad bad MO has no support for mathematical notation) where
My main problem lies in choosing a regularization parameter, because I have no idea, what it should accomplish. Has anyone ideas on this, or even better, a sourcecode with a concrete implementation? Edit: The implementation of the regularization step itself isn't very well described in the paper. I would be very thankful for all additional information on Tikhonov regularization from a CS background, rather than a pure mathematical. |
|
The regularization parameter controls how much fitting is the model allowed to do. It usually has no intuitive interpretation that can lead you to picking a good value for it, so it is usually chosen by grid search with cross-validation on a validation set. In this case I seems to be the identity matrix and lambda a real-valued parameter that acts as the precision of a normal variable. You can find more computational information on Tikhonov regularization on wikipedia. In statistics it's known as ridge regression, and you'll find a very good implementation of it at scikits.learn. To see one way of solving it, consider that you want to minimize ||Ax - b||^2 + lambda ||x||^2. Taking the gradient of this expression w.r.t x you can see it's 2 lambda x + 2 A^T Ax - 2 b^T A. Setting it to zero (since this a quadratic it has a unique minimum) you will find that 2 lambda x + 2 A^T Ax - 2 b^T A = 0 implies that (lambda I + A^T A) x = b^T A, or that x = (lambda I + A^T A)^-1 b^T A. Hence with any library that inverts matrices you can compute the ridge regression. Of course, as inverting a matrix is expensive there are faster approximate methods, and as A can be ill-conditioned you might want to use a pseudo-inverse or solve the linear system specified above directly. |
|
The simplest way to determine lambda is, as Alexandre suggests, to basically try a bunch of values and see which one works the best for your problem. For Tikhonov regularization there is a trick which lets you do leave-one-out cross-validation over a whole bunch of possible lambda values really quickly. How you can do this is discussed in this paper: Notes on Regularized Least Squares by Ryan M. Rifkin and Ross A. Lippert. Rifkin's paper talks about a bunch of different ways of accomplishing this trick. If you want to see C++ code which implements one way then take a look here. It tries 50 lambda values ranging from 10^-9 to 10^2 and picks the one which gives the best cross-validation result. |
|
The GCV and L curve methods are the two most popular automatic methods of finding the regularization parameter. Check out works by Dr. Hansen, I'd highly recommend his SIAM book on the subject and his MALTAB regularization toolbox. http://www2.imm.dtu.dk/~pch/ |