I've read a bit on L1 regularization, but so far in ML I've seen it applied mostly to logistic regression. Is there a rule of thumb for using regularization with other kinds of fitness functions?

Let's say we have a general fitness function (or objective function) we want to maximize (or optimize). Can it have regularization parameters regardless of its nature, or does it have to be a regression?

Thank you.

asked May 31 '11 at 03:09

Leon Palafox ♦

edited May 31 '11 at 04:48


2 Answers:

Theoretically, you should be able to apply it to any classification or regression function that uses a vector of real-valued parameters. If I'm correct, it is the same as assuming a Laplace prior over your parameters, in the same way that L2 can be seen as a Gaussian prior on your parameters with zero mean and spherical covariance. Applying it to logistic regression (which is actually a classification method) is interesting because you can directly identify redundant features in your data. For a multi-layer perceptron it is less clear what it means to prune out weights in, for example, the second layer.
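For reference, this is just the usual MAP reading of that claim, with b denoting the scale of the Laplace prior (a symbol not used elsewhere in this thread):

    p(w_j) \propto \exp\!\left(-\frac{|w_j|}{b}\right)
    \quad\Rightarrow\quad
    -\log p(w) = \frac{1}{b}\sum_j |w_j| + \text{const}

so maximizing the posterior (likelihood times prior) is the same as minimizing the negative log-likelihood plus an L1 penalty with strength 1/b; swapping in a zero-mean Gaussian prior gives the squared (L2) penalty in the same way.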

Practically, it is easier to apply L1 regularization when the overall optimization problem is convex and can be solved with methods like linear or quadratic programming. The most important reason for this is that the L1 penalty is not differentiable at zero, so more general optimizers have to fall back on more heuristic approaches.

There is a lot of research being done on the subject, so there may be many optimization methods for it that are completely different from the ones I have heard about.
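To make the above concrete, here is a minimal sketch (not from any particular paper or library) of L1-regularized logistic regression fitted with proximal gradient descent; the soft-thresholding step is one standard way of handling the non-differentiable penalty, and the hyperparameters (lam, lr, n_iters) are just placeholder values:

    # Minimal sketch: L1-regularized logistic regression via proximal gradient (ISTA).
    import numpy as np

    def soft_threshold(w, t):
        # Proximal operator of t * ||w||_1: shrink each coordinate toward zero.
        return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

    def fit_l1_logreg(X, y, lam=0.1, lr=0.1, n_iters=500):
        """X: (n, d) features, y: (n,) labels in {0, 1}."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iters):
            p = 1.0 / (1.0 + np.exp(-X @ w))             # predicted probabilities
            grad = X.T @ (p - y) / n                      # gradient of the logistic loss
            w = soft_threshold(w - lr * grad, lr * lam)   # gradient step + L1 prox
        return w

    # Toy usage: only the first two features carry signal, so with a large enough
    # lam the remaining weights should be driven to (or very near) zero.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(float)
    print(np.round(fit_l1_logreg(X, y, lam=0.05), 3))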

answered May 31 '11 at 03:39

Philemon Brakel

Actually, I'm helping a friend optimize a function with Genetic Algorithms, and he asked me whether adding an L1 regularization term would be a good idea, since his data is sparse. I did not have a good answer.

(May 31 '11 at 04:49) Leon Palafox ♦

It should be no problem to add the L1 loss to the fitness function. Whether this will be better than an L2 loss depends on the problem itself. See: this post
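As a rough sketch of what that could look like (raw_fitness, the real-valued parameter encoding, and lam are all invented here for illustration, not taken from your friend's setup):

    import numpy as np

    def penalized_fitness(params, raw_fitness, lam=0.1):
        # The GA maximizes fitness, so the L1 term is subtracted as a penalty;
        # larger lam favors sparser parameter vectors.
        return raw_fitness(params) - lam * np.sum(np.abs(params))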

(May 31 '11 at 05:13) Philemon Brakel

In order to make a good choice of regularization, you need a reasonably clear idea of what is causing problems in the first place.

L1 and L2 add a linear or quadratic penalty (respectively) to the error surface, based on the magnitude of your parameters. Generally speaking, you use them to help control the magnitude of your weights. A simple rule of thumb (illustrated numerically after the list):

  • L1 penalizes a weight in proportion to its magnitude, so the penalty per unit of weight is the same for large and small weights.
  • L2 penalizes large weights much more heavily than small weights -- that is, small weights are perturbed less.
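A tiny numeric illustration, using arbitrary weights:

    # The L1 penalty grows linearly in |w| (constant slope), while the L2
    # penalty grows as w**2 (slope 2*w), so L2 leans hardest on the largest weights.
    for w in (0.1, 1.0, 10.0):
        print(w, abs(w), w ** 2)
    # w = 0.1  -> L1 penalty 0.1,  L2 penalty 0.01
    # w = 1.0  -> L1 penalty 1.0,  L2 penalty 1.0
    # w = 10.0 -> L1 penalty 10.0, L2 penalty 100.0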
This answer is marked "community wiki".

answered May 31 '11 at 19:07

Brian Vandenberg

wikified May 31 '11 at 19:13
