Hi all,

Which of the two methods is more effective at avoiding overfitting: regularization, or searching the model's hyperparameters through cross-validation? Is one of these methods preferable for large or small data sets? Is there any experimental evidence concerning this question?

Thanks.

asked Feb 10 '12 at 01:59

Lucian Sasu

2 Answers:

The two methods are generally used together, since they serve different purposes:

  • cross-validation makes it possible to measure how well a parameterized learning algorithm generalizes to data unseen at training time;

  • regularization is a parameter of a learning algorithm that trades off two sources of generalization error: bias vs. variance. Highly biased models cannot fit the training data as closely (they have fewer effective degrees of freedom), but on the other hand they learn simpler models and are therefore less likely to over-fit the training data, hence potentially have less variance (see the decomposition below).
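
For squared loss, this trade-off is made precise by the standard bias-variance decomposition of the expected prediction error (a well-known identity, added here for reference rather than taken from the original answer):

    E[(y - \hat{f}(x))^2] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2

where \sigma^2 is the irreducible noise: stronger regularization typically increases the bias term while decreasing the variance term.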

It is possible to combine the two by doing a cross-validated grid search for the optimal value of the regularization parameter (e.g. C in an SVM), as in the sketch below.
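
Here is a minimal sketch of such a cross-validated grid search using scikit-learn; the dataset, the grid of C values, and the 5-fold split are illustrative assumptions, not from the original answer:

    # Cross-validated grid search over the SVM regularization parameter C.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Small C = strong regularization (more bias);
    # large C = weak regularization (more variance).
    param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)

    print("best C:", search.best_params_["C"])
    print("cross-validated accuracy:", search.best_score_)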

Edit: arguably, if the algorithm scales to large datasets (linear training time) and such data is cheaply available (generally not true for supervised learning), then regularization is less important, as the redundancy in the training set will generally act as a natural regularizer that prevents over-fitting. It is still interesting to do cross-validation (maybe online cross-validation to make it scalable) so as to measure the remaining amount of overfitting.

answered Feb 10 '12 at 02:28

ogrisel
edited Feb 10 '12 at 06:17

Thanks. "online cross validation"? what is this?

(Feb 10 '12 at 06:17) Lucian Sasu
If you have so much data that you know your online algorithm will be fitted in a single pass, then you can buffer the new data in mini-batches and use each batch twice, first for testing and then for training:

  • you can estimate the test error by predicting the outcomes with the current state of the model and comparing them to the expected labels for that chunk,

  • then update the model with the same chunk of data.

To limit the stochasticity of the test error estimate, you can smooth it with an exponentially weighted averaging scheme. AFAIK the SGD model of Mahout does this, along with maintaining several online models in parallel (+ some kind of evolutionary algorithm to blend the best ones from time to time).
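
A minimal sketch of this test-then-train loop (often called progressive validation), using scikit-learn's SGDClassifier on a simulated stream; the data stream, batch size, and smoothing factor are illustrative assumptions, not from the original comment:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    model = SGDClassifier()
    classes = np.array([0, 1])

    smoothed_error = None
    alpha = 0.1  # weight of the newest estimate in the exponential average

    for step in range(100):
        # Simulated mini-batch of 32 examples with 10 features each.
        X = rng.randn(32, 10)
        y = (X[:, 0] + 0.1 * rng.randn(32) > 0).astype(int)

        if step > 0:  # the model must be fitted once before it can predict
            # 1) Test: score the current model on data it has not yet seen.
            batch_error = np.mean(model.predict(X) != y)
            # Exponentially weighted averaging smooths the noisy estimate.
            if smoothed_error is None:
                smoothed_error = batch_error
            else:
                smoothed_error = alpha * batch_error + (1 - alpha) * smoothed_error

        # 2) Train: update the model with the same mini-batch.
        model.partial_fit(X, y, classes=classes)

    print("smoothed progressive validation error:", smoothed_error)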

(Feb 10 '12 at 06:23) ogrisel

There is no rule against using both of them together.

Cross-validation is usually a good method for finding the best value of your regularization parameter.

You can find the optimal set of parameters by minimizing your training cost, then evaluate those parameters using the cross-validation cost. (The training cost includes the regularization term, while the cross-validation cost does not.)

Then you can test different values of the regularization parameter and use trial and error to find the best one, as in the sketch below.
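
A minimal sketch of this trial-and-error procedure with ridge regression; the simulated dataset, the train/validation split, and the grid of alpha values are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    X = rng.randn(200, 20)
    y = X @ rng.randn(20) + 0.5 * rng.randn(200)

    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    best_alpha, best_cost = None, np.inf
    for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
        # Fitting minimizes the regularized training cost
        # (squared error + alpha * ||w||^2).
        model = Ridge(alpha=alpha).fit(X_train, y_train)
        # The validation cost is the plain squared error,
        # with no regularization term.
        cost = mean_squared_error(y_val, model.predict(X_val))
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost

    print("best alpha:", best_alpha, "validation MSE:", best_cost)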

answered Feb 10 '12 at 02:28

Leon Palafox ♦
