I've been reading up on regularization and wanted to verify that I haven't misunderstood. Any problems with this summary?
Does all that sound OK so far? Am I missing anything big, or misconstruing any concepts? If it all checks out, I'm curious about practical tips for L1/L2 regularization. What are some typical values? Which direction is "strong regularization" - e.g. is 0.001 or 10 more regularization? And what are some tell-tale signs that you are overfitting your data (and need to add regularization)? Is it when your training converges but the performance metric on your test set is poor?
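For concreteness, here is a rough sketch of the kind of check I have in mind (assuming scikit-learn; the model and the synthetic data are just placeholders): fit on a training split, score on a held-out split, and treat a large train/test gap as the overfitting signal.

```python
# Sketch of the overfitting check I have in mind (scikit-learn; model and data are placeholders).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# A training score far above the held-out score is the classic overfitting signal.
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}, gap={train_acc - test_acc:.3f}")
```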
From a Bayesian perspective, regularization can be interpreted as incorporating prior knowledge into the model: the penalty term corresponds to a prior distribution on the weights (L2 to a Gaussian prior, L1 to a Laplace prior). For example, L1 regularization in logistic regression acts as a feature selector, driving the weights of uninformative features to zero. See Andrew Ng's paper "Feature selection, L1 vs. L2 regularization, and rotational invariance". Coming to your question about regularization values, what value to use depends on how you pose the problem. In case of
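To make the feature-selection effect concrete, here is a minimal sketch (assuming scikit-learn; the synthetic data and the value C=0.1 are purely illustrative, not recommendations) that fits the same logistic regression with an L2 and an L1 penalty and counts how many coefficients end up exactly zero. Note that scikit-learn's LogisticRegression parameterizes regularization by C, the inverse of the regularization strength, so smaller C means more regularization.

```python
# Illustrative sketch (assumes scikit-learn); parameter values are arbitrary, not recommendations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 100 features, only 5 of which are actually informative.
X, y = make_classification(n_samples=500, n_features=100, n_informative=5,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for penalty, solver in [("l2", "lbfgs"), ("l1", "liblinear")]:
    # C is the INVERSE of the regularization strength: C=0.1 regularizes more than C=10.
    clf = LogisticRegression(penalty=penalty, C=0.1, solver=solver, max_iter=5000)
    clf.fit(X_train, y_train)
    n_zero = np.sum(clf.coef_ == 0)
    print(f"{penalty}: test accuracy={clf.score(X_test, y_test):.3f}, "
          f"zero coefficients={n_zero}/{clf.coef_.size}")

# Typically the L1 model zeroes out most of the uninformative features,
# while the L2 model keeps small nonzero weights on all of them.
```

If instead you write the objective as the loss plus lambda times the penalty, as most textbooks do, the direction flips: larger lambda means stronger regularization, so lambda = 10 regularizes far more than lambda = 0.001.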