Are these two approaches to $\ell_2$-regularized logistic regression in scikit-learn equivalent? From my understanding, scikit-learn has a regularized logistic regression module based on LibLinear (called LogisticRegression) and a second one based on stochastic gradient descent (called SGDClassifier). The LibLinear-based solver solves the following problem: $$\min_{w} \; C \sum_{i} \log\!\left(1 + e^{-y_i X_i^T w}\right) + \frac{1}{2} w^T w$$ while the SGD solver minimizes $$\min_{w} \; \sum_{i} \log\!\left(1 + e^{-y_i X_i^T w}\right) + \frac{\alpha}{2} w^T w,$$ where we are assuming that the labels $y_i \in \{-1, +1\}$. If the above is correct regarding which problems each technique is solving, then they should be equivalent if $C \gets \frac{1}{\alpha}$. However, running both of these solvers on even very small, low-dimensional classification problems, I don't get the same weights $\hat{w}$, nor does it appear that the objective functions are minimized. I wrote a Python script that compares the solutions returned by the LibLinear approach, the SGD approach, and my own implementation in CVXPY. Here are the results for the three techniques:
Now, I understand that gradient methods can get within a neighborhood of the global minimum fairly quickly but require many iterations to converge to it. I also understand that there may be additional parameter settings for SGD that lead to smaller losses. Still, I am a bit surprised that neither the LibLinear approach nor the SGD approach works well out of the box on what is essentially a tiny problem (there are only 2 parameters to learn and 100 training points!). So I suspect either a programming error on my part or something else important that I am missing.
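To make the setup concrete, here is a stripped-down sketch of the kind of comparison I am running. This is not my actual script; the parameter names follow the current scikit-learn API (e.g. loss='log_loss', solver='liblinear'), and the C = 1/alpha mapping is the one assumed above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

# Tiny toy problem: 100 points, 2 features, as described above.
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

alpha = 0.1
C = 1.0 / alpha  # the mapping assumed above; it may need a factor of n_samples

# LibLinear-based solver; fit_intercept=False keeps the two objectives comparable.
lr = LogisticRegression(penalty='l2', C=C, tol=1e-16,
                        solver='liblinear', fit_intercept=False)
lr.fit(X, y)

# SGD-based solver; 'log_loss' is the logistic loss in recent scikit-learn
# (older releases call it 'log').
sgd = SGDClassifier(loss='log_loss', penalty='l2', alpha=alpha,
                    max_iter=10000, tol=1e-10, fit_intercept=False)
sgd.fit(X, y)

print("LibLinear weights:", lr.coef_.ravel())
print("SGD weights:      ", sgd.coef_.ravel())
```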
My favorite... Yes, the problems are equivalent, but as always, the devil is in the details.

LogisticRegression works with dual=True by default, so indeed, it does not optimize the same objective; it optimizes the dual (the optima are the same, but the path there differs). There is also a 'tol' parameter that controls the precision. Try turning that down.

The other issue might be scaling of the penalty. There is the issue of multiplying / dividing C by n_samples, which often serves as a source of great pleasure... not. I have an implementation of a structured SVM in my pystruct package. It has two dual solvers based on cvxopt and a stochastic subgradient solver, and it could reproduce the behavior of LibLinear with both (for the hinge loss, but still) for binary and multi-class SVM problems (Crammer-Singer SVM). In the stochastic subgradient version, I needed to multiply C by the number of samples to get the behavior of LibLinear back.

For the SGDClassifier, convergence depends a lot on alpha and eta0, but with enough iterations, I think you should not see much difference from LibLinear on such a toy problem.

Hth, Andy

Oh, btw, liblinear is totally non-deterministic. Try varying the seed and have fun interpreting the curve... I did that for quite some time before finding my mistake.
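To spell out the scaling point, here is a tiny sketch of the two conventions (an assumption about how each side normalizes the data term, not the actual library code):

```python
import numpy as np

# Two conventions for the same L2-regularized loss:
#   LibLinear-style:  C * sum_i L_i + 0.5 * ||w||^2
#   SGD-style:        (1/n) * sum_i L_i + (alpha/2) * ||w||^2   (per-sample average)

def liblinear_objective(w, losses, C):
    return C * np.sum(losses) + 0.5 * np.dot(w, w)

def sgd_objective(w, losses, alpha):
    return np.mean(losses) + 0.5 * alpha * np.dot(w, w)

# Matching them needs C = 1 / (alpha * n_samples), not just 1 / alpha;
# with that choice the two objectives agree up to the constant factor alpha.
rng = np.random.default_rng(0)
losses = rng.random(100)       # stand-in per-sample losses
w = rng.standard_normal(2)     # stand-in weight vector
alpha = 0.1
C = 1.0 / (alpha * len(losses))

print(alpha * liblinear_objective(w, losses, C))  # equals the line below
print(sgd_objective(w, losses, alpha))
```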
(Feb 16 '13 at 14:54)
Andreas Mueller
Hi Andy, SGDClassifier actually seems to be outperforming LibLinear with the settings I used... in the post above, it was CVXPY < SGD < LibLinear. Also, I'm setting the tolerance for LogisticRegression() (aka the LibLinear wrapper) to 1e-16. If the tolerance roughly maps to a duality gap, then after setting it that low I should be closer to the value returned by CVXPY.
(Feb 16 '13 at 18:17)
Pads Niels
Yeah, tol should be the duality gap. So it might still be that the scaling of C is different. What is the advantage of cvxpy vs cvxopt, btw? Are you solving the primal?
(Feb 16 '13 at 18:58)
Andreas Mueller
Basically, the only reason I implemented logistic regression in cvxpy was that I didn't understand why LibLinear and SGD were giving such vastly different results, and I was hoping that there would be agreement between cvxpy and one of the scikit-learn techniques. Alas... I only chose to implement it in cvxpy rather than cvxopt because I found it much faster to write the code. Cvxpy actually uses cvxopt under the hood.
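For reference, the cvxpy formulation is only a few lines; roughly something like this (written against the current cvxpy API rather than what I actually ran, with made-up toy data standing in for mine):

```python
import cvxpy as cp
import numpy as np

# Toy stand-in data; y must be in {-1, +1} for this formulation.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = np.sign(rng.standard_normal(100))

C = 10.0
w = cp.Variable(X.shape[1])

# cp.logistic(t) == log(1 + exp(t)), so this is the LibLinear-style objective:
#   C * sum_i log(1 + exp(-y_i X_i^T w)) + 0.5 * ||w||^2
objective = cp.Minimize(C * cp.sum(cp.logistic(cp.multiply(-y, X @ w)))
                        + 0.5 * cp.sum_squares(w))
cp.Problem(objective).solve()
print(w.value)
```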
(Feb 16 '13 at 19:06)
Pads Niels