
Hello. I am currently working on a problem where I have to solve either an L2-regularized logistic regression or an L2-regularized linear SVM problem with an added affine term.

So my problem, for example, is: min_w { C * sum_i log(1 + exp(-w * x_i * y_i)) + ||w||_2^2 + w * v }

where v is a constant vector.

Of course this is a convex problem and can be solved with the usual methods, but I have to solve very many large problems of this type, so I would very much like to use a standard library such as liblinear.

My question is: is there a way to transform the data x, the labels y, or the weighting factor C (perhaps into a different C_i for each instance) such that this problem becomes equivalent to a standard logistic regression or hinge-loss SVM problem?

asked Feb 08 '12 at 04:48 by ualex (edited Feb 08 '12 at 05:13)

Plugging an efficient way of computing this function into a fast lbfgs implementation shouldn't be that much slower than liblinear. Have you tried this?

(Feb 08 '12 at 09:56) Alexandre Passos ♦
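A minimal sketch of the suggestion above, assuming scipy's L-BFGS-B implementation and the log-loss objective from the question (the toy data and variable names are illustrative, not part of the original discussion):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit

    def objective_and_grad(w, X, y, v, C):
        # C * sum_i log(1 + exp(-y_i * w.x_i)) + ||w||_2^2 + w.v
        margins = y * (X @ w)
        value = C * np.logaddexp(0.0, -margins).sum() + w @ w + w @ v
        # gradient: -C * sum_i sigmoid(-margin_i) * y_i * x_i + 2w + v
        grad = -C * (X.T @ (expit(-margins) * y)) + 2.0 * w + v
        return value, grad

    # toy data just to show the call; substitute the real X, y, v, C
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = rng.choice([-1.0, 1.0], size=1000)
    v = rng.normal(size=20)

    res = minimize(objective_and_grad, np.zeros(20), args=(X, y, v, 1.0),
                   jac=True, method="L-BFGS-B")
    w_star = res.x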

No, I haven't tried it. I was under the impression it wouldn't be as fast, given how optimized the solvers for large linear classification problems are now. I will try it now, following your suggestion, and update with the results.

(Feb 08 '12 at 14:48) ualex

2 Answers:

I can't think of a way to turn it into something that can be processed by a tool like liblinear. However, you could easily solve the SVM version of this optimization problem with one of the general-purpose cutting plane optimization libraries. All you have to do is write code to compute the value of the objective and an element of the subgradient, which in your case is 2w + v - C * sum_{i : y_i*w*x_i < 1} y_i*x_i (the sum running over the margin violators). Then a cutting plane routine can find the optimal w.
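For concreteness, a sketch of the two quantities such a routine needs, for the hinge-loss version of the objective (names follow the question; the cutting plane loop itself, e.g. dlib's, is not shown):

    import numpy as np

    def value_and_subgradient(w, X, y, v, C):
        # objective: C * sum_i max(0, 1 - y_i * w.x_i) + ||w||_2^2 + w.v
        margins = y * (X @ w)
        violated = margins < 1.0                  # instances inside the margin
        value = C * np.maximum(0.0, 1.0 - margins).sum() + w @ w + w @ v
        # one element of the subdifferential at w
        subgrad = 2.0 * w + v - C * (X[violated].T @ y[violated])
        return value, subgrad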

There is a CPA optimizer in Shogun and also one in dlib. I haven't used Shogun's version but I have used the one in dlib for a lot of problems (I'm also the author of dlib).

Someone also mentioned using L-BFGS, which should work fine for the log loss. However, a cutting plane method should work a lot better for the hinge loss. For some more background, the following paper is excellent reading:

  • Bundle Methods for Regularized Risk Minimization by Choon Hui Teo, S.V.N. Vishwanathan, Alex J. Smola, and Quoc V. Le; Journal of Machine Learning Research, 11(Jan):311-365, 2010.

answered Feb 08 '12 at 17:49 by Davis King (edited Feb 08 '12 at 18:02)

SVMperf uses a cutting plane optimization method; see the SGD link I provided for a benchmark comparison. For large-scale problems these days, I don't know why anyone still uses batch algorithms.

(Feb 08 '12 at 22:02) Travis Wolfe

Thanks a lot for your answer! I wasn't very familiar with general-purpose cutting plane algorithms, and I will definitely look into the libraries you mention.

(Feb 09 '12 at 05:05) ualex

SGD is great for some problems. But I find that when I use it I often spend a lot of time running it multiple times, playing with parameters, or wondering to myself "maybe it didn't converge, I'll run it again and check...". On the other hand, the state-of-the-art batch methods leave no doubt about convergence and have no parameters, so they often require less of my own time even if they take more computer time. In that sense, I think the batch methods are more user friendly.

(Feb 09 '12 at 08:12) Davis King

This is usually called elastic net regularization and you can implement it fairly easily and efficiently with stochastic gradient descent (assuming you remember some calculus!).

answered Feb 08 '12 at 14:56 by Travis Wolfe

Elastic net adds an l1 term, sum_i |w_i|, not a linear term, sum_i v_i*w_i. The absolute value function matters a lot.

(Feb 08 '12 at 15:02) Alexandre Passos ♦

If I'm not mistaken, in elastic net you add an L1 term to the penalty.

Here I'm doing something that is in essence simpler: just adding an affine term, without any other constraints. This makes it very amenable to SGD, but as I mentioned, I want to use a fast existing library because I'm going to solve this problem tens of thousands of times, with hundreds of thousands of instances each time.

(Feb 08 '12 at 15:03) ualex
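For what it's worth, a minimal SGD sketch for the hinge-loss version of that objective (the step-size schedule and epoch count are illustrative assumptions, not tuned recommendations):

    import numpy as np

    def sgd_hinge_plus_linear(X, y, v, C, epochs=5, eta0=0.01, seed=0):
        # objective: C * sum_i max(0, 1 - y_i * w.x_i) + ||w||_2^2 + w.v
        n, d = X.shape
        w = np.zeros(d)
        rng = np.random.default_rng(seed)
        t = 0
        for _ in range(epochs):
            for i in rng.permutation(n):
                t += 1
                eta = eta0 / (1.0 + eta0 * t)     # decaying step size (illustrative)
                # unbiased one-sample estimate of the full gradient at w
                g = 2.0 * w + v
                if y[i] * (X[i] @ w) < 1.0:       # margin violated
                    g = g - C * n * y[i] * X[i]
                w -= eta * g
        return w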

Sorry, I just assumed that you wanted an L1 penalty because it is far more common. If you are writing code in a fast language and you are willing to tune a parameter, then I would just implement it myself. I think you may have a hard time finding a library that does what you want, because it is uncommon.

(Feb 08 '12 at 16:30) Travis Wolfe

My thought/hope was that perhaps a simple transformation (change of variables) would do the trick.

For example, if we define w' = w + v/2, then ||w'||_2^2 = ||w||_2^2 + w*v + const. But now you have to adjust the loss term somehow... In the SVM / hinge-loss case I managed to solve it assuming that 1 + v*x_i*y_i > 0 for all i, an assumption that of course doesn't hold in general.

(Feb 08 '12 at 16:46) ualex
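For reference, a sketch of the algebra behind that substitution in the hinge-loss case (note the factor of 1/2 that w' = w + v/2 introduces into the margin offset):

    % completing the square with w' = w + v/2 absorbs the linear term:
    \|w'\|_2^2 = \|w\|_2^2 + w \cdot v + \tfrac{1}{4}\|v\|_2^2
    % each hinge term then picks up a per-instance offset s_i = 1 + (y_i/2)\, v \cdot x_i:
    \max(0,\ 1 - y_i\, w \cdot x_i) = \max(0,\ s_i - y_i\, w' \cdot x_i)
    % and, provided s_i > 0, this equals
    s_i \cdot \max(0,\ 1 - y_i\, (x_i / s_i) \cdot w')
    % i.e. a standard hinge on rescaled data x_i / s_i with per-instance weight C_i = C * s_i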

One reason why this is not particularly useful advice is that liblinear deals with elastic net regularization.

I'm skeptical about finding a nice closed-form solution for the hinge-loss and log-loss cases, precisely because these losses are nonlinear, so a linear reparametrization can change them in weird ways.

(Feb 08 '12 at 17:25) Alexandre Passos ♦