|
I have an objective function with an absolute value in it, something like abs(x - x0), where x is the variable. How do we usually deal with the absolute value function? Can we still take derivatives and use, say, Newton's method? Thanks a lot for any hints. Update: for those who need more information, the objective is a nonlinear least squares fit of the following functional form: A * (abs(x - x0))^M * exp(-L * abs(x - x0)), where A, M, L and x0 are the parameters to be fitted. Thanks.
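For concreteness, here is a minimal sketch of how such a fit might be set up with SciPy's curve_fit, assuming synthetic xdata/ydata arrays and rough initial guesses (all names and numbers below are illustrative, not taken from the question). The absolute value itself is no problem to evaluate; only its derivative is undefined exactly at x = x0, which the finite-difference Jacobian used by curve_fit usually tolerates in practice.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, A, M, L, x0):
    # |x - x0| is fine to evaluate; only its derivative is undefined at x == x0
    u = np.abs(x - x0)
    return A * u**M * np.exp(-L * u)

# made-up data and starting values, purely for illustration
xdata = np.linspace(-5.0, 5.0, 200)
ydata = model(xdata, 2.0, 1.5, 0.8, 0.3) + 0.05 * np.random.randn(xdata.size)

p0 = (1.0, 1.0, 1.0, 0.0)  # rough initial guess for (A, M, L, x0)
popt, pcov = curve_fit(model, xdata, ydata, p0=p0)
print(popt)  # fitted (A, M, L, x0)
```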
|
If you just have linear constraints with this objective, you should solve this as a Linear Program (LP). Handling an absolute value in the objective is a common LP modeling trick. Here's one reference on how to do it.
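To make the standard trick concrete: to minimize |x - x0| under linear constraints, introduce an auxiliary variable t and minimize t subject to t >= x - x0 and t >= -(x - x0). A hypothetical sketch with scipy.optimize.linprog (the bounds and the value of x0 below are made up for illustration; this handles the plain abs objective, not the full nonlinear model in the question):

```python
import numpy as np
from scipy.optimize import linprog

# Minimize |x - x0| subject to 1 <= x <= 4, with x0 = 3 (all values illustrative).
# Trick: introduce t and minimize t subject to t >= x - x0 and t >= -(x - x0).
x0 = 3.0

# Decision variables: [x, t]; objective is 0*x + 1*t
c = [0.0, 1.0]

# Inequalities in the form A_ub @ [x, t] <= b_ub:
#   x - t <= x0    (i.e. t >= x - x0)
#  -x - t <= -x0   (i.e. t >= x0 - x)
A_ub = [[1.0, -1.0],
        [-1.0, -1.0]]
b_ub = [x0, -x0]

bounds = [(1.0, 4.0),   # linear constraint on x
          (0.0, None)]  # t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)  # optimal x should be x0 = 3, with t = 0
```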
|
There are many machine learning methods that use absolute values in the objective function, be it in the loss (e.g. to implement a regression that is robust to outliers) or in the regularizer (an L1 penalty such as the Lasso). In both cases you can take an arbitrary subgradient, e.g. 0, at the points where the objective is not differentiable, which in your case is where x - x0 == 0. Furthermore, you should truncate the gradient update when it would make a weight change sign. This allows sparsity-inducing regularizers to work as expected. Edit: what I am describing here is not a Newton update but a regular stochastic gradient update, which handles abs values without any problem if you apply the above tricks. I have no experience with second-order methods + abs values in objective functions, so I won't comment on the Newton part.
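A minimal sketch of these two tricks (sign as a subgradient, and truncating an update that would flip a weight's sign), applied to a toy l1-regularized least-squares problem; the data, step size, and regularization strength below are all made up for illustration:

```python
import numpy as np

def subgrad_abs(w):
    # subgradient of |w|: sign(w), choosing 0 at w == 0
    return np.sign(w)

def l1_gradient_step(w, grad_loss, lam, lr):
    """One gradient step on loss(w) + lam * ||w||_1, with truncation:
    if the step would make a coordinate change sign, clip it to zero."""
    w_new = w - lr * (grad_loss + lam * subgrad_abs(w))
    crossed = np.sign(w_new) * np.sign(w) < 0   # coordinates that flipped sign
    w_new[crossed] = 0.0
    return w_new

# toy problem: least-squares loss 0.5 * ||X w - y||^2 with an l1 penalty
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5])
w = np.zeros(5)
for _ in range(500):
    grad_loss = X.T @ (X @ w - y)   # full-batch gradient here; the answer describes a stochastic update
    w = l1_gradient_step(w, grad_loss, lam=0.1, lr=0.005)
print(np.round(w, 3))
```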
|
Newton's method will find something close to the optimum with any subgradient (and a nice subgradient of |x| is sign(x)). As others have said, however, it won't find the exact optimum, and you can get better results with specialized optimizers for l1-regularized functions (since what you describe sounds like l1 regularization). In machine learning there are some traditional methods for optimizing this sort of function, such as the lasso, coordinate ascent, truncated gradient (which might interact badly with Newton's method), the method in this paper, the method in this other paper, and many others. This problem also occurs a lot in compressive sensing, where people usually use linear programming to minimize such a regularized/constrained loss.

Thanks. @Alexandre, I'll take a look at those papers.
(Aug 04 '10 at 12:41)
Liangjie Hong
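For reference, one common specialized approach to l1-regularized least squares (not necessarily the method of the papers linked above) is iterative soft thresholding, which handles the |·| term exactly through a proximal step rather than a subgradient. A rough sketch with made-up data:

```python
import numpy as np

def soft_threshold(v, tau):
    # proximal operator of tau * ||.||_1: shrinks each coordinate toward zero
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, y, lam, lr, n_iter=1000):
    """Iterative soft thresholding for 0.5 * ||X w - y||^2 + lam * ||w||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                      # gradient of the smooth part
        w = soft_threshold(w - lr * grad, lr * lam)   # exact handling of the l1 part
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5])
print(np.round(ista(X, y, lam=5.0, lr=0.005), 3))
```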
|
|
You can also solve a problem like this by showing that a sequence of differentiable functions converges to your function in the limit, and that the extrema converge as well. sqrt(x^2) is not differentiable at zero, but sqrt(x^2 + 2^-n) is, and you can show what happens to your extrema as n goes to infinity. For slightly more information, I asked about this problem here: http://math.stackexchange.com/questions/1649/finding-the-extrema-non-differentialable-functions

This looks a lot like logarithmic barriers as used in constrained optimization. For this problem (optimizing a function that includes an absolute value) it seems overkill, and subgradient methods + appropriate truncation (if needed) should do better than many nonlinear programs.
(Aug 05 '10 at 21:01)
Alexandre Passos ♦
It would be overkill if you actually proved it's optimal. But in reality you could just try optimizing sqrt(x^2 + 0.5) and sqrt(x^2 + 0.1), and if they give the same answer, call it a day.
(Aug 06 '10 at 00:34)
Jonathan Fischoff
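A small sketch of the smoothing idea from this answer and the comment above, on an illustrative one-dimensional objective |x - 1.7| + 0.5*(x - 2)^2 rather than the poster's full model; as eps shrinks, the minimizer of the smoothed problem drifts toward the true non-smooth optimum at 1.7:

```python
import numpy as np
from scipy.optimize import minimize

x_star = 1.7  # plays the role of x0; illustrative value

def objective(x, eps):
    # smoothed |x - x_star| plus a quadratic term, so the smoothing visibly shifts the minimum
    return np.sqrt((x - x_star) ** 2 + eps) + 0.5 * (x - 2.0) ** 2

for eps in (0.5, 0.1, 0.01, 1e-4):
    res = minimize(lambda v: objective(v[0], eps), x0=[0.0], method="BFGS")
    print(f"eps={eps:g}  argmin ~ {res.x[0]:.4f}")
# the minimizer approaches x_star = 1.7 (the non-smooth optimum) as eps shrinks
```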
|
|
Why not just optimize twice? Optimize f(x) and optimize -f(x), and use whichever answer you like better (i.e. whichever has the better abs(f(x))). That way you get rid of the absolute value and the resulting non-convexity. A more complicated implementation might try to do both at once, spending more time on whichever side looks more promising at the moment.

This works for one variable, but for a multivariate problem you have 2^d possibilities for the signs of all the variables, so you gain something by using those other methods, which are linear or n lg n.
(Aug 04 '10 at 19:07)
Alexandre Passos ♦
That's true - but to be clear, it's only an issue if you have multiple invocations of abs() on different things. It doesn't matter how many variables are in the problem if only one expression ever appears inside the absolute value function, and the original question appears to be such a case. You'd probably also need to add constraints requiring it to be >= zero, when (as here) the actual objective isn't abs(something) but f(abs(something)).
(Aug 04 '10 at 19:56)
Matthew Skala
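A sketch of the "optimize each sign branch separately" idea on a toy one-dimensional objective with a single abs expression (the function, bounds, and constant c below are invented for illustration): solve once under the assumption x >= c, once under x <= c, and keep the better of the two.

```python
import numpy as np
from scipy.optimize import minimize

c = 1.0  # the point where the abs term kinks; illustrative

def f(x):
    # a single abs expression, wrapped in a smooth outer function
    return (np.abs(x - c) - 2.0) ** 2 + 0.1 * x

# Branch 1: assume x >= c, so |x - c| == x - c
res_pos = minimize(lambda v: (v[0] - c - 2.0) ** 2 + 0.1 * v[0],
                   x0=[c + 1.0], bounds=[(c, None)], method="L-BFGS-B")
# Branch 2: assume x <= c, so |x - c| == c - x
res_neg = minimize(lambda v: (c - v[0] - 2.0) ** 2 + 0.1 * v[0],
                   x0=[c - 1.0], bounds=[(None, c)], method="L-BFGS-B")

best = min((res_pos, res_neg), key=lambda r: r.fun)
print(best.x[0], best.fun)  # the x <= c branch wins here, with x near -1
```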
|
|
I explain the whole idea by giving an example. Let's say that we would like to minimize the function |
|
You can write |x - x0| as the square root of a square: let u = x - x0, then |u| = sqrt(u^2). You can then take the derivative of this. If the derivative is difficult to calculate for other reasons, you can approximate it numerically, for example with a secant (finite-difference) approximation. In fact, you can compute the derivative exactly at any point using automatic differentiation, even if you cannot calculate it analytically.
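As a quick illustration (with a made-up x0), here is the analytic derivative sign(x - x0) compared against a secant / central finite-difference approximation; an automatic differentiation tool would return the same values away from the kink at x = x0:

```python
import numpy as np

x0 = 0.7  # illustrative

def f(x):
    return np.sqrt((x - x0) ** 2)   # same as abs(x - x0)

def analytic_grad(x):
    # d/dx sqrt((x - x0)^2) = (x - x0) / |x - x0| = sign(x - x0); undefined at x == x0
    return np.sign(x - x0)

def secant_grad(f, x, h=1e-6):
    # central-difference (secant) approximation of the derivative
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-1.0, 0.2, 3.0):
    print(x, analytic_grad(x), secant_grad(f, x))
# away from x0 the two agree; exactly at x0 the derivative does not exist
```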
Writing |u| as sqrt(u^2) does not make it differentiable at 0. The derivative remains -1 for u < 0 and +1 for u > 0, and you can choose 0 as the subgradient when u = 0. Automatic differentiation is an interesting alternative for functions that have a finite number of irregularities in their derivatives.
(Aug 04 '10 at 05:37)
ogrisel
Since they mentioned Newton's method I assumed the original poster just wanted to find a gradient at some setting of x.
(Aug 04 '10 at 05:49)
Noel Welsh
@Noel, yes, we want to find a gradient at some setting of x. I'll look at the "automatic differentiation". Thanks.
(Aug 04 '10 at 12:43)
Liangjie Hong
|
Probably it is a good idea to have a look at the optimization techniques used to minimize loss functions with an L1 regularization term.
Thanks. @bijey, it's not regularization, though; the absolute value is in the loss function itself.
Just curious, how did you end up with an objective function like that?