I am trying to optimize a nonconvex function of the form f(x)= sum_i g_i(x) − h_i(x) where x is a vector of variables, and g_i and h_i are both convex. While I am aware that such a function can be optimized using a Concave Convex Procedure (CCCP) to a local minima, I am wondering if I can say something about a stochastic gradient method for this function (even guarantees of reaching a local minima will be fine.) Does anyone have any advice or references? |
This function is potentially unbounded below, depending on what the g_i and h_i are; for example, g(x) = x^2 and h(x) = 2x^2 gives f(x) = −x^2, which has no local minimum at all. In that case you cannot prove anything about converging to a local minimum, because one may not exist. In practice, something like SGD may give you good performance, but I don't think there is much to say in general about convergence or performance without using specific instantiations of the g_i and h_i.
As far as I know, very little is known about nonconvex optimization with stochastic gradient descent, except that with carefully tuned learning rates that decay over time one can often get good results.
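To illustrate, here is a hedged sketch of plain SGD with a Robbins-Monro-style decaying step size a / (1 + b*t), applied to f(x) = sum_i [g_i(x) − h_i(x)] by sampling one summand per step. The per-term gradients and the toy problem are assumptions for the example, not a recipe with guarantees:

```python
import numpy as np

def sgd(grad_terms, x0, steps=20_000, a=0.5, b=0.01, seed=0):
    """SGD sketch: grad_terms[i] returns the gradient of g_i - h_i (assumed)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        i = rng.integers(len(grad_terms))  # sample one term uniformly
        lr = a / (1.0 + b * t)             # learning rate decaying over time
        x = x - lr * grad_terms[i](x)      # stochastic gradient step
    return x

# toy problem (assumed): f(x) = (x-1)^2 + (x+1)^2 - x^2 = x^2 + 2, minimum at x = 0,
# split as g_1 = (x-1)^2, g_2 = (x+1)^2, h_1 = h_2 = 0.5 x^2
grads = [lambda x: 2 * (x - 1) - x, lambda x: 2 * (x + 1) - x]
print(sgd(grads, x0=5.0))  # -> close to 0
```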
You can always do some random jumps (restarts) or run several optimizers in parallel and keep the best result.
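A minimal sketch of that idea, assuming some local optimizer (e.g. the CCCP or SGD sketches above) is available as local_opt:

```python
import numpy as np

def random_restarts(f, local_opt, dim, n_starts=20, scale=5.0, seed=0):
    """Run local_opt from several random starting points; keep the best (sketch)."""
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(n_starts):
        x = local_opt(rng.normal(scale=scale, size=dim))  # fresh random start
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```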
Something like simulated annealing or deterministic annealing can help to overcome or smooth out local minima.
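For instance, a bare-bones simulated annealing loop with a Metropolis acceptance rule might look like the following; the Gaussian proposal, geometric cooling schedule, and all parameters here are illustrative choices, not tuned recommendations:

```python
import numpy as np

def simulated_annealing(f, x0, steps=10_000, T0=1.0, cooling=0.999,
                        step_scale=0.5, seed=0):
    """Simulated annealing sketch: accept uphill moves with prob exp(-delta/T)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx, T = f(x), T0
    for _ in range(steps):
        cand = x + rng.normal(scale=step_scale, size=x.shape)  # random proposal
        delta = f(cand) - fx
        if delta < 0 or rng.random() < np.exp(-delta / T):     # Metropolis rule
            x, fx = cand, fx + delta                           # accept the move
        T *= cooling                                           # cool the temperature
    return x, fx
```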