|
What would happen if you replaced the dual update step in ADMM with exponentiated gradient, or another low-regret update. I've read several papers out of TTIC outlining low-regret updates and connections to convergence and learning bounds. The algorithms these papers show often have a primal/dual aspect, which of course defines ADMM and similar Augmented Lagrangian methods. Can the two be connected, in particular in the way shown in the topic title? |
Or, more interestingly, are there any problems for which ADMM local updates are exponentiated-gradient updates?
There's a paper by about exponentiated gradient for CRFS that factored into this question too: http://people.csail.mit.edu/gamir/pubs/olcrf.pdf
Indeed. This and the followup papers are things I almost, but not quite, understand. Can you give a concise explanation of how they arrive at the exponentiated gradient formulation? I ask because with CRFs you usually work in logspace (as the normalization comes later and is expensive), unlike the usual winnow-like online learning, where you work with an explicit probability vector, and from where the exponentiated gradient can be derived by restricting things to the simplex and maximizing movement in the direction of the gradient while minimizing KL divergence between old and new states.
If I recall (though this is beyond what I understand technically), the family of Augmented Lagrangian methods can be extended to any type of bregman divergence, or which KL is one. This could be a connection to the entropy regularization interpretation of exponentiated gradient. Whether this gives any sort of regret bounds or more than a superficial connection though, I'm unsure.