I'm looking at this paper and I'm surprised that, in their analysis, the convergence rate depends on the optimal value of the error function (e.g. Eqs. (5) and (6) involve L(w*)). However, if we switch from the error function L to L plus a constant, the right-hand side of the bound changes even though SGD itself should be unaffected.
L(w*) is the expected loss at the "optimal" (or in some sense "best") solution w*. If L(w) is the expected loss one arrives at after n iterations, then the bound on ||L(w) - L(w*)|| helps one understand how close one is to the optimal solution. Their analysis gives an indication that their proposed method has a tighter upper bound, indicating faster convergence to the solution after some n iterations. The bound is not between two different loss functions, as you suggested, but rather the same loss function computed at two different points in the solution space of w.

Sorry, but I'm afraid you didn't understand the question. Take L and consider a new loss L + 2013. The left-hand side, ||L(w) - L(w*)||, doesn't change, but the right-hand side does, even though SGD should be unaffected. Do you see the problem now?
(Oct 28 '13 at 00:50)
Max
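A minimal numerical sketch of this point, assuming a toy least-squares objective and plain SGD with a fixed step size (the data, step size, and helper names such as run_sgd are made up for illustration): adding a constant to the loss leaves the gradients, the SGD iterates, and the gap L(w_n) - L(w*) unchanged, while L(w*) itself, the quantity appearing on the right-hand side of bounds like Eqs. (5) and (6), does change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: L(w) = mean_i (x_i . w - y_i)^2, optionally shifted by a constant.
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=200)

def loss(w, shift=0.0):
    return np.mean((X @ w - y) ** 2) + shift

def grad_i(w, i):
    # Stochastic gradient from a single example; a constant shift has zero gradient.
    xi, yi = X[i], y[i]
    return 2.0 * (xi @ w - yi) * xi

def run_sgd(n_iters=2000, lr=0.01, seed=1):
    rng_local = np.random.default_rng(seed)
    w = np.zeros(5)
    for _ in range(n_iters):
        i = rng_local.integers(len(y))
        w -= lr * grad_i(w, i)          # the update never sees the constant shift
    return w

w_star = np.linalg.lstsq(X, y, rcond=None)[0]   # minimizer of the (unshifted) loss
w_n = run_sgd()                                  # identical iterates for L and L + 2013

for shift in (0.0, 2013.0):
    gap = loss(w_n, shift) - loss(w_star, shift)   # left-hand side: unchanged by the shift
    print(f"shift={shift:7.1f}  L(w*)={loss(w_star, shift):9.4f}  L(w_n)-L(w*)={gap:.6f}")
```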
Yes, I see your point now! Sorry about that. Odd as it may seem, L(w*) does appear to matter for the convergence bound. But I think it is important to note that L(w) is a loss function here and not an arbitrary convex function. Adding a constant (e.g. L(w) + 2013) is tantamount to saying that much loss is bound to happen, for example that much misclassification is simply given. Here is a point they make, taken right out of the same paper (just below Eq. (3)): "This self-bounding property tells us that the norm of the gradient is small at a point if the loss is itself small at that point."
(Oct 28 '13 at 09:30)
Rakesh Chalasani
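To make the quoted self-bounding property concrete, here is a small numerical check of the standard inequality for nonnegative, beta-smooth losses, gradient^2 <= 2 * beta * loss. This is my own illustration rather than anything from the paper, using the logistic loss (with beta = 1/4 assumed as its smoothness constant) as the example.

```python
import numpy as np

# Logistic loss of a margin z: nonnegative and beta-smooth with beta = 1/4.
def logistic_loss(z):
    return np.log1p(np.exp(-z))

def logistic_grad(z):
    return -1.0 / (1.0 + np.exp(z))   # derivative of log(1 + exp(-z))

beta = 0.25
z = np.linspace(-10.0, 10.0, 2001)

lhs = logistic_grad(z) ** 2            # squared gradient magnitude
rhs = 2.0 * beta * logistic_loss(z)    # self-bounding ceiling

print("gradient^2 <= 2*beta*loss on grid:", bool(np.all(lhs <= rhs + 1e-12)))
print("largest ratio gradient^2 / (2*beta*loss):", float(np.max(lhs / rhs)))
```

The inequality ties the gradient magnitude to the value of the loss itself, which is only informative because the loss is bounded below by zero; a shifted loss like L + 2013 is never small, so "small loss implies small gradient" tells us nothing in that regime, which is roughly the sense in which L being a genuine loss matters here.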
Can you be more specific? Just glancing through the paper, AG simply appears to add a momentum term to the update, making convergence faster. Roughly, their analysis shows that the bound on the error between the weights after the n-th iteration and the optimum is better with AG than with SGD, indicating faster convergence with their method.
@RakeshChalasani edited
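Purely as an illustration of the "momentum term while updating" remark above, here is a rough sketch contrasting a plain gradient step with a Nesterov-style momentum step on a toy quadratic. The step size, momentum schedule, and problem are my own illustrative choices and not the specific AG update analyzed in the paper.

```python
import numpy as np

# Toy strongly convex quadratic: f(w) = 0.5 * w^T A w, minimized at w* = 0.
rng = np.random.default_rng(0)
Q = rng.normal(size=(20, 20))
A = Q.T @ Q + 0.1 * np.eye(20)

def grad(w):
    return A @ w

lr = 1.0 / np.linalg.eigvalsh(A).max()   # step size ~ 1/L for a smooth objective
w0 = rng.normal(size=20)

w_plain = w0.copy()                      # plain gradient step (SGD would use a noisy gradient here)
w_ag, w_prev = w0.copy(), w0.copy()      # momentum / Nesterov-style accelerated step

for t in range(1, 201):
    w_plain = w_plain - lr * grad(w_plain)

    mom = (t - 1) / (t + 2)              # a common illustrative momentum schedule
    v = w_ag + mom * (w_ag - w_prev)     # look-ahead point
    w_prev, w_ag = w_ag, v - lr * grad(v)   # gradient step taken at the look-ahead point

print("f(w) after 200 steps, plain step:", 0.5 * w_plain @ A @ w_plain)
print("f(w) after 200 steps, momentum  :", 0.5 * w_ag @ A @ w_ag)
```

On an example like this the momentum iterate typically reaches a much smaller objective value in the same number of steps, which is the informal sense in which a momentum term "makes the convergence faster."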