Hi, I was reading the Boyd and Vandenberghe book on how one may implement line search, and I wanted to use the method for a simple gradient descent algorithm (Section 8.7). For a linear SVM I found this implementation, but could not quite understand it.

Here is the regression-type function the code is supposed to operate on (up to constant factors of 2, which I've dropped; note the gradient of the loss term carries a minus sign):

F = lambda * ||w||^2 + sum_i (1 - y_i * (w . x_i))^2
Grad = lambda * w - sum_i (1 - y_i * (w . x_i)) * y_i * x_i
H = lambda * I + sum_i (x_i * x_i')

The function signature:

function [t,out] = line_search_linear(w,d,out,Y,lambda)

d is the Newton search direction (H^-1 * Grad for Newton's method). out is a vector of ones initially; after this function exits it is supposed to hold the Huber/hinge loss, out_i = (1 - y_i * (w . x_i))^2.

Here is my intuition, along with what I don't understand. The body of the function seems to restrict F and its first and second derivatives to a line, i.e. it projects everything onto the search direction (which looks like a simple dot product); line 7 projects all the points onto that line. For the main loop, though, I am not able to connect the mathematics in the book to the implementation. Could someone explain this lucidly? I want to understand this and then do the same for a simpler gradient descent algorithm.

Thanks,
Sid
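For context, here is my own rough Python sketch of the backtracking (inexact) line search from the book, applied to an objective like the one above. The function names, the alpha/beta constants, and the use of max(0, .) in the loss (making it a squared hinge) are my guesses, not taken from the MATLAB code I'm asking about:

```python
import numpy as np

def objective(w, X, y, lam):
    # F(w) = lam*||w||^2 + sum_i max(0, 1 - y_i*(w . x_i))^2  (squared hinge)
    margins = 1.0 - y * (X @ w)
    return lam * (w @ w) + np.sum(np.maximum(margins, 0.0) ** 2)

def gradient(w, X, y, lam):
    # Only examples with positive margin contribute to the loss gradient.
    margins = 1.0 - y * (X @ w)
    active = margins > 0
    return 2.0 * lam * w - 2.0 * (X[active].T @ (margins[active] * y[active]))

def backtracking_line_search(w, d, X, y, lam, alpha=0.3, beta=0.8):
    # Shrink t until the sufficient-decrease (Armijo) condition holds:
    # F(w + t*d) <= F(w) + alpha * t * grad(w)'d
    f0 = objective(w, X, y, lam)
    slope = gradient(w, X, y, lam) @ d  # directional derivative along d
    t = 1.0
    while objective(w + t * d, X, y, lam) > f0 + alpha * t * slope:
        t *= beta
    return t
```

For plain gradient descent you would call this with d = -gradient(w, X, y, lam) and then update w = w + t * d.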
As B&V state, an exact line search is not a good idea in gradient descent algorithms, because gradient steps almost always follow a poor direction; it is usually better to stop the line search as soon as you find a sufficiently large reduction in the objective. For Newton's method, however, an exact line search is better, since a Newton step is expensive and generally moves in a good direction. Even so, you have two options: do an exact line search using the function f itself as the objective, or do an exact line search using the quadratic Newton approximation as the objective (the latter can usually be solved analytically). B&V is not a very good reference for unconstrained optimization methods. See Nonlinear Programming by Bertsekas; even Numerical Recipes can give you a clearer picture of what must be done in an exact and an inexact line search, and when to use each. I'm not good with MATLAB, nor at deciphering random line-search code, so I'll let someone else step in on the implementation. For linear SVMs, however, you're probably better off either using a specialized optimizer, doing stochastic gradient descent (as in Pegasos), or using an exponentiated gradient algorithm (I have no reference for the linear two-class SVM, but Bartlett, Collins, and Taskar's "Exponentiated Gradient Algorithms for Large-margin Structured Classification", and some other papers by the same authors, are quite adaptable to this problem).
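To make the second option concrete: along a direction d, the local quadratic model m(t) = f0 + t*(g'd) + (t^2/2)*(d'H d) has a closed-form minimizer, which is why that exact line search is essentially free. A small Python sketch (the function name is mine):

```python
import numpy as np

def exact_step_quadratic(g, H, d):
    # Minimize the 1-D quadratic model m(t) = f0 + t*(g'd) + 0.5*t^2*(d'H d),
    # giving t* = -(g'd) / (d'H d).  For the Newton direction d = -H^{-1} g
    # this reduces to t* = 1, i.e. the full Newton step.
    return -(g @ d) / (d @ H @ d)
```

This is exact only for the quadratic model; on the true objective f it is just a good initial step size, which you would then safeguard with a backtracking check.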
Hello: gradient descent and Newton's method are two distinct, albeit not entirely unrelated, methods. Gradient descent moves your control variable in the direction of steepest descent of the objective in order to reach a local minimum. Newton's method instead builds a local model of f at the current guess: for root finding it uses the tangent line to f, and for minimization it approximates f with a quadratic and jumps to that quadratic's minimizer. I recommend the first set of notes from Andrew Ng's Machine Learning lectures; he explains both methods without introducing SVMs at all.
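To illustrate the difference on a toy problem (a Python sketch; the matrix, vector, and step size below are arbitrary choices of mine): on a quadratic objective, a single Newton step lands exactly on the minimizer, while a fixed-step gradient step only moves part of the way downhill.

```python
import numpy as np

# A strictly convex quadratic f(w) = 0.5 * w'Aw - b'w, minimized at w* = A^{-1} b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])

def f(w):
    return 0.5 * (w @ A @ w) - b @ w

def grad(w):
    return A @ w - b

w0 = np.zeros(2)

# Gradient descent: one step along the negative gradient with a fixed step size.
w_gd = w0 - 0.1 * grad(w0)

# Newton: solve the local quadratic model exactly; since f is itself quadratic,
# one Newton step reaches the minimizer.
w_newton = w0 - np.linalg.solve(A, grad(w0))
```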