What is the "backtracking step" used in Differentiable Sparse Coding (http://www.ri.cmu.edu/pub_files/2008/12/differentiableSparseCoding.pdf)? The paper that introduced Exponentiated Gradient Descent makes no mention of a backtracking step. My first guess would be that it responds to an increase in the objective function by undoing the previous step and repeating it with a different learning rate, but the Differentiable Sparse Coding paper says the learning rate is adjusted to cause a certain number of backtracking steps, which implies it is not adjusted in response to a single failed step.
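In case it clarifies what I have in mind, here is a minimal sketch of my guess (plain gradient descent on a generic objective f; the names and the shrink factor are placeholders, not anything taken from the paper):

    def step_with_backtracking(f, grad_f, x, lr, shrink=0.5, max_tries=20):
        """Take one gradient step; if the objective increases, undo it and
        retry the same step with a smaller learning rate."""
        fx = f(x)
        for _ in range(max_tries):
            x_new = x - lr * grad_f(x)
            if f(x_new) <= fx:      # accepted: the objective did not increase
                return x_new, lr
            lr *= shrink            # rejected: shrink the learning rate and retry
        return x, lr                # give up and keep the current iterate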
I won't comment on that paper in particular, but take a look at James Martens' paper Deep Learning via Hessian-Free Optimization (available from his research page). He talks about using backtracking, and I can only speculate that both papers are talking about the same basic idea. In Martens' paper, he lets conjugate gradient terminate and then (if necessary) backtracks until he finds the weight update that performs 'best', starting from the end and working backward. His criterion for 'best' is straightforward: check whether the previous step did better than the current step, and if so take a step back, stopping as soon as the previous step didn't do better. In this case, the reason to do this is that linear conjugate gradient is used to optimize a quadratic approximation of the model; the model itself may therefore disagree significantly with the approximation at the approximation's optimum.
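As a rough illustration of that rule (not Martens' exact implementation; `objective` and the saved list of `iterates` are placeholders), the backtracking pass might look like this:

    def backtrack(objective, iterates):
        """Walk backward from the last CG iterate, stepping back whenever the
        previous iterate does better on the true objective, and stopping as
        soon as it doesn't."""
        best = iterates[-1]
        best_val = objective(best)
        for candidate in reversed(iterates[:-1]):
            val = objective(candidate)
            if val < best_val:      # previous step did better: step back
                best, best_val = candidate, val
            else:                   # previous step didn't do better: stop
                break
        return best

The point of evaluating the true objective here, rather than the quadratic model, is exactly the mismatch mentioned above: the model can keep improving along the CG run while the true objective gets worse.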