What is a "backtracking step" used in Differentiable Sparse Coding?

http://www.ri.cmu.edu/pub_files/2008/12/differentiableSparseCoding.pdf

The paper that introduced Exponentiated Gradient Descent doesn't make any mention of a backtracking step. My first guess would be that it responds to an increase in the objective function by undoing the previous step and repeating it with a different learning rate, but the Differentiable Sparse Coding paper says that the learning rate is adjusted to cause a certain number of backtracking steps, which implies that it is not adjusted in response to a single failed step.
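For concreteness, here is what that first guess would look like in code: a generic backtracking rule that undoes a step and shrinks the learning rate whenever the objective increases. This is only a sketch of the generic idea, not the procedure from the paper; the function names, shrink factor, and tolerance are made up for illustration.

```python
def gradient_step_with_backtracking(x, grad_fn, obj_fn, lr, shrink=0.5):
    """One gradient step with naive backtracking: if the objective goes up,
    undo the step and retry with a smaller learning rate.
    Sketch only -- not the procedure from the Differentiable Sparse Coding paper."""
    f_old = obj_fn(x)
    g = grad_fn(x)
    while lr > 1e-12:
        x_new = x - lr * g              # tentative step
        if obj_fn(x_new) < f_old:       # accept only if the objective decreased
            return x_new, lr
        lr *= shrink                    # backtrack: shrink the learning rate and retry
    return x, lr                        # no improving step found; keep x unchanged

# Example on f(x) = x^2 starting from x = 3 with an overly large learning rate:
x, lr = gradient_step_with_backtracking(3.0, grad_fn=lambda v: 2 * v,
                                        obj_fn=lambda v: v * v, lr=2.0)
```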

asked Jun 01 '11 at 10:43


Ian Goodfellow


One Answer:

I won't comment on that paper in particular, but take a look at James Martens' paper Deep Learning Via Hessian-Free Optimization (available on his research page). He talks about using backtracking, and I can only speculate that both papers are talking about the same basic idea.

In Martens' paper, he allows conjugate gradient to terminate, then (if necessary) backtracks until he finds the weight update that performs best, starting from the end and working backward. His criterion for "best" in this case is straightforward: check whether the previous step did better than the current step, and if so take a step back, stopping when the previous step didn't do better.

In this case, the reason for doing this is that linear conjugate gradient optimizes a quadratic approximation of the true objective; the true objective may therefore disagree significantly with the approximation at the approximation's optimum, so an earlier CG iterate can actually give a better weight update.
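As a rough sketch of that backtracking pass (not Martens' actual implementation; the function and variable names here are invented for illustration), suppose `candidates` holds the weight updates saved at intermediate CG iterations, ordered from earliest to latest, and `true_obj_fn` evaluates the real, non-quadratic objective:

```python
def cg_backtrack(candidates, true_obj_fn, theta):
    """Backtrack over saved conjugate-gradient iterates, in the spirit of the
    procedure described above. Loose sketch, not Martens' exact code."""
    # Start from the final CG iterate and step backward while the previous
    # saved iterate gives a lower value of the true objective.
    i = len(candidates) - 1
    best_val = true_obj_fn(theta + candidates[i])
    while i > 0:
        prev_val = true_obj_fn(theta + candidates[i - 1])
        if prev_val < best_val:
            i, best_val = i - 1, prev_val   # previous step did better: back up
        else:
            break                           # previous step did not improve: stop
    return candidates[i]
```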


answered Jun 01 '11 at 13:10


Brian Vandenberg
