I have a few questions relating to the paper Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent (Schraudolph, 2002); hopefully someone can lend a hand.

My questions concern the section around eqn (4.2), page 8, which describes how to compute the product Gv. It says you can do the f1 pass, stopping after N(w). This is equivalent to saying: compute R{.} for the logistic inputs, but don't worry about R{.} for the non-linearity.

Isn't the matter of only requiring JN -- in theory only needing R{N(w)} -- more or less an academic point? I suppose in different architectures where there isn't a dependency relationship like this, it becomes a less trivial point ... but for a FF NN, it appears you have no choice but to do the f1 pass through M & N for all but the last layer, where you only need R{N(w)}.
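To make the dependency concrete, here is a minimal NumPy sketch of the f1 pass for a one-hidden-layer network. Everything in it is an assumption made for illustration and not taken from the paper: the tanh hidden layer, the layer sizes, and the names `forward` and `r_forward`. It stops at the logistic inputs z = N(w), but it still has to push R{.} through the hidden non-linearity to get there.

```python
import numpy as np

def forward(W1, W2, x):
    """Ordinary forward pass; returns hidden activations and z = N(w)."""
    h = np.tanh(W1 @ x)
    z = W2 @ h                # inputs to the output logistic (not applied here)
    return h, z

def r_forward(W1, W2, V1, V2, x):
    """f1 (R-op) pass: directional derivative of z along the direction (V1, V2)."""
    h, _ = forward(W1, W2, x)
    Ra = V1 @ x               # R{hidden pre-activation}
    Rh = (1.0 - h**2) * Ra    # R{.} through the hidden tanh: cannot be skipped
    Rz = V2 @ h + W2 @ Rh     # R{N(w)}: stop here, before the output non-linearity
    return Rz

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
V1, V2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

Rz = r_forward(W1, W2, V1, V2, x)

# Central finite-difference check of the same directional derivative.
eps = 1e-6
_, z_plus = forward(W1 + eps * V1, W2 + eps * V2, x)
_, z_minus = forward(W1 - eps * V1, W2 - eps * V2, x)
Rz_fd = (z_plus - z_minus) / (2 * eps)
```

In this sketch `Rz` and `Rz_fd` should agree to within finite-difference error. The only R{.} that is skipped is the one for the output logistic.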

To complete the process, after a full f1 pass you can r1-propagate the result back through N(w) -- but again, the dependency problem rears its head. Furthermore, if you mechanically step through it, doing the full r1 pass on AJMJN*V, you end up with JM * JN * A * JM * JN * V = (JM * JN)^2 * V.

Any comments?

-Brian


asked Feb 02 '11 at 02:05


Brian Vandenberg

closed Feb 02 '11 at 03:27

So, as far as I can tell, I'm right about the dependency issue in the forward pass, but when you do the r1 pass and you're using a matching loss function with your output units, you only need to backprop through N -- see section 4.2 of Schraudolph's paper for details.

As a result, while I do have to f1 through N & M, I can then set u = result from f1 pass, and perform the usual backprop procedure to get the final result.
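The procedure just described can be sketched in NumPy as follows. This assumes a tanh hidden layer and a logistic output trained with cross-entropy (a matching loss in the paper's sense, so the Hessian of the loss with respect to the logistic inputs is diag(p(1-p)), the same matrix as JM). The function names, layer sizes, and flat-weight layout are hypothetical choices for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logits(w, x):
    """z = N(w) for a 3-4-2 network with weights packed in one flat vector."""
    W1, W2 = w[:12].reshape(4, 3), w[12:].reshape(2, 4)
    return W2 @ np.tanh(W1 @ x)

def gauss_newton_vec(w, v, x):
    """Gv = JN^T diag(p(1-p)) JN v, via one f1 pass and one ordinary backprop."""
    W1, W2 = w[:12].reshape(4, 3), w[12:].reshape(2, 4)
    V1, V2 = v[:12].reshape(4, 3), v[12:].reshape(2, 4)
    h = np.tanh(W1 @ x)
    p = sigmoid(W2 @ h)
    # f1 pass through N, including the hidden non-linearity: Rz = JN v
    Rh = (1.0 - h**2) * (V1 @ x)
    Rz = V2 @ h + W2 @ Rh
    # Matching loss: multiply by diag(p(1-p)) once
    u = p * (1.0 - p) * Rz
    # r1 pass: usual backprop of u through N only, giving JN^T u
    G2 = np.outer(u, h)
    G1 = np.outer((1.0 - h**2) * (W2.T @ u), x)
    return np.concatenate([G1.ravel(), G2.ravel()])

rng = np.random.default_rng(0)
x = rng.normal(size=3)
w = rng.normal(size=20)
v = rng.normal(size=20)

Gv = gauss_newton_vec(w, v, x)

# Reference: build JN explicitly by central finite differences, then form the product.
eps = 1e-6
J = np.stack([(logits(w + eps * e, x) - logits(w - eps * e, x)) / (2 * eps)
              for e in np.eye(20)], axis=1)          # shape (2, 20)
p = sigmoid(logits(w, x))
Gv_ref = J.T @ (p * (1.0 - p) * (J @ v))
```

Under these assumptions `Gv` and `Gv_ref` should agree to within finite-difference error. Note that the backward pass applies only the transpose of JN to `u`; the output non-linearity enters exactly once, through the factor p(1-p).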

-Brian

(Feb 02 '11 at 03:49) Brian Vandenberg

Hey Brian, I think I am having the same confusion. If you have a multilayer network, then you are computing the f1 pass through the nonlinear activation functions at each layer... so I actually took N(w) to include all the layers except for the nonlinear functions of the last output layer. But maybe I am thinking of this wrong?

(Jan 04 '13 at 14:06) will henry

Hey, please don't close it without an explanation, as other people might actually have the same conclusion. I'm reopening, could you please post your explanation as an answer?

(Jan 04 '13 at 16:43) Alexandre Passos ♦