I have a few questions about the paper Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent (Schraudolph, 2002); hopefully someone can lend a hand.

Around eqn (4.2), page 8, is the section describing how to compute the product Gv. You can do the f1 pass, stopping after N(w). This is equivalent to saying: compute R{.} for the inputs to the logistic output units, but don't worry about R{.} for the nonlinearity itself.

Isn't the matter of only requiring JN (in theory, only needing R{N(w)}) more or less an academic point? I suppose that in architectures without a dependency relationship like this it becomes less trivial, but for a feedforward NN it appears you have no choice but to do the f1 pass through M and N for all but the last layer, where you only need R{N(w)}.

To complete the process, after a full f1 pass you can r1-propagate the result back through N(w), but again the dependency problem rears its head. Furthermore, if you mechanically step through it, doing the full r1 pass on A JM JN V, you end up with JM * JN * A * JM * JN * V = (JM * JN)^2 * V.

Any comments?
-Brian
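On the last point, a quick linear-algebra check may help. The r1 (backprop) pass multiplies by the *transposed* Jacobians, in reverse order, so mechanically stepping through gives J_N^T J_M^T A J_M J_N v rather than a square of J_M J_N. The sketch below is only that check: the random matrices are hypothetical stand-ins for J_N, J_M and A (JN, JM and A in the question), not a real network, with J_M diagonal as for an elementwise output nonlinearity and A assumed symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
n_params, n_out = 6, 3                              # hypothetical sizes: 6 weights, 3 outputs

# Hypothetical stand-ins, not a real network:
J_N = rng.normal(size=(n_out, n_params))            # Jacobian of N(w) w.r.t. the weights
J_M = np.diag(rng.uniform(0.1, 0.25, size=n_out))   # elementwise nonlinearity -> diagonal Jacobian
B = rng.normal(size=(n_out, n_out))
A = B @ B.T                                         # symmetric curvature of the loss
v = rng.normal(size=n_params)

# f1 pass: forward-mode products apply the Jacobians themselves.
u = A @ (J_M @ (J_N @ v))                           # shape (n_out,)

# r1 pass: backprop applies the *transposed* Jacobians, in reverse order.
Gv = J_N.T @ (J_M.T @ u)                            # shape (n_params,)

G = J_N.T @ J_M.T @ A @ J_M @ J_N                   # explicit n_params x n_params matrix
print(np.allclose(Gv, G @ v))
```

Because J_M J_N here is 3 x 6, (J_M J_N)^2 is not even defined unless the number of outputs equals the number of weights; the transposes are what make the shapes work out.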
So, as far as I can tell, I'm right about the dependency issue in the forward pass. However, when you do the r1 pass and use a loss function that matches your output units (a matching loss), you only need to backprop through N; see section 4.2 of Schraudolph's paper for details.
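The matching-loss point can be checked numerically for one common pairing, logistic outputs with cross-entropy (an assumed choice for this sketch). The gradient of the loss w.r.t. the output pre-activations y is simply z - t, so its Hessian w.r.t. y is diag(z(1-z)), which is exactly the Jacobian J_M of the logistic. That is why the result of an f1 pass continued through M can be handed straight to ordinary backprop through N. The function names below are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss_of_y(y, t):
    """Cross-entropy of logistic outputs z = sigmoid(y) against targets t."""
    z = sigmoid(y)
    return -np.sum(t * np.log(z) + (1 - t) * np.log(1 - z))

def grad_y(y, t):
    """Gradient w.r.t. the pre-activations y; for this matching pair it is z - t."""
    return sigmoid(y) - t

rng = np.random.default_rng(2)
y = rng.normal(size=3)          # output pre-activations, i.e. N(w) for one pattern
t = rng.uniform(size=3)         # targets in [0, 1)
eps = 1e-6
I = np.eye(3)

# Check the analytic gradient z - t against central differences of the loss.
g_num = np.array([(loss_of_y(y + eps * e, t) - loss_of_y(y - eps * e, t)) / (2 * eps) for e in I])

# Hessian w.r.t. y by central differences of the gradient (column j = d grad / d y_j).
H = np.column_stack([(grad_y(y + eps * e, t) - grad_y(y - eps * e, t)) / (2 * eps) for e in I])

z = sigmoid(y)
J_M = np.diag(z * (1 - z))      # Jacobian of the elementwise logistic output nonlinearity
print(np.allclose(g_num, grad_y(y, t), atol=1e-6), np.allclose(H, J_M, atol=1e-7))
```

With a non-matching pair, such as logistic outputs with sum-squared error, the Hessian w.r.t. y picks up an extra (z - t) times second-derivative term and no longer equals J_M.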
As a result, although I still have to do the f1 pass through both N and M, I can then set u to the result of the f1 pass and perform the usual backprop procedure through N to get the final result.
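Putting that procedure together, here is a hedged NumPy sketch for a single pattern. It assumes a small two-layer net with logistic hidden units, no biases, and logistic outputs with cross-entropy, all chosen for illustration. N(w) covers the hidden layer, including its logistic nonlinearity, plus the output weights; M is only the output logistic. The f1 pass runs through N and M, its result is used as u, and plain backprop of u through N yields Gv. The helper names `gauss_newton_vp` and `explicit_G` are hypothetical; `explicit_G` builds J_N by finite differences purely to check the result against J_N^T diag(z(1-z)) J_N.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def N(W1, W2, x):
    """Network up to the output pre-activations: y = W2 @ sigmoid(W1 @ x)."""
    z1 = sigmoid(W1 @ x)
    return W2 @ z1, z1

def gauss_newton_vp(W1, W2, x, V1, V2):
    """Gv for logistic outputs + cross-entropy, with direction v = (V1, V2)."""
    y, z1 = N(W1, W2, x)                      # ordinary forward pass
    z = sigmoid(y)                            # M: output nonlinearity
    # f1 (R-operator) pass through N: R{y} = J_N v.
    R_a1 = V1 @ x
    R_z1 = z1 * (1.0 - z1) * R_a1
    R_y = V2 @ z1 + W2 @ R_z1
    # Continue through M: R{z} = J_M J_N v, which the matching loss makes equal to H J_N v.
    u = z * (1.0 - z) * R_y
    # r1 pass: ordinary backprop of u through N gives J_N^T u.
    gW2 = np.outer(u, z1)
    delta_a1 = (W2.T @ u) * z1 * (1.0 - z1)
    gW1 = np.outer(delta_a1, x)
    return gW1, gW2

def explicit_G(W1, W2, x, eps=1e-6):
    """Explicit J_N^T diag(z(1-z)) J_N, with J_N from central differences (checking only)."""
    h, d = W1.shape
    p = np.concatenate([W1.ravel(), W2.ravel()])
    def y_of(q):
        return N(q[:h * d].reshape(h, d), q[h * d:].reshape(-1, h), x)[0]
    J = np.empty((W2.shape[0], p.size))
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = eps
        J[:, i] = (y_of(p + e) - y_of(p - e)) / (2 * eps)
    z = sigmoid(y_of(p))
    return J.T @ np.diag(z * (1.0 - z)) @ J

rng = np.random.default_rng(0)
d, h, m = 4, 5, 3
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(m, h))
x = rng.normal(size=d)
V1, V2 = rng.normal(size=(h, d)), rng.normal(size=(m, h))

gW1, gW2 = gauss_newton_vp(W1, W2, x, V1, V2)
Gv = np.concatenate([gW1.ravel(), gW2.ravel()])
v = np.concatenate([V1.ravel(), V2.ravel()])
print(np.allclose(Gv, explicit_G(W1, W2, x) @ v, atol=1e-6))
```

Note that the hidden layer's R{.} values are needed on the way to R{y}, which is the forward-pass dependency discussed above; only the final step through M is optional, and the matching loss is what lets the f1 result through M serve directly as u.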
-Brian
Hey Brian, I think I'm having the same confusion. If you have a multilayer network, then you are computing the f1 pass through the nonlinear activation functions at each layer, so I took N(w) to include all the layers except the nonlinear function of the last (output) layer. But maybe I'm thinking about this wrong?
Hey, please don't close this without an explanation; other people might well reach the same conclusion. I'm reopening it. Could you please post your explanation as an answer?