I'm reading Barak Pearlmutter's paper Fast exact multiplication by the Hessian (Pearlmutter, 1994), and ran across a phrase I'm having trouble finding a definition for. I'll quote the relevant portion below:
I have a few questions on this sentence:
-Brian
I think it just means that the network is not recurrent: no unit is connected to itself or back to units below it in the hierarchy. These constraints are not required for the algorithm to work, but they simplify the general notation he uses, where a single set of weights and a single set of activations describes the whole network. The most relevant part of the paper is how to compute the R() operator derivatives, and this is very similar to normal backpropagation. Instead of lingering on this for too long, I recommend having a look at the relevant part of Bishop's 'Neural Networks for Pattern Recognition' for an example using more usual notation. Since you might also want to use the Gauss-Newton approximation instead of the true Hessian, I also recommend having a look at the paper by Schraudolph (I forget the precise title, but the reference is in Martens's paper). I found his notation style a bit awkward at first, but once I got it, it really made things very clear to me.
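To make the connection to backprop concrete: in modern autodiff terms, Pearlmutter's R{.} operator applied to the gradient is a forward-mode derivative of the reverse-mode gradient, so an exact Hessian-vector product costs on the order of one extra gradient evaluation. Below is a minimal sketch in JAX; the tiny network, loss, weights, and direction vector v are made-up stand-ins rather than anything from the paper:

```python
# Exact Hessian-vector product H @ v via the R-operator: a forward-mode (JVP)
# derivative of the reverse-mode gradient. Network, data, and v are toy stand-ins.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # hypothetical one-hidden-layer net; w packs both weight matrices
    W1 = w[:6].reshape(3, 2)   # hidden weights: 3 hidden units, 2 inputs
    W2 = w[6:].reshape(1, 3)   # output weights: 1 output, 3 hidden units
    h = jnp.tanh(W1 @ x)
    out = W2 @ h
    return jnp.sum((out - y) ** 2)

w = 0.1 * jnp.ones(9)               # made-up parameter vector
x = jnp.array([1.0, -1.0])          # made-up input
y = jnp.array([0.5])                # made-up target
v = jnp.arange(9.0)                 # direction to multiply the Hessian by

grad_fn = jax.grad(loss)            # reverse mode: dL/dw

# R{grad}: differentiate the gradient in direction v (forward over reverse).
_, Hv = jax.jvp(lambda w_: grad_fn(w_, x, y), (w,), (v,))
print(Hv)                           # exact H @ v, roughly the cost of one extra gradient
```

Schraudolph's Gauss-Newton variant replaces the full Hessian with J^T H_L J, which can be assembled from the same two ingredients (a forward-mode pass through the network outputs followed by a reverse-mode pass), but the forward-over-reverse pattern above is the core trick.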
I've read through Schraudolph's paper once, but there were a few areas where he deferred to this paper, so I decided to understand this one before going back to Schraudolph's. Thanks for the response, I appreciate it.
(Dec 08 '10 at 10:42)
Brian Vandenberg
Mr. Pearlmutter was kind enough to respond to my questions (I'd also emailed him). I'm quoting his response for the benefit of anyone else who was curious:
So, now I need to digest that. Assuming I'm not reading it wrong, a W in that sort of arrangement would also be singular, because it's lower triangular with zeros on the diagonal. I'm having a bit of trouble visualizing it right now, but it'll click eventually. -Brian

The properties of the actual matrix are not really that important in this case. Like he said, the main reason seems to be to define a feedforward network with just two types of variables (weights and unit activations). I personally think the equations could be easier to read if there were different symbols for, say, the input, hidden, and output units, but that is just because I'm more used to that type of notation. The 'Pattern Recognition and Machine Learning' book by Bishop also includes the algorithm, by the way. Computing the R operator is almost the same as normal backpropagation, but with additional quantities you compute along the way. It might be easiest to start with an implementation of backprop and extend it afterwards. I wouldn't actually implement your network using the triangular matrix form, though (see the sketch below).
(Dec 08 '10 at 14:05)
Philemon Brakel
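To make the triangular-matrix picture a bit more tangible, here is a toy sketch (my own construction, not code from the paper or from Pearlmutter's email): every unit, input, hidden, and output, lives in one activation vector, and a single strictly lower-triangular weight matrix W lets unit i receive input only from units j < i, which is exactly a feedforward network written with one set of weights and one set of activations.

```python
# Toy illustration (hypothetical sizes and weights): a feedforward net written
# with a single strictly lower-triangular weight matrix over all units.
# Units 0-1 are inputs, 2-4 are hidden, 5 is the output; unit i may only
# receive connections from units j < i.
import jax.numpy as jnp

n_units = 6
mask = jnp.tril(jnp.ones((n_units, n_units)), k=-1)  # allowed (strictly lower) edges
W = 0.1 * mask                                        # made-up weights, zeros elsewhere

def forward(W, x):
    y = jnp.zeros(n_units).at[:2].set(x)     # clamp the two input units
    for i in range(2, n_units):               # sweep units in topological order
        y = y.at[i].set(jnp.tanh(W[i] @ y))   # W[i, j] = 0 for j >= i, so only earlier units feed in
    return y[-1]                               # activation of the output unit

print(forward(W, jnp.array([1.0, -1.0])))

# As noted above, W is singular (strictly lower triangular, zero diagonal),
# but it is only a bookkeeping device; the algorithm never needs to invert it.
print(jnp.linalg.det(W))  # 0.0
```

An actual implementation would, as Philemon says, keep per-layer weight matrices; the single-W form is just the notation that lets one forward sweep and one backward sweep cover any feedforward topology.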
Ya, that form would be unwieldy and expensive to work with, but it does provide a different perspective from which to think about the problem, and understanding that perspective is really what I'm after. For example, one perspective on neural networks I've seen little treatment of is describing them as a dynamical system and investigating features of the network from that angle. Given a particular model, does it have bifurcations? Are there stable, semi-stable, or unstable manifolds? Can those features be used to improve training, derive better models, create more stability, etc.? -Brian
(Dec 08 '10 at 14:30)
Brian Vandenberg
I see your point. There is actually quite a lot of literature on RNNs as dynamical systems in relation to spiking networks and liquid state machines/reservoir computing, but you might already know about that.
(Dec 09 '10 at 04:13)
Philemon Brakel
Actually, I didn't. I bought a book on spiking NNs years ago, but at the time my math skills were less than adequate for reading it. -Brian
(Dec 09 '10 at 09:49)
Brian Vandenberg
Werbos' paper wasn't published except in his book, "The Roots of Backpropagation: (...)". Google Books has a preview of it, if anyone feels like diving into his thesis as I'm about to do:
http://books.google.com/books?id=WdR3OOM2gBwC&printsec=frontcover&dq=werbos+1974+backprop&source=bl&ots=M2qx3ZK1X6&sig=Yw7nEAYP-lly66-Y0cDsbZ1VmOQ&hl=en&ei=Pwr_TOy4NZSusAOpwfivCw&sa=X&oi=book_result&ct=result&resnum=3&ved=0CCgQ6AEwAg#v=onepage&q&f=false