Hey all, I also want to put forward some results that Ilya Sutskever presented at CIFAR 2010. He basically used James Martens' method to train a recurrent neural network. I will not go into the details of the experiment, and I expect a paper on this subject to appear soon. What he did was train an RNN on Wikipedia, where the network got a one-hot encoding of a character as input and was supposed to predict the next character. He used 2000 units (quite a lot for an RNN) and trained it for one month on the GPU (Ilya's estimate is that this would be a year or more of CPU time). Note that this is not your ordinary experiment; I for one would not have the patience to let it run for 30 days or so.
Anyhow, the results are amazing. First of all, the RNN no longer seems to suffer from the vanishing gradient problem (I haven't seen conclusive results on this, and Ilya has just started playing with it). He also tested the method on a task proposed by Schmidhuber et al. in their first paper about LSTM; the task is known to be solvable only by LSTMs and was meant to show that an LSTM can connect events that are far apart in time. But the most amazing thing is that he could use the RNN to generate text that was coherent for a few words (it would also remember to close brackets in some cases, and so on). As far as I know there are no qualitative measures of the network's behaviour yet (or Ilya has not presented any so far), so these are very recent results. But we can of course speculate on how well this training method works. After talking to Ilya I also got a few samples generated by the network (I will post one here, hope he doesn't mind):
Shortly thereafter it was purchased by the army attacks on Bill World Service, like many of which recognize the Oscar's liberation of the nobility before he desired anything obey, instead of , I to be flooding during the civil war, has been fell.
I'm currently investigating James Martens' algorithm, as well as going over Yoshua's paper on vanishing gradients, trying to make sense of all this. I also intend to implement the algorithm in Theano (which should make the GPU stuff transparent). If and when I manage to do that, I will be happy to share my thoughts and code.
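To make the setup described above a bit more concrete, here is a minimal sketch of the input/output side only: a character is one-hot encoded, fed to a recurrent layer, and a softmax gives a distribution over the next character, from which text can be sampled. This is not Ilya's code and it does not implement James Martens' training method. It assumes a plain tanh RNN without biases, a toy vocabulary, 8 hidden units instead of 2000, and random untrained weights, so the sampled text is gibberish. All names (`one_hot`, `step`, `sample`, `W_xh`, `W_hh`, `W_hy`) are hypothetical and chosen for illustration.

```python
import math
import random


def one_hot(ch, vocab):
    """Return a list with 1.0 at the index of ch in vocab and 0.0 elsewhere."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(ch)] = 1.0
    return vec


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(n_vocab, n_hidden, rng):
    """Random, untrained weights (assumption: small Gaussian init, no biases)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_xh": mat(n_hidden, n_vocab),   # input character -> hidden
        "W_hh": mat(n_hidden, n_hidden),  # previous hidden -> hidden
        "W_hy": mat(n_vocab, n_hidden),   # hidden -> next-character scores
    }


def step(params, h, x):
    """One time step: update the hidden state, return next-character probabilities."""
    a = matvec(params["W_xh"], x)
    b = matvec(params["W_hh"], h)
    h_new = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    probs = softmax(matvec(params["W_hy"], h_new))
    return h_new, probs


def sample(params, vocab, seed_char, length, rng):
    """Generate text by feeding each sampled character back in as the next input."""
    h = [0.0] * len(params["W_hh"])
    ch = seed_char
    out = [ch]
    for _ in range(length):
        h, probs = step(params, h, one_hot(ch, vocab))
        ch = rng.choices(vocab, weights=probs, k=1)[0]
        out.append(ch)
    return "".join(out)


vocab = sorted(set("hello world"))  # toy vocabulary, not Wikipedia
rng = random.Random(0)
params = init_params(len(vocab), 8, rng)
text = sample(params, vocab, "h", 20, rng)
print(text)
```

Training would adjust the three weight matrices so that the predicted distribution puts high probability on the character that actually comes next; that is the part the method discussed above addresses, and it is left out here.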