pascanur, Aug 19 '10 at 11:24

Hey all, I also want to put forward some results that Ilya Sutskever presented at CIFAR 2010. He basically used James Martens' method to train a recurrent neural network. I will not go into the details of the experiment, and I expect a paper to appear soon on the subject. What he did was to train an RNN on Wikipedia, where the network got a one-hot encoding of a character as input and was supposed to predict the next character. He used 2000 units (quite a lot for an RNN) and trained it for one month on the GPU (maybe a year or more in CPU time - that is Ilya's prediction). Note that this is not your ordinary experiment; I for one would not have the patience to let it run for 30 days or so.

Anyhow, the results are amazing. First of all, the RNN seems not to suffer from the vanishing gradient problem any more (I haven't seen conclusive results in this respect, and Ilya has only just started playing with this; he also tested it on a task proposed by Hochreiter and Schmidhuber in their first paper about LSTM - the task is known to be solvable only by LSTMs and was meant to show that LSTMs can connect events that are far apart in time). But the most amazing thing is that he could use the RNN to generate text that stays coherent for a few words (it would also remember to close brackets in some cases, and so on). There are no quantitative measures of the network's behaviour yet, I think (or Ilya has not presented any for now), so these are very recent results. We can of course speculate on how well this training method works. After talking to Ilya I also got a few samples generated by the network (I will post one here, hope he doesn't mind):

Shortly thereafter it was purchased by the army attacks on Bill World Service, like many of which recognize the Oscar's liberation of the nobility before he desired anything obey, instead of , I to be flooding during the civil war, has been fell.
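
To make the setup concrete, here is a rough numpy sketch of the kind of character-level RNN described above: one-hot characters in, a distribution over the next character out. The toy vocabulary, weight scales and sampling code are my own placeholders, and only the forward pass is shown - this is not Ilya's model and there is no Hessian-free training in it.

    # Minimal sketch of a character-level RNN: read one-hot encoded characters,
    # predict the next character. Sizes and initialization are illustrative only.
    import numpy as np

    vocab = sorted(set("some wikipedia text goes here"))   # toy character vocabulary
    char_to_idx = {c: i for i, c in enumerate(vocab)}
    n_chars, n_hidden = len(vocab), 2000                   # 2000 hidden units, as in the talk

    rng = np.random.default_rng(0)
    W_xh = rng.normal(0, 0.01, (n_hidden, n_chars))        # input-to-hidden weights
    W_hh = rng.normal(0, 0.01, (n_hidden, n_hidden))       # recurrent weights
    W_hy = rng.normal(0, 0.01, (n_chars, n_hidden))        # hidden-to-output weights

    def one_hot(c):
        v = np.zeros(n_chars)
        v[char_to_idx[c]] = 1.0
        return v

    def step(h, c):
        """One RNN step: consume character c, return the new hidden state
        and a softmax distribution over the next character."""
        h = np.tanh(W_xh @ one_hot(c) + W_hh @ h)
        logits = W_hy @ h
        p = np.exp(logits - logits.max())
        return h, p / p.sum()

    h = np.zeros(n_hidden)
    for c in "some wiki":                                  # feed a prefix...
        h, p = step(h, c)
    next_char = vocab[rng.choice(n_chars, p=p)]            # ...then sample the next character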

I'm currently investigating James Martens' algorithm, as well as going over Yoshua's paper on the vanishing gradient, trying to make sense of all this. My intent is also to implement the algorithm with Theano (which should make the GPU stuff transparent). If and when I manage to do that, I will definitely not mind sharing my thoughts and code.
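
For anyone else digging into Martens' paper, the core of each Hessian-free update is approximately solving (G + lambda*I) d = -g with truncated conjugate gradient, using only curvature-vector products and never the full matrix. Here is a very stripped-down sketch of that inner loop on a toy quadratic; the damping value, iteration count and the curvature product itself are placeholders (the real method builds Gauss-Newton vector products with the R-operator and adapts the damping Levenberg-Marquardt style), so treat it as an illustration rather than an implementation.

    # Sketch of one Hessian-free style update on a toy quadratic loss
    # 0.5 * theta^T G theta, where the curvature-vector product is exact.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 50))
    G = A @ A.T + np.eye(50)            # a fixed positive-definite "curvature" matrix
    theta = rng.normal(size=50)         # current parameters of the toy problem
    grad = G @ theta                    # gradient of the toy quadratic loss

    def curvature_vector_product(v, damping=1.0):
        # Stand-in for the Gauss-Newton vector product G @ v plus damping;
        # only matrix-vector products are ever needed, never G itself.
        return G @ v + damping * v

    def truncated_cg(b, n_iters=25):
        """Approximately solve (G + damping*I) x = b with conjugate gradient."""
        x = np.zeros_like(b)
        r = b - curvature_vector_product(x)
        p = r.copy()
        for _ in range(n_iters):
            Gp = curvature_vector_product(p)
            alpha = (r @ r) / (p @ Gp)
            x = x + alpha * p
            r_new = r - alpha * Gp
            beta = (r_new @ r_new) / (r @ r)
            p = r_new + beta * p
            r = r_new
        return x

    delta = truncated_cg(-grad)         # approximate Newton-like update direction
    theta = theta + delta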
