Is there any published result showing how to do phone recognition (or possibly any other speech recognition task) using Stacked Autoencoders instead of DBNs (as done by Mohamed et al., 2009)? As discussed in a previous thread, pre-training a Deep Neural Network with autoencoders instead of RBMs should be mainly a matter of personal preference and is thus expected to yield equivalent performance. I'd like to use the Pylearn2 GPU implementation of Stacked Autoencoders (or maybe the deeplearning.net implementation) for phone recognition, but since I haven't found anyone who claims to have done it before, I'm afraid there must be something that makes this task harder than I'm naively expecting.

I don't see why it would be hard at all. For large speech databases, pre-training helps only slightly anyway.
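To make the point concrete: swapping RBMs for autoencoders only changes the pre-training step. The stacked weights still just initialize an ordinary feed-forward network. Below is a minimal NumPy sketch of greedy layer-wise pre-training with tied-weight denoising autoencoders, assuming the rows of `X` are mean/variance-normalized acoustic feature frames. The function names, layer sizes, learning rate, and masking-noise level are hypothetical illustrative choices; this is neither the Pylearn2 nor the deeplearning.net API, nor the recipe of Mohamed et al.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def pretrain_dae_layer(X, n_hidden, epochs=50, lr=0.02, noise=0.2, seed=0):
    """Train one tied-weight denoising autoencoder with full-batch gradient descent.

    Encoder: H = sigmoid(corrupt(X) @ W + b_h); linear decoder: R = H @ W.T + b_v.
    Loss: 0.5 * mean over frames of ||R - X||^2 (reconstruct the *clean* input).
    Returns the encoder parameters (W, b_h) and the per-epoch loss history.
    Hyperparameter defaults are illustrative assumptions, not tuned values.
    """
    rng = np.random.default_rng(seed)
    n_frames, n_vis = X.shape
    W = rng.normal(0.0, 0.01, size=(n_vis, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_vis)
    losses = []
    for _ in range(epochs):
        # Masking noise: zero out each input value with probability `noise`.
        X_tilde = X * (rng.random(X.shape) >= noise)
        H = sigmoid(X_tilde @ W + b_h)
        R = H @ W.T + b_v
        diff = R - X
        losses.append(0.5 * np.mean(np.sum(diff ** 2, axis=1)))
        # Backpropagate through the tied weights: decoder term + encoder term.
        dR = diff / n_frames
        dA = (dR @ W) * H * (1.0 - H)
        grad_W = X_tilde.T @ dA + dR.T @ H
        W -= lr * grad_W
        b_h -= lr * dA.sum(axis=0)
        b_v -= lr * dR.sum(axis=0)
    return W, b_h, losses


def pretrain_stack(X, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: each autoencoder is trained on the
    clean (uncorrupted) hidden codes produced by the layer below it."""
    layers = []
    inputs = X
    for i, n_hidden in enumerate(layer_sizes):
        W, b_h, _ = pretrain_dae_layer(inputs, n_hidden, seed=i, **kwargs)
        layers.append((W, b_h))
        inputs = sigmoid(inputs @ W + b_h)
    return layers


if __name__ == "__main__":
    # Hypothetical stand-in for normalized acoustic frames: 500 frames x 39 dims.
    frames = np.random.default_rng(42).standard_normal((500, 39))
    stack = pretrain_stack(frames, layer_sizes=[128, 64])
    for W, b in stack:
        print(W.shape, b.shape)
```

For phone recognition, a typical next step would be to put a softmax output layer over phone (or HMM-state) targets on top of the stacked `(W, b_h)` layers and fine-tune the whole network with backpropagation, exactly as after DBN pre-training; that step is omitted here.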