I'm implementing Lee's convolutional DBN as a feature extraction method for bird song classification and have run into some problems. As I understand it, they treat the different frequency channels of the spectrum (after PCA) independently to build a two-layer deep belief network. I'm confused about some details of their implementation:
Thanks for your help! Update: Thanks @Sharath Chandra. I've checked their source code, and it turns out they combine the hidden layers together during the training iterations.
Thanks a lot! Could you please tell me whether the following analysis is wrong? Assume the 1-D input has length 17 and, as in their description, each layer has filters of length 6, a max-pooling ratio of 3 and 300 hidden bases. Then the first-layer output has length (17 - 6 + 1)/3 × 300 = 1200, so the second-layer output has length (1200 - 6 + 1)/3 × 300 ≈ 119,500, which is too large.
(Mar 22 '14 at 17:46)
Jingwei Zhang
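The size arithmetic above can be checked with a few lines of Python. This is a minimal sketch that follows the poster's assumption of flattening all 300 bases into one 1-D sequence before the next layer; the helper names are hypothetical, and flooring the pooled length (dropping units that don't fill a complete pooling block) is also an assumption, since 1195 is not divisible by 3.

```python
def conv_pool_length(input_len: int, filter_len: int, pool: int) -> int:
    """Length of one base's output after a 'valid' 1-D convolution
    followed by non-overlapping max pooling.

    Assumption: leftover units that don't fill a full pooling block
    are dropped (floor division)."""
    conv_len = input_len - filter_len + 1  # 'valid' convolution length
    return conv_len // pool


def layer_output_size(input_len: int, filter_len: int = 6,
                      pool: int = 3, n_bases: int = 300) -> int:
    """Total output length when all bases are concatenated into one
    1-D sequence, as in the analysis above (hypothetical helper)."""
    return conv_pool_length(input_len, filter_len, pool) * n_bases


layer1 = layer_output_size(17)      # (17 - 6 + 1) // 3 * 300
layer2 = layer_output_size(layer1)  # (1200 - 6 + 1) // 3 * 300
print(layer1, layer2)  # prints "1200 119400"
```

With flooring, the second layer gives 119,400 instead of the fractional ≈ 119,500; either way it is about 100× the first layer's output.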
Yes, the analysis sounds right to me. It's the size of the input for the third layer. You can run it in mini-batches so that it fits in your RAM. Have a look at Honglak's code on his website (the link is also in other threads on this site).
(Mar 23 '14 at 05:43)
Sharath Chandra
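To illustrate the mini-batch suggestion, here is a minimal NumPy sketch. It is not Honglak Lee's code: `feature_fn` is a hypothetical stand-in for one CDBN layer's inference pass, and passing a `np.memmap` as `out` is one possible way (an assumption, not something from the thread) to keep the full result on disk instead of in RAM.

```python
import numpy as np


def iterate_minibatches(data: np.ndarray, batch_size: int):
    """Yield (start_index, batch) pairs of consecutive slices along axis 0.
    Slices are views, so no copy of `data` is made."""
    for start in range(0, data.shape[0], batch_size):
        yield start, data[start:start + batch_size]


def extract_in_batches(data, feature_fn, out_dim, batch_size=64, out=None):
    """Apply `feature_fn` (hypothetical layer inference) one batch at a time.

    Only one batch of intermediate activations exists at any moment.
    `out` may be a preallocated array or a np.memmap, e.g.
    np.memmap("features.dat", dtype=np.float32, mode="w+",
              shape=(data.shape[0], out_dim))."""
    if out is None:
        out = np.empty((data.shape[0], out_dim), dtype=np.float32)
    for start, batch in iterate_minibatches(data, batch_size):
        out[start:start + batch.shape[0]] = feature_fn(batch)
    return out


# Toy usage: a doubling function stands in for a real CDBN layer.
toy = np.arange(40, dtype=np.float32).reshape(10, 4)
features = extract_in_batches(toy, lambda b: 2 * b, out_dim=4, batch_size=3)
print(features.shape)  # prints "(10, 4)"
```

The last batch may be smaller than `batch_size`, which is why the write uses `batch.shape[0]` rather than the nominal batch size.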