|
I am wondering what exactly I should feed to the network. I have a 1000x80 spectrogram (1000 frames, 80 features each), which I reshape into a 1x80000 vector; I then stack 20 such vectors into a mini-batch. So I feed a 20x80000 matrix to a DNN, with each row labeled by its speaker. Is that a proper approach? My second idea is to feed around 100 random frames from randomly chosen speakers, with each frame labeled by its speaker. I need to extract speaker-dependent features. There have been similar experiments, so if you know the answer or any analogy, please guide me. I hope you know the answer! |
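To make the two options concrete, here is a minimal sketch of the batch shapes I mean. It assumes NumPy and uses random numbers in place of real spectrograms; the speaker count and the names `utterance_batch` and `frame_batch` are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES, N_FEATS = 1000, 80   # one spectrogram: 1000 frames x 80 features
N_SPEAKERS = 5                 # assumption: hypothetical number of speakers

# Placeholder data: one random "spectrogram" per speaker (not real audio).
spectrograms = {
    spk: rng.standard_normal((N_FRAMES, N_FEATS)).astype(np.float32)
    for spk in range(N_SPEAKERS)
}

def utterance_batch(batch_size=20):
    """Option 1: each row is a whole spectrogram flattened to 80000 values."""
    labels = rng.integers(0, N_SPEAKERS, size=batch_size)
    x = np.stack([spectrograms[int(s)].reshape(-1) for s in labels])
    return x, labels

def frame_batch(batch_size=100):
    """Option 2: each row is a single 80-feature frame from a random speaker."""
    labels = rng.integers(0, N_SPEAKERS, size=batch_size)
    frames = rng.integers(0, N_FRAMES, size=batch_size)
    x = np.stack([spectrograms[int(s)][int(f)] for s, f in zip(labels, frames)])
    return x, labels

x_utt, y_utt = utterance_batch()
x_frm, y_frm = frame_batch()
print(x_utt.shape, y_utt.shape)  # (20, 80000) (20,)
print(x_frm.shape, y_frm.shape)  # (100, 80) (100,)
```

So in the first case the input layer has 80000 units and sees whole utterances, while in the second it has 80 units and sees isolated frames.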
|
Does nobody know the answer, or is nobody replying because of my poor English? (It's not my native language...) |