I am wondering what exactly I should feed into the network. I have a 1000x80 spectrogram (1000 frames, 80 features per frame), which I reshape into a 1x80000 vector; I then stack 20 such vectors into a mini-batch. So I feed a 20x80000 matrix into a DNN, with each row labeled by speaker.

Is that a proper approach?
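A minimal NumPy sketch of the whole-utterance batching described above, assuming every utterance is exactly 1000 frames of 80 features. The function name `make_utterance_batch` and the random placeholder data are illustrative only, not part of the original setup.

```python
import numpy as np

# Shapes taken from the question: 1000 frames x 80 features per utterance,
# 20 utterances per mini-batch. The data below is random placeholder data.
NUM_FRAMES = 1000
NUM_FEATURES = 80
BATCH_SIZE = 20

def make_utterance_batch(spectrograms, speaker_ids):
    """Flatten each (frames, features) spectrogram into one row vector.

    spectrograms: array of shape (batch, NUM_FRAMES, NUM_FEATURES)
    speaker_ids:  array of shape (batch,), one integer label per utterance
    Returns X of shape (batch, NUM_FRAMES * NUM_FEATURES) and the labels.
    """
    batch = spectrograms.shape[0]
    # Row-major reshape: each row is frame 0's 80 features, then frame 1's, ...
    X = spectrograms.reshape(batch, NUM_FRAMES * NUM_FEATURES)
    return X, speaker_ids

rng = np.random.default_rng(0)
specs = rng.standard_normal((BATCH_SIZE, NUM_FRAMES, NUM_FEATURES)).astype(np.float32)
labels = rng.integers(0, 10, size=BATCH_SIZE)  # hypothetical speaker IDs
X, y = make_utterance_batch(specs, labels)
print(X.shape)  # (20, 80000)
```

Because each of the 80000 inputs is tied to a fixed frame position, this layout only works if every utterance has exactly 1000 frames, and a first hidden layer of, say, 1000 units would already need 80 million weights.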

My second idea is to feed around 100 random frames taken from various random speakers, with each frame labeled by its speaker.
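The frame-level alternative can be sketched the same way. This is a minimal illustration, assuming utterances are stored as a list of (frames, 80) arrays with one speaker label each; `sample_frame_batch` is a hypothetical helper, not an existing library function.

```python
import numpy as np

def sample_frame_batch(utterances, speaker_ids, batch_size=100, rng=None):
    """Draw `batch_size` random frames from randomly chosen utterances.

    utterances:  list of arrays, each of shape (num_frames_i, num_features)
    speaker_ids: list of integer speaker labels, one per utterance
    Returns X of shape (batch_size, num_features) and y of shape (batch_size,);
    each frame inherits the speaker label of the utterance it came from.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Pick a random utterance (and hence speaker) for every slot in the batch.
    utt_idx = rng.integers(0, len(utterances), size=batch_size)
    frames, labels = [], []
    for u in utt_idx:
        utt = utterances[u]
        t = rng.integers(0, utt.shape[0])  # random frame within this utterance
        frames.append(utt[t])
        labels.append(speaker_ids[u])
    return np.stack(frames), np.asarray(labels)

# Placeholder data: four utterances of different lengths, three speakers.
rng = np.random.default_rng(0)
utts = [rng.standard_normal((n, 80)) for n in (900, 1000, 1200, 950)]
spk = [0, 1, 2, 1]
X, y = sample_frame_batch(utts, spk, batch_size=100, rng=rng)
print(X.shape, y.shape)  # (100, 80) (100,)
```

Unlike the flattened layout, the network input here is a single 80-dimensional frame, so utterances of different lengths can be mixed freely.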

I need to extract speaker-dependent features.

Similar experiments have been done before; if you know the answer or any analogous work, please point me in the right direction. I hope you know the answer!

asked Dec 10 '14 at 16:08

Dawid Smolen

One Answer:

Is there no one who knows the answer, or is there no answer because of my poor English? (It's not my native language...)

answered Dec 13 '14 at 08:36

Dawid Smolen

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.