I have a small list of phrases, each fairly short, that I would like to detect in recorded, fairly noise-free conversations (i.e., detect the presence or absence of these phrases: "did the speaker say 'Chicken McNuggets'?"). Is this problem easier than general speech recognition? Does anyone know of any decent implementations or survey papers? Even the name of the problem would be helpful. I was thinking of just recording myself saying the target phrases and then cross-correlating each recording with each sample. Is that reasonable?
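The cross-correlation idea can be sketched as normalized cross-correlation: slide the recorded template over the conversation and score every offset between -1 and 1. This is a minimal sketch that assumes both recordings are mono, share a sample rate, and are already loaded as NumPy arrays; `normalized_xcorr`, `detect_phrase`, and the 0.6 threshold are hypothetical names and values chosen for illustration, and the demo uses synthetic noise rather than real audio. Matching raw waveforms only fires when the phrase is spoken almost exactly as in the template. Differences in speaking rate, pitch, or speaker break it, and that is the main weakness of this approach.

```python
import numpy as np


def normalized_xcorr(signal: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Score every alignment of `template` inside `signal` (1-D arrays).

    Returns len(signal) - len(template) + 1 values in [-1, 1]; 1.0 means the
    window is a scaled, offset copy of the template.
    """
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    n = len(template)
    if n == 0 or n > len(signal):
        raise ValueError("template must be non-empty and no longer than signal")

    t = template - template.mean()  # zero-mean template
    t_norm = np.linalg.norm(t)

    # Numerator: t sums to zero, so subtracting each window's mean would not
    # change the dot product; a plain 'valid' correlation is enough.
    num = np.correlate(signal, t, mode="valid")

    # Denominator: each window's energy about its own mean, via running sums.
    cs = np.concatenate(([0.0], np.cumsum(signal)))
    cs2 = np.concatenate(([0.0], np.cumsum(signal ** 2)))
    win_sum = cs[n:] - cs[:-n]
    win_sq = cs2[n:] - cs2[:-n]
    win_var = np.clip(win_sq - win_sum ** 2 / n, 0.0, None)  # clip round-off
    denom = np.sqrt(win_var) * t_norm

    # Silent (zero-variance) windows get a score of 0 instead of dividing by 0.
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)


def detect_phrase(signal, template, threshold=0.6):
    """Return (detected, best_offset, best_score); the threshold is a placeholder."""
    scores = normalized_xcorr(signal, template)
    best = int(np.argmax(scores))
    return bool(scores[best] >= threshold), best, float(scores[best])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.standard_normal(200)        # stand-in for a recorded phrase
    signal = 0.1 * rng.standard_normal(5000)   # stand-in for a conversation
    signal[1234:1434] += template              # embed the phrase at sample 1234
    print(detect_phrase(signal, template))
```

The running-sum denominator avoids recomputing each window's mean and norm from scratch. On long recordings, though, `np.correlate` still does O(N·M) work, so an FFT-based correlation would be the usual next step.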
This problem is called keyword spotting. As far as I know, the best keyword spotting systems first run full speech recognition and then detect the keywords in a set of transcription hypotheses, typically represented as a "lattice". You should be able to find many papers on this in the ICASSP and Interspeech proceedings. In any case, you will likely need an HMM-based system to detect the words; a sliding-window classifier will perform much worse.

If you can instead assume that the speaker will only (or mostly) be saying words from a small set, you can treat the task as speech recognition with a restricted grammar. In that case, building a full recognizer yourself may not be too painful. You could probably take either approach with the Kaldi open-source recognizer: kaldi.sourceforge.net
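Detecting a keyword from a set of transcription hypotheses boils down to asking how much of the recognizer's probability mass falls on hypotheses that contain the phrase. The sketch below flattens the lattice into an N-best list of (transcript, log score) pairs. That is a simplification: lattice-based systems compute this posterior over the lattice itself and also track where in time the keyword occurs. `keyword_posterior` is a hypothetical helper, not a Kaldi API, and the hypotheses, scores, and 0.5 decision threshold are made up for illustration.

```python
import math


def keyword_posterior(nbest, keyword):
    """Approximate P(keyword was spoken) from an N-best list.

    nbest:   list of (transcript, log_score) pairs; higher score = more likely.
    keyword: phrase to look for, matched as a contiguous word sequence.
    """
    if not nbest:
        return 0.0
    kw = keyword.lower().split()
    # Turn log scores into normalized posteriors (shift by the max for stability).
    top = max(score for _, score in nbest)
    weights = [math.exp(score - top) for _, score in nbest]
    hit = 0.0
    for (text, _), w in zip(nbest, weights):
        words = text.lower().split()
        # Count this hypothesis if the keyword appears as a contiguous word run.
        if any(words[i:i + len(kw)] == kw for i in range(len(words) - len(kw) + 1)):
            hit += w
    return hit / sum(weights)


if __name__ == "__main__":
    # Made-up 3-best output for one utterance.
    nbest = [
        ("i would like the chicken mcnuggets please", -10.0),
        ("i would like the chicken nuggets please", -11.0),
        ("i would like the chicken mcnuggets peas", -12.0),
    ]
    p = keyword_posterior(nbest, "Chicken McNuggets")
    print(f"P(keyword) = {p:.3f}", "-> detected" if p >= 0.5 else "-> not detected")
```

Scoring a posterior rather than checking only the single best transcript lets a keyword that the top hypothesis misrecognized still contribute through the lower-ranked alternatives.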