I have a small list of fairly short phrases that I would like to detect in recorded, fairly noise-free conversations (i.e., detect the presence or absence of each phrase: "did the speaker say 'Chicken McNuggets'?"). Is this problem easier than general speech recognition? Does anyone know of any decent implementations or survey papers? Even the name of the problem would be helpful. I was thinking of just recording myself saying the target phrases and then cross-correlating those recordings with each sample. Is that reasonable?
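
To make the cross-correlation idea concrete, here is a minimal sketch of what it could look like: slide a recorded template over a longer signal and compute a normalized correlation score at each offset. The function names and the toy sample values are made up for illustration, and real audio would first have to be loaded as lists of samples. Matching raw waveforms this way is very sensitive to differences in speaker, pitch and speaking rate, so treat it as a toy baseline rather than a working detector.

```python
import math

def normalized_xcorr(signal, template):
    """Return the normalized cross-correlation score at every offset.

    Each score lies in [-1, 1]; a score of 1 means the window equals the
    template up to a positive scale factor and a constant offset.
    """
    n = len(template)
    if n == 0 or len(signal) < n:
        raise ValueError("signal must be at least as long as a non-empty template")
    t_mean = sum(template) / n
    t_centered = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(t * t for t in t_centered))
    scores = []
    for start in range(len(signal) - n + 1):
        window = signal[start:start + n]
        w_mean = sum(window) / n
        w_centered = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(w * w for w in w_centered))
        if t_norm == 0.0 or w_norm == 0.0:
            scores.append(0.0)  # flat window or template: no meaningful match
            continue
        dot = sum(w * t for w, t in zip(w_centered, t_centered))
        scores.append(dot / (t_norm * w_norm))
    return scores

def best_match(signal, template):
    """Return (offset, score) of the highest-scoring alignment."""
    scores = normalized_xcorr(signal, template)
    offset = max(range(len(scores)), key=scores.__getitem__)
    return offset, scores[offset]

# Toy data (invented): the template is embedded at offset 5, scaled by 2.
template = [0.0, 1.0, -1.0, 0.5]
signal = [0.1, -0.2, 0.0, 0.1, 0.0] + [2.0 * t for t in template] + [0.0, 0.1]
offset, score = best_match(signal, template)
print(offset, round(score, 3))
```

A detector built on this would compare the best score against a threshold chosen on held-out recordings; the threshold is a tuning parameter, not something this sketch provides.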

asked Apr 01 '13 at 16:17

george s

2 Answers:

This problem is known as keyword spotting in speech recognition. As far as I know, the best keyword spotting systems run full speech recognition first and then detect keywords in the resulting set of transcription hypotheses (typically represented as a 'lattice'). You should be able to find lots of papers on this in the ICASSP and Interspeech proceedings. Either way, you'll likely need an HMM-based system to detect the words. A sliding-window classifier approach will perform much worse.
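
As a rough illustration of the "recognize first, then search the hypotheses" idea, the sketch below uses a weighted n-best list as a simplified stand-in for a lattice. For each target phrase it sums the normalized weight of every hypothesis that contains the phrase. The function names, the example transcripts and the weights are all invented for illustration; a real recognizer would supply its own hypotheses and scores, and a real system would work on the lattice itself.

```python
def phrase_in(words, phrase):
    """True if `phrase` (a list of words) occurs contiguously in `words`."""
    k = len(phrase)
    return any(words[i:i + k] == phrase for i in range(len(words) - k + 1))

def keyword_posteriors(nbest, phrases):
    """Estimate, per phrase, the share of hypothesis weight containing it.

    nbest: list of (transcript, weight) pairs with non-negative weights.
    phrases: list of target phrases as plain strings.
    """
    total = sum(weight for _, weight in nbest)
    result = {}
    for phrase in phrases:
        target = phrase.lower().split()
        mass = sum(
            weight
            for text, weight in nbest
            if phrase_in(text.lower().split(), target)
        )
        result[phrase] = mass / total if total > 0 else 0.0
    return result

# Hypothetical recognizer output (invented for illustration).
nbest = [
    ("i would like chicken mcnuggets please", 0.6),
    ("i would like chicken nuggets please", 0.3),
    ("i would like chicken mcnuggets peas", 0.1),
]
posteriors = keyword_posteriors(nbest, ["chicken mcnuggets", "large fries"])

# Report a phrase as detected when enough hypothesis weight supports it.
# The 0.5 threshold is an arbitrary choice for this example.
detected = {p for p, score in posteriors.items() if score >= 0.5}
print(posteriors, detected)
```

Scoring against several hypotheses instead of only the single best transcript lets a phrase still be detected when the top hypothesis gets it wrong but the alternatives agree.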

If you instead assume that the speaker will only (or mostly) be saying words from a small set, you can treat it as speech recognition with a restricted grammar. In this case, building a full recognizer yourself may not be too painful. You could probably implement either of these approaches with the open-source Kaldi recognizer: kaldi.sourceforge.net

answered Apr 13 '13 at 17:24

Andrew Maas
Have you looked at the Microsoft Speech SDK?

answered Apr 01 '13 at 21:51

SeanV


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.