|
How do I find homophones of a given word? Are there any papers or libraries for doing this? For example: cell ~ sell, hide ~ hyde. |
|
This is a harder problem than you might think. It shows up a lot in translating names and in cross-lingual named entity recognition. (Qadaffi has a lot of accepted spellings in English. Schwarzenegger has a ton in Arabic.) If you only care about dictionary words, you can use a pronunciation dictionary and find entries with the same phone string. If you want to handle free text and be able to say things like "Foobar" is a homophone of "Fubahr", then a simple solution is to use Soundex (http://en.wikipedia.org/wiki/Soundex). These are two very basic introductory approaches, though; there is a lot of research on predicting how words are pronounced from their surface form, including work from people at Google and from Alan Black, among others. |
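As a rough illustration of the Soundex idea, here is a minimal sketch of the common American Soundex variant in plain Python. The function name `soundex` and the table `CODES` are my own choices for this example, not part of any particular library, and other Soundex variants differ in small details.

```python
# Minimal sketch of American Soundex (illustrative; names are hypothetical).
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of `word` ('' if it has no a-z letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # its code still suppresses a duplicate
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate duplicate codes
        code = CODES.get(ch, "")         # vowels and y map to ""
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets prev, so repeats count again
    return (result + "000")[:4]          # pad with zeros, truncate to 4 characters


print(soundex("Foobar"), soundex("Fubahr"))
print(soundex("cell"), soundex("sell"))
```

Note that Soundex keeps the first letter, so "Foobar" and "Fubahr" get the same code but "cell" and "sell" do not. It is a coarse spelling-based key, not a real pronunciation model, so expect both misses and false matches.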
|
You can use speech synthesis software to convert words into phonemes, and then look for words with identical or very similar phoneme sequences. Look at chapter 8 of Jurafsky and Martin for an introduction, or pick the specific synthesis software you want to use to determine the similarity. Be aware that you might need to disambiguate between different pronunciations of the same spelling, like "read" in "I'll read this book" and "I've read this book" (the first is a homophone of "reed" and the second of "red"). |
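To make the "read" caveat concrete, here is a small sketch that groups words by phoneme sequence and reports homophones per pronunciation. The `PRONUNCIATIONS` table is a tiny hand-written sample in a CMUdict-like ARPAbet notation, and `build_index` and `homophones` are hypothetical helper names; in practice the entries would come from a real pronunciation dictionary or a grapheme-to-phoneme tool.

```python
from collections import defaultdict

# Hand-written sample lexicon (an assumption for illustration only).
# Each word maps to one or more pronunciations, each a tuple of phones.
PRONUNCIATIONS = {
    "cell": [("S", "EH1", "L")],
    "sell": [("S", "EH1", "L")],
    "hide": [("HH", "AY1", "D")],
    "hyde": [("HH", "AY1", "D")],
    "read": [("R", "IY1", "D"), ("R", "EH1", "D")],
    "reed": [("R", "IY1", "D")],
    "red":  [("R", "EH1", "D")],
}


def build_index(lexicon):
    """Map each phone sequence to the set of words pronounced that way."""
    index = defaultdict(set)
    for word, prons in lexicon.items():
        for phones in prons:
            index[tuple(phones)].add(word)
    return index


def homophones(word, lexicon, index):
    """Return {pronunciation string: sorted homophones} for each pronunciation of `word`."""
    word = word.lower()
    result = {}
    for phones in lexicon.get(word, []):
        others = index.get(tuple(phones), set()) - {word}
        result[" ".join(phones)] = sorted(others)
    return result


index = build_index(PRONUNCIATIONS)
print(homophones("cell", PRONUNCIATIONS, index))
print(homophones("read", PRONUNCIATIONS, index))
```

Because results are keyed by pronunciation, "read" yields "reed" under one entry and "red" under the other. Deciding which pronunciation applies in a given sentence still needs context, which this sketch does not attempt.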