|
The question is rather self-explanatory - I have a large number of phone calls to businesses that I would like to classify. The set of possible classifications is rather small, but might grow later. The transcriptions are rather poor, since we can't adapt the acoustic models to individual speakers. So I guess what I'm asking for is some helpful papers, or even just guidelines, on how I might adapt existing, "ordinary" text classifiers to account for the higher level of noise in my data.
|
If your input is noisy, I'd suggest trying character-level n-grams as features. You could play around with different orders (e.g. 2-5), or even mix them, and see how that works. The classifier could be an SVM with a linear kernel, or an SGD classifier. Phonetic normalization (like Metaphone) could also work, but it depends on your data. More advanced approaches like the one Paul Dixon suggests could work as well (I mean something like string kernels with SVMs), but it's worth trying the simplest approaches first.
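A minimal sketch of that first approach, assuming scikit-learn is available; the texts, labels, and pipeline settings below are made-up illustrations, not real call data:

```python
# Sketch: mixed-order character n-grams (2-5) feeding a linear SVM,
# built with scikit-learn. All texts/labels below are made-up examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# "char_wb" builds character n-grams within word boundaries; partial
# overlaps still match even when whole words are misrecognized.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LinearSVC(),
)

# Toy "noisy transcriptions" (hypothetical data).
texts = [
    "i wanna book a tabel for too tonite",
    "can i resurve a room for to nights",
    "my delivry never arived where is it",
    "the packige was damadged on arival",
]
labels = ["restaurant", "hotel", "shipping", "shipping"]
clf.fit(texts, labels)

# Misspellings still share many character n-grams with the right class.
pred = clf.predict(["were is my packege"])[0]
```

The point of the character analyzer is exactly the noise robustness asked about: "packege" and "packige" disagree as words but share most of their 2-5-grams.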
|
You could try generating recognition lattices and using the rational kernel approach from this paper: http://www2.research.att.com/~haffner/biblio/pdf/cortes-02.pdf
|
One thing you can do is degrade your inputs a bit. For example, if your transcription software usually confuses "t"s and "d"s, replace both of them with an arbitrary shared symbol. In the same way, if a letter is usually dropped (say a mute "g" at the end of a word), you can remove it from the other places where it appears. If some words are consistently mistaken for each other, map them to a single feature, and so on. I'm not sure transfer learning is the way to go here, since you have corrupted features, and you would have to find a larger labeled data set for your specific problem, which is not always available.
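The degrading step above can be sketched in a few lines; the confusion pairs here (t/d, b/p, word-final "g") are illustrative assumptions, not measured from any real recognizer:

```python
# Sketch of "degrading" inputs: collapse symbols the recognizer tends
# to confuse into one shared symbol, so the classifier never has to
# distinguish them. The confusion classes below are hypothetical.
import re

CONFUSION_CLASSES = {
    "t": "T", "d": "T",   # t/d often confused -> same symbol "T"
    "b": "P", "p": "P",   # b/p likewise -> "P"
}

def degrade(text: str) -> str:
    # Map each confusable letter to its shared symbol.
    out = "".join(CONFUSION_CLASSES.get(ch, ch) for ch in text.lower())
    # Drop a commonly deleted sound, e.g. word-final "g" ("runnin(g)"),
    # everywhere, so dropped and kept variants become identical.
    return re.sub(r"g\b", "", out)

degrade("booked a table")   # t/d and b/p collapse to shared symbols
degrade("going running")    # word-final "g" removed in both words
```

Features are then extracted from the degraded string, so "booked" and "bookt", or "running" and "runnin", produce the same representation.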
|
It will help if you have a good prior. As a prior you can use a model trained on clean data (normal text), and regularize your features so they don't deviate too far from it. Use strong regularization (because you don't want to model the actual noisy observations too closely). Also, use a combination of classifiers rather than just one, get lots of training data, and maybe use some heuristic outlier detection to throw outliers out or downweight them appropriately.
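The classifier-combination part can be sketched with scikit-learn's voting ensemble; the three member models, their regularization settings, and the toy data are all assumptions for illustration:

```python
# Sketch: combine several diverse, strongly regularized classifiers by
# majority vote, rather than trusting a single model on noisy text.
# Models, hyperparameters, and data below are illustrative choices.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

ensemble = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    VotingClassifier(
        estimators=[
            # Strong regularization (low C / high alpha): we don't
            # want to fit the noisy transcriptions too closely.
            ("lr", LogisticRegression(C=0.5, max_iter=1000)),
            ("sgd", SGDClassifier(alpha=1e-3, random_state=0)),
            ("nb", MultinomialNB()),
        ],
        voting="hard",  # majority vote over the three predictions
    ),
)

# Toy noisy examples (hypothetical data).
texts = ["order a pitza", "pizza delivry pls",
         "fix my internet", "wifi not workin"]
labels = ["food", "food", "support", "support"]
ensemble.fit(texts, labels)
pred = ensemble.predict(["my internets is broken"])[0]
```

Hard voting only needs each member's label prediction, so differently shaped models (margin-based, probabilistic) can be mixed freely.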