The question is rather self-explanatory - I have a large number of phone calls to businesses that I would like to classify. The set of possible classes is rather small, but might grow later. The transcriptions are rather poor, since we can't adapt the acoustic models to individual speakers.

So I guess what I'm asking for is pointers to helpful papers, or even just guidelines on how I might adapt existing, "ordinary" text classifiers to account for the higher level of noise in my data.

asked Jul 23 '10 at 18:45


george s


4 Answers:

If your input is noisy, I'd suggest trying letter-level n-grams as features. You could play around with different orders (e.g. 2-5), or even mix them, and see how that works. The classifier could be an SVM with a linear kernel or an SGD classifier.
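A minimal pure-Python sketch of such mixed-order letter n-gram features (the phrases and error patterns below are made up; in practice a vectorizer would produce these counts for the SVM/SGD classifier):

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, orders=(2, 3, 4, 5)):
    """Count letter-level n-grams of several orders at once (mixed orders)."""
    text = f" {text.lower()} "          # pad so word boundaries become features
    counts = Counter()
    for n in orders:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

# A noisy transcription still shares most of its n-grams with the clean
# phrase, so this representation degrades gracefully under transcription errors.
clean = char_ngrams("cancel my appointment")
noisy = char_ngrams("cansel my apointment")        # hypothetical ASR errors
other = char_ngrams("what are your opening hours")
assert cosine(clean, noisy) > cosine(clean, other)
```

The point of mixing orders is that short n-grams survive isolated character errors while longer ones keep some word-level context.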

A phonetic normalization (like Metaphone) could also help, but it depends on your data.
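Metaphone itself has many rules, so as an illustration here is the simpler Soundex scheme, which shows the same idea of collapsing similar-sounding spellings onto one code (a sketch only; a real system would use a proper Metaphone implementation, e.g. from the `jellyfish` library):

```python
def soundex(word):
    """Classic Soundex: first letter plus three digits; similar-sounding
    consonants share a digit, so spelling variants collide on one code."""
    codes = {c: d for d, letters in
             {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
              "4": "l", "5": "mn", "6": "r"}.items() for c in letters}
    word = word.lower()
    out = word[0].upper()
    prev = codes.get(word[0], "")
    for c in word[1:]:
        digit = codes.get(c, "")
        if digit and digit != prev:
            out += digit
        if c not in "hw":            # 'h'/'w' do not break a run of equal codes
            prev = digit
    return (out + "000")[:4]

assert soundex("Smith") == soundex("Smyth") == "S530"
assert soundex("Robert") == soundex("Rupert") == "R163"
```

Running transcribed words through such a normalizer before feature extraction makes phonetically confusable variants indistinguishable to the classifier, which is exactly what you want when the confusions come from the recognizer.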

More advanced approaches like the one Paul Dixon suggests (i.e., string kernels with SVMs) could work as well. But it's worth trying the simplest approaches first.
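The full rational-kernel machinery operates on recognition lattices, but the flavor can be previewed with a plain p-spectrum string kernel, which counts shared length-p substrings; the resulting Gram matrix can then be handed to an SVM that accepts precomputed kernels. A sketch with made-up transcripts:

```python
from collections import Counter

def spectrum_kernel(s, t, p=3):
    """p-spectrum string kernel: inner product of length-p substring counts."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(v * ct[k] for k, v in cs.items())

docs = ["book a table", "buk a tabel", "report an outage"]  # hypothetical transcripts
gram = [[spectrum_kernel(a, b) for b in docs] for a in docs]

# An SVM with a precomputed-kernel option could be trained directly on `gram`.
assert gram[0][1] > gram[0][2]   # the noisy variant stays close to its clean form
```

Rational kernels generalize this by computing the same kind of inner product over all paths of a lattice, weighted by recognizer confidence, rather than over a single 1-best string.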

answered Jul 04 '13 at 06:06


Konstantin

You could try generating recognition lattices and applying the rational-kernel approach used in this paper: http://www2.research.att.com/~haffner/biblio/pdf/cortes-02.pdf

answered Jul 03 '13 at 21:52


PaulDixon

One thing you can do is degrade your inputs a bit. For example, if "t"s and "d"s are usually confused by your transcription software, replace both with a single arbitrary symbol. In the same way, if a letter is usually dropped (say, a mute "g" at the end of a word), you can remove it from the other places where it appears. If some words are consistently mistaken for each other, use a single feature for them, and so on. I'm not sure that transfer learning is the way to go, since you have corrupted features, and you would have to find a larger labeled data set for your specific problem, which is not always available.
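A sketch of this kind of deliberate degradation, with hypothetical confusion rules (t/d merged into one symbol, word-final mute "g" dropped):

```python
import re

# Collapse symbols the transcriber confuses, and delete letters it drops,
# so that two transcriptions of the same utterance map to one string.
CONFUSED = str.maketrans({"t": "D", "d": "D"})   # t/d merged into one symbol

def degrade(text):
    text = text.lower().translate(CONFUSED)
    text = re.sub(r"g\b", "", text)              # drop word-final mute 'g'
    return text

assert degrade("riding") == degrade("ridin")     # both become 'riDin'
```

The rules themselves should come from the recognizer's actual confusion statistics, not from guesses like the ones above.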

answered Jul 23 '10 at 20:05


Alexandre Passos ♦

edited Jul 23 '10 at 20:07

It will help if you have a good prior. As a prior you can use a model trained on clean data (normal text), and regularize your features so they don't deviate too far from it. Use strong regularization (because you don't want to model the noisy observations too closely). Also, use a combination of classifiers rather than just one, get lots of training data, and maybe use some heuristic outlier detection to throw outliers out or downweight them appropriately.
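One concrete way to encode such a prior is to replace the usual pull-toward-zero L2 penalty with a pull toward the clean-data weights, i.e. minimize J(w) = logloss(w) + (lam/2) * ||w - w_prior||^2. A toy pure-Python sketch (the data, prior weights, and lambda values are all made up):

```python
import math, random

def train(xs, ys, w_prior, lam, lr=0.01, steps=2000):
    """Logistic regression whose L2 penalty pulls the weights toward a
    prior weight vector (e.g. fit on clean text) instead of toward zero:
        J(w) = logloss(w) + (lam / 2) * ||w - w_prior||^2
    """
    w = list(w_prior)
    n = len(xs)
    for _ in range(steps):
        grad = [lam * (wj - pj) for wj, pj in zip(w, w_prior)]
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
            for j in range(len(w)):
                grad[j] += (p - y) * x[j] / n
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

# Toy noisy training data (hypothetical): feature 1 weakly separates classes.
random.seed(0)
ys = [0, 1] * 10
xs = [[1.0, random.gauss(1.0 if y else -1.0, 2.0)] for y in ys]
w_prior = [0.0, 1.5]                          # weights from a clean-text model
w_strict = train(xs, ys, w_prior, lam=50.0)   # strong pull toward the prior
w_loose = train(xs, ys, w_prior, lam=0.0)     # ordinary fit drifts much further
assert abs(w_strict[1] - w_prior[1]) < abs(w_loose[1] - w_prior[1])
```

The strict-regularization setting keeps the model close to what clean text taught it, so a handful of noisy, mislabeled-looking examples cannot drag the weights far.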

answered Jul 23 '10 at 18:57


Frank


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.