|
I am doing binary classification on a data set that is very noisy, e.g. positive samples are very close to negative samples (and are sometimes identical). I certainly cannot use linear classifiers such as linear regression. What can I do with such noisy data? I cannot cleanse it, because the noise comes from the nature of the data: positive samples are trying to look like negative samples (this is a kind of fraud detection problem). Changing the feature set does not help much. Should I use some cleverer algorithm? |
|
I do not see why you would use linear regression to classify data; do you mean logistic regression? One way to approach this may be to use Gaussian processes (or a Bayesian approach more generally), where you assume that your data come from a noisy environment and you control the standard deviation of the noise source in the Bayesian model. If you have Bishop's book, Section 6.4, Equation 6.57 models the labels as being observed with a certain degree of noise; you can include that in your likelihood, vary the noise parameter, and see how it affects your classification results. |
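
To make the "play with that parameter" idea concrete, here is a minimal sketch in plain Python. It is a simplification, not a full Gaussian process classifier: it treats the labels as regression targets of -1 and +1, assumes a squared-exponential kernel with unit length scale, and adds a noise variance `noise_var` to the diagonal of the kernel matrix. The toy data, the function names, and the choice of kernel are all assumptions made for illustration.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict_mean(x_train, t_train, x_new, noise_var):
    """Posterior mean at x_new for targets t = y + eps, eps ~ N(0, noise_var)."""
    n = len(x_train)
    # Kernel matrix plus the label-noise variance on the diagonal.
    C = [[rbf_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(C, t_train)
    return sum(rbf_kernel(x_new, x_train[i]) * alpha[i] for i in range(n))

# Hypothetical toy data: the point at x=3 is labelled -1 but sits between
# two +1 points, mimicking a sample that "looks like" the other class.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
t_train = [-1.0, -1.0, 1.0, -1.0, 1.0, 1.0]

for noise_var in (1e-8, 1.0, 1e6):
    mean_at_3 = gp_predict_mean(x_train, t_train, 3.0, noise_var)
    print(f"noise_var={noise_var:g}  posterior mean at x=3: {mean_at_3:+.4f}")
```

With a tiny `noise_var` the model trusts every label and reproduces the -1 at x=3 almost exactly; with a huge `noise_var` it trusts none of them and the mean collapses toward the prior mean of zero. Values in between trade off the two, and that is the parameter you would tune (for example by cross-validation) on your own data. For a real problem you would likely want a proper classification likelihood and a library implementation rather than this hand-rolled solver.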