|
Hello everyone, I have a dataset with noisy binary labels (about 80% correct), and the task is to rank these labels by reliability. The solution I thought of is to train logistic regression on the dataset and use the posterior probabilities to rank the labels. Since a convex loss function like the logistic loss may not be robust to noisy labels, I also tried t-logistic regression, which is a nonconvex variant of logistic regression. However, neither of them worked well. Then I found that simply weighting each sample by the inverse of its leverage value while training the logistic regression greatly improves the ranking performance.
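To make the weighting concrete, here is a minimal sketch of one way to read it. It makes several assumptions that are not in the description above: "leverage" is taken to be the diagonal of the ordinary hat matrix of the design matrix, the weights are rescaled to mean 1, a small ridge term is added for numerical stability, and the data is synthetic. The function names are made up for illustration.

```python
import numpy as np

def leverage(X):
    # Diagonal of the hat matrix H = X (X^T X)^{-1} X^T, computed via a thin QR.
    # Assumption: this is the leverage meant in the post.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

def fit_weighted_logistic(X, y, sample_weight, ridge=1e-3, n_iter=50):
    # Newton's method on the weighted logistic loss with a small ridge penalty.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (sample_weight * (p - y)) + ridge * beta
        curvature = sample_weight * p * (1.0 - p)
        hess = (X * curvature[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta -= step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def label_reliability(X, y, beta):
    # Posterior probability that the model assigns to the observed label.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return np.where(y == 1, p, 1.0 - p)

# Synthetic data (hypothetical): a linear rule with about 20% of labels flipped.
rng = np.random.default_rng(0)
n = 500
features = rng.normal(size=(n, 3))
X = np.column_stack([np.ones(n), features])  # intercept column plus features
true_y = (features @ np.array([2.0, -1.0, 0.5]) > 0).astype(float)
flipped = rng.random(n) < 0.2
y = np.where(flipped, 1.0 - true_y, true_y)

h = leverage(X)
weights = 1.0 / h            # inverse-leverage weights
weights /= weights.mean()    # rescale so the average weight is 1

beta = fit_weighted_logistic(X, y, weights)
scores = label_reliability(X, y, beta)
ranking = np.argsort(scores)  # indices of the least reliable labels first
```

On real data the flipped labels are unknown, so `ranking` is the actual output; the `flipped` mask here exists only so the synthetic example can be checked.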
So I wonder: is there a more established way to do this kind of weighting in logistic regression, one that reduces the influence of outliers? Or are there other solutions to the problem I described? Thanks in advance! Lin Zhu |
I would try different classifiers such as gbm or random forest (both available in R) and see how their accuracy compares to logistic regression.
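A rough sketch of that comparison, written in Python with scikit-learn rather than the R packages mentioned above (that swap is my choice). The data is synthetic and the setup is assumed: each model scores every label with an out-of-fold posterior, so a flexible model cannot simply memorize the noisy label it is asked to judge, and the models are compared by how well low scores pick out the flipped labels.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic data (hypothetical): a nonlinear rule with about 20% of labels flipped.
rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 5))
true_y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)
flipped = rng.random(n) < 0.2
y = np.where(flipped, 1 - true_y, true_y)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Out-of-fold class probabilities: each sample is scored by a model
    # that did not see it during training.
    proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")
    # Reliability = probability assigned to the observed label.
    reliability = proba[np.arange(n), y]
    # AUC for detecting flipped labels from low reliability (higher is better).
    results[name] = roc_auc_score(flipped, -reliability)

for name, auc in results.items():
    print(f"{name}: AUC for spotting flipped labels = {auc:.3f}")
```

The AUC is only computable here because the synthetic example knows which labels were flipped; on the real dataset a small hand-verified subset would be needed to do the same comparison.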