I am wondering whether there has been any research into incorporating our confidence in each training example into a machine learning model. Specifically, I have a set of training examples, and for each one I know how reliable its label is:

X1: 95%
X2: 70%
X3: 10%

This means that there is a 95% probability that the label for X1 is "True" and a 5% chance that it is "False". Similarly, X2 is True with probability 70% and False with probability 30%. Finally, the probability of X3 having the label "True" is only 10%. Note that these are training data.

I am using a random forest classification model and training on this data. Is there any trick for using these confidences to train a better model?
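To make the setup concrete, here is a rough sketch of one thing I could imagine trying. I am not sure it is the right approach. It assumes scikit-learn, and the feature values and the helper name expand_soft_labels are made up for illustration. Each example is duplicated, once labelled True with weight p and once labelled False with weight 1 - p, and the weights are passed through the sample_weight argument of fit.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def expand_soft_labels(X, p_true):
    """Hypothetical helper: turn soft labels into weighted hard labels.

    Every row appears twice: once with label 1 (True) and weight p,
    and once with label 0 (False) and weight 1 - p.
    """
    X = np.asarray(X, dtype=float)
    p_true = np.asarray(p_true, dtype=float)
    n = len(X)
    X_expanded = np.vstack([X, X])
    y_expanded = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    weights = np.concatenate([p_true, 1.0 - p_true])
    keep = weights > 0  # drop copies that carry no weight at all
    return X_expanded[keep], y_expanded[keep], weights[keep]


# Made-up features for X1, X2, X3; only the confidences come from the question.
X = np.array([[0.9, 0.1], [0.6, 0.4], [0.1, 0.8]])
p_true = np.array([0.95, 0.70, 0.10])

X_exp, y_exp, w = expand_soft_labels(X, p_true)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_exp, y_exp, sample_weight=w)

proba = clf.predict_proba(X)  # columns follow clf.classes_, i.e. [0, 1]
```

With this encoding a confident example (95%) contributes almost entirely to one class, while an uncertain one would pull on both. Whether this actually helps a random forest is exactly what I am unsure about.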

I looked for research papers but unfortunately could not find anything related to this problem.

asked Aug 06 at 20:51

Alex Trebeck


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.