|
Co-training with two classifiers trained on separate feature sets of the same data requires the feature sets to be conditionally independent given the labels. Why is this assumption necessary? What happens if it is violated? Can I still use co-training if the two feature sets are not conditionally independent? |
|
This assumption is needed for the theoretical analysis. Intuitively, if the feature sets are not conditionally independent, the decisions made by the two classifiers are not independent either, so you cannot treat them as independent evidence when deriving a confidence measure on the true labels of the unlabeled points. Without that, there is no guarantee that the semi-supervised step helps rather than just reinforcing the bias of the classifiers. You can still apply co-training without this independence, but it might not work as well. It will behave more like a bootstrapping method, in the sense of the older semi-supervised technique common in NLP where you start with a baseline labeling and gradually expand it by repeatedly training classifiers, not in the Efron sense. That is not as good as co-training, but it sometimes works. |
|
The principle of co-training is to exploit the independence of a pair of classifiers, each trained on its respective feature set of the data, so that they help each other obtain more labeled data. If the two feature sets are not independent, the procedure behaves like self-training. The more independent the two feature sets are, the better the performance we can expect. |
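
To make the loop described in the answers concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not a reference implementation: the data is synthetic, with the two views drawn independently given the label; the classifier is a simple nearest-class-mean rule standing in for whatever learner you would really use; and all names (`make_data`, `split`, `fit`, `predict`, `co_train`) are hypothetical.

```python
import random
from statistics import mean


def make_data(n, seed=0):
    """Synthetic examples (x1, x2, y); an assumption made for illustration."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        mu = 2.0 if y == 1 else -2.0
        # x1 and x2 are sampled separately, so the two views are
        # conditionally independent given the label y.
        data.append((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0), y))
    return data


def split(data, per_class=2):
    """Keep a few labeled examples per class; the rest become unlabeled."""
    labeled, unlabeled, counts = [], [], {0: 0, 1: 0}
    for x1, x2, y in data:
        if counts[y] < per_class:
            labeled.append(((x1, x2), y))
            counts[y] += 1
        else:
            unlabeled.append((x1, x2))
    return labeled, unlabeled


def fit(labeled, view):
    """Nearest-class-mean classifier on one view: returns (mean_0, mean_1)."""
    return tuple(mean(ex[view] for ex, y in labeled if y == c) for c in (0, 1))


def predict(model, x):
    """Return (label, confidence); confidence is the distance margin."""
    d0, d1 = abs(x - model[0]), abs(x - model[1])
    return (0 if d0 < d1 else 1), abs(d0 - d1)


def co_train(labeled, unlabeled, rounds=10, k=5):
    """Each round, each view labels its k most confident unlabeled points
    and adds them to the shared labeled pool used by both classifiers."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                break
            model = fit(labeled, view)
            ranked = sorted(unlabeled,
                            key=lambda ex: predict(model, ex[view])[1],
                            reverse=True)
            for ex in ranked[:k]:
                labeled.append((ex, predict(model, ex[view])[0]))
                unlabeled.remove(ex)
    return fit(labeled, 0), fit(labeled, 1), labeled


if __name__ == "__main__":
    data = make_data(200, seed=0)
    labeled, unlabeled = split(data)
    model_a, model_b, final = co_train(labeled, unlabeled)
    print(len(labeled), "labeled examples grew to", len(final))
```

If you replace the second view with a copy of the first (a deliberate violation of the assumption), both classifiers become identical and the loop reduces to self-training, which is the degenerate case both answers describe.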