It seems like you got points 1 and 2 clarified already by digdug. For point 3, you could let the M_i be probabilistic models, for example logistic regression. Then you could take the mean, p(c_j | x) = (1/M) * sum_i p_i(c_j | x), as the probability of class c_j according to the ensemble. If you have held-out data, you could estimate each model's probability of being correct and use that to weight the models non-uniformly. If you have structured outputs this may be a bit more involved, since how you combine the models will depend on the inference algorithm used.
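For a concrete picture, here is a minimal sketch of that averaging/weighting scheme with scikit-learn-style probabilistic models; the synthetic data, the trick of varying the regularisation strength to get different M_i, and all variable names are illustrative assumptions, not part of the original setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data and a held-out split (both placeholders for whatever you have).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_heldout, y_train, y_heldout = train_test_split(X, y, random_state=0)

# Several probabilistic models M_i; here they differ only in regularisation
# strength, which is just one arbitrary way to get non-identical members.
models = [LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
          for c in (0.01, 0.1, 1.0, 10.0)]

# Uniform ensemble: p(c_j | x) = (1/M) * sum_i p_i(c_j | x)
all_probs = np.array([m.predict_proba(X_heldout) for m in models])  # (M, N, C)
ensemble_probs = all_probs.mean(axis=0)

# Non-uniform weights from each model's held-out accuracy.
weights = np.array([m.score(X_heldout, y_heldout) for m in models])
weights = weights / weights.sum()
weighted_probs = np.einsum('i,ijk->jk', weights, all_probs)

ensemble_pred = weighted_probs.argmax(axis=1)  # final class decision
```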
Don't know about online, but this is similar to bagging (bootstrap aggregation), where you have multiple "versions" of the same model, each trained on a bootstrapped sample of the data (sampled with replacement).
@digdug Are the multiple versions trained on the same set of data, or should the dataset be split among them?
In bagging, you create a new version of the dataset by sampling with replacement (each bagged dataset is the same size as the original, with some samples repeated and others left out), so each model is trained on its own slightly different version of the data.
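To make "sampling with replacement" concrete, here's a small bagging sketch; the synthetic data and the choice of decision trees as the base learner are just assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=1)
rng = np.random.default_rng(1)

def bootstrap_sample(X, y, rng):
    # Same size as the original, drawn with replacement: some rows repeat,
    # and on average roughly a third of the original rows are left out.
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

bagged_models = []
for _ in range(10):
    Xb, yb = bootstrap_sample(X, y, rng)   # each model gets its own replicate
    bagged_models.append(DecisionTreeClassifier(random_state=0).fit(Xb, yb))
```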
@digdug When you say "create a new version of the dataset by sampling with replacement", do you mean just replacing some randomly chosen data points with other data points randomly chosen from the original dataset? If so, we will have duplicated data points in the new version, and duplicating data points will not really help the learning process!
Yup, duplicating some samples. The bootstrap and bagging are accepted statistical methods; see Chapters 7 and 8 of The Elements of Statistical Learning (http://www-stat.stanford.edu/~hastie/Papers/ESLII.pdf).
What you're describing is much more than "somehow like ensemble methods"; I'd say it's closer to "exactly like ensemble methods". You are constructing an ensemble of (hopefully only weakly correlated) classifiers, and then using some combination function (max, mean, min, logistic regression, etc.) to fuse their predictions. While some ensemble methods train a common classifier on different data subsets (bagging), on reweighted data (boosting), or on different feature subsets (random subspaces), and others combine entirely distinct classification algorithms, there's nothing different about what you're describing here.
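As a rough illustration of the fusion step, here is a sketch contrasting a static combination function (the mean) with a trained combiner (logistic regression on the members' predicted probabilities, i.e. stacking); the base learners and the held-out split used to fit the combiner are assumptions of the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, random_state=2)
X_tr, X_comb, y_tr, y_comb = train_test_split(X, y, test_size=0.3, random_state=2)

# Ensemble members (arbitrary choice: trees of different depths).
members = [DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_tr, y_tr)
           for d in (2, 4, 8)]
member_probs = [m.predict_proba(X_comb) for m in members]

# Static combination function: the mean of the members' class probabilities.
mean_fusion = np.mean(member_probs, axis=0)
mean_pred = mean_fusion.argmax(axis=1)

# Trained combination function ("stacking"): a logistic regression fit on
# the concatenated member probabilities, using data the members did not see.
meta_features = np.hstack(member_probs)
combiner = LogisticRegression(max_iter=1000).fit(meta_features, y_comb)
stacked_pred = combiner.predict(meta_features)
```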
@AndrewRosenberg What about the online configuration that I'm talking about?
Well, presumably if each of your ensemble members can be trained online, the ensemble can be too. And if your combination classifier (should you use one rather than a static combination function) can be trained online as well, I don't see why online training would pose any problem.
@AndrewRosenberg For instance, since it is online (like a data stream where data points are considered one by one), we cannot randomly resample or split the data to do bagging or boosting in order to increase the diversity between the ensemble members. What is the best way to do that in an online configuration?
How about random sampling, where each classifier reads or skips each incoming sample with some probability? That way each sample gets assigned to some classifiers and not to others.
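Here is a minimal sketch of that per-sample random assignment on a stream, assuming base learners that support incremental updates (scikit-learn's partial_fit); the inclusion probability and the simulated stream are arbitrary choices of the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Simulated stream (placeholder for real streaming data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=3)
classes = np.unique(y)
rng = np.random.default_rng(3)

ensemble = [SGDClassifier(random_state=i) for i in range(5)]
p = 0.6  # probability that a given member gets to see a given sample

for x_t, y_t in zip(X, y):            # one data point at a time
    x_t = x_t.reshape(1, -1)
    for member in ensemble:
        if rng.random() < p:           # this member reads the sample...
            member.partial_fit(x_t, [y_t], classes=classes)
        # ...otherwise it skips it, so members end up trained on
        # different (overlapping) subsets of the stream.

# Online bagging (Oza & Russell) is the streaming analogue of offline
# bagging: each member sees each sample k ~ Poisson(1) times instead.
```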