• Is there any work on training many instances of the same classifier online, simultaneously, each with different parameter settings, in order to get better classification accuracy? It is somewhat like ensemble methods, except that instead of combining different classifiers, the same classifier is trained (online) under different parameter settings.
  • What challenges would one need to address in this setting?
  • Suppose we have a classifier trained with 3 different parameter settings, giving rise to three models M1, M2, M3. If a data-point x is classified as class c1 by M1, as c1 by M2, and as c2 by M3, the most likely class of x is presumably c1. But is there a method that would let us be more confident about that, by computing how likely it is that x would be classified as c1 by M3, and how likely x would be classified as c2 by M1 and M2 (given that x was classified as c1 by M1 and M2, and as c2 by M3)? I don't know whether this can help to better classify the data-point x.

asked Dec 11 '12 at 15:54

shn

Don't know about online, but this is similar to bagging (bootstrap aggregation), where you have multiple "versions" of the same model trained on bootstrapped samples of the data (sampled with replacement).

(Dec 11 '12 at 17:12) digdug

@digdug Are the multiple versions trained on the same set of data, or should the dataset be split among them?

(Dec 11 '12 at 17:58) shn

In bagging, you create a new version of the data by sampling with replacement (each bagged dataset is the same size as the original, with some samples repeated more than once), so each model is trained on its own slightly different version of the data.
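
For concreteness, a minimal sketch of the resampling step (assuming numpy; the fit call in the last comment is a stand-in for whatever learner you use):

    import numpy as np

    # Each bootstrap replicate is the same size as the original dataset and is
    # drawn with replacement, so some points repeat and roughly a third of the
    # original points are left out of any given replicate.
    rng = np.random.default_rng(0)

    def bootstrap_replicate(X, y):
        idx = rng.integers(0, len(X), size=len(X))  # indices drawn with replacement
        return X[idx], y[idx]

    # Train each ensemble member on its own replicate, e.g.:
    # models = [fit(*bootstrap_replicate(X, y)) for _ in range(n_models)]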

(Dec 11 '12 at 18:58) digdug

@digdug When you say "create a new version of the dataset by sampling with replacement", do you mean just replacing some randomly chosen data-points with other data-points randomly chosen from the original dataset? If so, we will have duplicated data-points in the new version, and duplicating data-points will not really help the learning process!

(Dec 12 '12 at 05:14) shn

Yup, duplicating some samples. The bootstrap and bagging are accepted statistical methods; check out Chapters 7 and 8 of The Elements of Statistical Learning (http://www-stat.stanford.edu/~hastie/Papers/ESLII.pdf).

(Dec 12 '12 at 19:05) digdug

What you're describing is much more than "somehow like ensemble methods"; I'd say it's closer to "exactly like ensemble methods". You are constructing an ensemble of (hopefully only weakly correlated) classifiers, and then using some combination function (max, mean, min, logistic regression, etc.) to fuse their predictions. While some ensemble methods use a common classifier trained on different data subsets (bagging) or different feature subsets (as in random subspaces), reweight the training examples (boosting), or combine entirely distinct classification algorithms, what you're describing here is nothing different.
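
As a sketch of the "logistic regression as combination function" option (using scikit-learn; the helper names are illustrative and assume already-fitted probabilistic members):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Stack each member's predicted class probabilities into one feature vector
    # per example, then train a meta-classifier on held-out data to fuse them.
    def fit_fuser(members, X_held, y_held):
        Z = np.hstack([m.predict_proba(X_held) for m in members])
        return LogisticRegression().fit(Z, y_held)

    def fused_predict(members, fuser, X):
        Z = np.hstack([m.predict_proba(X) for m in members])
        return fuser.predict(Z)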

(Dec 14 '12 at 17:15) Andrew Rosenberg

@AndrewRosenberg What about the online configuration that I'm talking about?

(Dec 14 '12 at 21:38) shn

Well, presumably if each of your ensemble members can be trained online, the ensemble can be too. And if your combination classifier (as opposed to a static combination function) can be trained online, I don't see why online training would pose any problems.

(Dec 17 '12 at 13:08) Andrew Rosenberg

@AndrewRosenberg For instance, since it is online (like a data stream where data-points are considered one by one), we cannot randomly split the data for bagging or boosting in order to increase the diversity between the ensemble members. What is the best way to do that in an online configuration?

(Dec 19 '12 at 18:08) shn

How about random sampling, so that each classifier reads or skips each incoming sample with some probability? That way some samples are seen by some classifiers and not by others.
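
One standard way to make this concrete is Oza and Russell's online bagging, where each member sees each incoming example k ~ Poisson(1) times, which approximates a bootstrap over the stream. A rough sketch (the choice of SGDClassifier is just illustrative):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    classes = np.array([0, 1])
    ensemble = [SGDClassifier(loss="log_loss") for _ in range(5)]

    def learn_one(x, y):
        # Each member "sees" (x, y) k ~ Poisson(1) times, mimicking
        # bootstrap resampling without ever storing the stream.
        for clf in ensemble:
            for _ in range(rng.poisson(1.0)):
                clf.partial_fit(x.reshape(1, -1), [y], classes=classes)

    def predict_one(x):
        # Majority vote; assumes every member has seen at least one example.
        votes = [int(clf.predict(x.reshape(1, -1))[0]) for clf in ensemble]
        return max(set(votes), key=votes.count)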

(Dec 19 '12 at 18:24) digdug

One Answer:

It seems like you got points 1 and 2 clarified already by digdug. For point 3, you could make each M_i probabilistic, for example a logistic regression model. Then you could take the mean p(c_j|x) = (1/3) * sum_i p_i(c_j|x) as the probability of class c_j according to the ensemble. If you have held-out data, you could estimate the probability of being correct for each model and use that to weight the models non-uniformly. If you have structured outputs this may be a bit more involved, since the way you combine the models will depend on the inference algorithm used.
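
A small sketch of this combination rule (numpy; the models are assumed to expose predict_proba, and the weights would come from held-out accuracy):

    import numpy as np

    def ensemble_proba(models, X, weights=None):
        # Average the per-model posteriors p_i(c_j|x); non-uniform weights
        # estimated on held-out data plug in directly.
        probs = np.stack([m.predict_proba(X) for m in models])  # (n_models, n, n_classes)
        w = np.ones(len(models)) if weights is None else np.asarray(weights, dtype=float)
        w = w / w.sum()
        return np.einsum("m,mnc->nc", w, probs)

    def ensemble_predict(models, X, weights=None):
        return ensemble_proba(models, X, weights).argmax(axis=1)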

answered Dec 14 '12 at 04:17

Oscar Täckström

