The classical uniform convergence bounds for binary classification in statistical learning theory rely on the union bound and Hoeffding's inequality, where the latter bounds the deviation of the empirical mean from the true mean of a set of i.i.d. Bernoulli trials. In effect, these bounds are built on confidence intervals around the maximum likelihood estimate of the generalization error, i.e. the empirical error.
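
To make the classical construction concrete, here is a minimal sketch for a finite hypothesis class. It assumes the two-sided Hoeffding bound P(|empirical error - true error| > eps) <= 2 exp(-2 n eps^2) for each hypothesis, and spends the failure probability delta across all hypotheses via the union bound. The function name and the example numbers are only illustrative.

```python
import math

def hoeffding_union_epsilon(n, num_hypotheses, delta):
    """Half-width eps such that, with probability at least 1 - delta,
    |empirical error - true error| <= eps holds for all hypotheses at once.

    Solves 2 * num_hypotheses * exp(-2 * n * eps**2) = delta for eps.
    """
    return math.sqrt(math.log(2.0 * num_hypotheses / delta) / (2.0 * n))

# Illustrative numbers (assumed, not from any experiment):
# 1000 samples, 100 hypotheses, 95% simultaneous confidence.
eps = hoeffding_union_epsilon(n=1000, num_hypotheses=100, delta=0.05)
print(f"simultaneous half-width: {eps:.4f}")
```

Note that the interval is centred on the empirical error and has the same width for every hypothesis, regardless of how many errors that hypothesis actually made.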

But why should we think that the maximum likelihood estimator is the best estimator in this case? Couldn't we use Bayesian inference instead, for example by taking the Beta posterior over the error rate under a non-informative prior, and derive credible intervals to which the union bound is then applied? Would something like this make sense at all?
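
As a rough sketch of what this proposal could look like, the code below computes an equal-tailed credible interval from a Beta posterior and shrinks the per-hypothesis tail mass to delta / |H|, in the spirit of the union bound. It assumes a uniform Beta(1, 1) prior as the "non-informative" choice (a Jeffreys Beta(1/2, 1/2) prior would be another option), so that the posterior after k errors in n trials is Beta(k + 1, n - k + 1) with integer parameters. All function names are hypothetical. The resulting statement is a posterior probability, not a frequentist coverage guarantee, which is exactly the point the question is asking about.

```python
import math

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    with each term evaluated in log space to avoid overflow.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    m = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, m + 1):
        log_term = (math.lgamma(m + 1) - math.lgamma(j + 1)
                    - math.lgamma(m - j + 1) + j * log_x + (m - j) * log_1mx)
        total += math.exp(log_term)
    return min(total, 1.0)

def beta_quantile_int(q, a, b, tol=1e-10):
    """Quantile of Beta(a, b) found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simultaneous_credible_interval(errors, n, num_hypotheses, delta):
    """Equal-tailed credible interval for one hypothesis' error rate,
    with posterior tail mass delta / num_hypotheses (union-bound style)."""
    alpha = delta / num_hypotheses
    a, b = errors + 1, n - errors + 1  # posterior under a uniform prior
    return (beta_quantile_int(alpha / 2.0, a, b),
            beta_quantile_int(1.0 - alpha / 2.0, a, b))

# Illustrative numbers (assumed): 20 errors in 200 trials, 100 hypotheses.
low, high = simultaneous_credible_interval(20, 200, 100, 0.05)
print(f"credible interval: [{low:.4f}, {high:.4f}]")
```

Unlike the Hoeffding interval, this one is asymmetric and its width depends on the observed error count of each hypothesis.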

asked Feb 09 '11 at 06:13

Oscar Täckström

edited Feb 10 '11 at 16:23


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.