|
The classical uniform convergence bounds for binary classification in statistical learning theory rely on the union bound and Hoeffding's inequality, where the latter bounds the deviation of the empirical mean from the true mean of a set of i.i.d. Bernoulli trials. So these bounds are essentially based on confidence intervals around the maximum likelihood estimate of the generalization error, i.e. the empirical error. But why should we think that the ML estimator is the best estimator in this case? Couldn't we use Bayesian inference instead, for example taking the Beta posterior obtained from a non-informative prior, deriving credible intervals from it, and applying the union bound to those? Would something like this make sense at all? |
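
To make the comparison concrete, here is a minimal Python sketch of the two interval constructions side by side. It only computes the numbers; it does not settle whether the credible intervals yield a valid uniform convergence guarantee, since a posterior probability and a frequentist coverage probability are different statements. Everything specific here is an assumption for illustration: a finite hypothesis class of size `num_hypotheses`, a uniform Beta(1, 1) prior as the "non-informative" choice, an equal-tailed interval, and a Bonferroni-style split of `delta` across hypotheses. All function names are hypothetical.

```python
import math


def hoeffding_halfwidth(n, delta, num_hypotheses=1):
    """Half-width eps from Hoeffding's inequality plus a union bound:
    with probability >= 1 - delta, every one of num_hypotheses classifiers
    has |empirical error - true error| <= eps on n i.i.d. samples."""
    return math.sqrt(math.log(2 * num_hypotheses / delta) / (2 * n))


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a and b, using the identity
    I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    Intended for moderate n only (no care taken about overflow)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j)
               for j in range(a, m + 1))


def beta_quantile(q, a, b, tol=1e-10):
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def credible_interval(k, n, delta, num_hypotheses=1):
    """Equal-tailed credible interval for the error rate after observing
    k errors in n trials, under an assumed uniform Beta(1, 1) prior.
    The posterior is Beta(k + 1, n - k + 1); delta is divided by
    num_hypotheses to mimic the union bound."""
    alpha = delta / num_hypotheses
    a, b = k + 1, n - k + 1
    return beta_quantile(alpha / 2, a, b), beta_quantile(1 - alpha / 2, a, b)


if __name__ == "__main__":
    n, k, delta, size = 100, 10, 0.05, 1000  # illustrative values only
    eps = hoeffding_halfwidth(n, delta, size)
    lo, hi = credible_interval(k, n, delta, size)
    print("Hoeffding + union bound:", (max(0.0, k / n - eps), min(1.0, k / n + eps)))
    print("Beta credible interval: ", (lo, hi))
```

Note that the Hoeffding half-width depends only on `n`, `delta` and the class size, while the Beta interval also depends on the observed error count `k`, so the two are not directly comparable without fixing `k`.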