Given sufficient data, CNNs trained with plain local optimisation converge to a minimum that is effectively as good as a global one, regardless of weight initialisation (i.e. with or without unsupervised pretraining). This is an empirical result, notably found here. But why?
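To make the claim concrete, here is a minimal sketch (not from the original post) of how one might probe it empirically: train the same small CNN from several independent random initialisations with plain SGD and compare the final training losses. The architecture, synthetic data, and hyperparameters below are all hypothetical placeholders chosen just for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def make_cnn():
    # Tiny CNN for 1x28x28 inputs and 10 classes (MNIST-like shapes assumed).
    return nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * 7 * 7, 10),
    )

def final_loss(seed, steps=200):
    torch.manual_seed(0)                # same synthetic dataset for every run
    x = torch.randn(256, 1, 28, 28)
    y = torch.randint(0, 10, (256,))
    torch.manual_seed(seed)             # different seed -> different weight initialisation
    model, loss_fn = make_cnn(), nn.CrossEntropyLoss()
    opt = optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):              # plain local optimisation, no pretraining
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# If the claim holds, the final losses should cluster tightly across seeds.
print([round(final_loss(s), 4) for s in range(5)])
```

If the minima found from different initialisations really are of comparable quality, the printed losses should be close to one another rather than spread over a wide range.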

I asked Yann LeCun about this at a conference a few weeks ago, and if I remember his answer correctly, it can be shown that the error surface of CNNs has its extrema bunched up in roughly the same region, with roughly the same values - the argument involves random matrix theory and polynomials.

I can't find any papers about this - could anyone point me to any?

asked Jun 23 '14 at 09:44

Holden Caulfield
