A large-scale implementation would be more interesting, if possible.

asked May 17 '11 at 20:53


Alexis Pribula

5 Answers:

There used to be a huge bank check recognition program built by many of the top researchers in deep learning (Yann LeCun's Wikipedia page mentions it, for example, and you can find references on videolectures as well), but I'm not sure whether it is still in production, and if not, what replaced it.

answered May 18 '11 at 02:38


Alexandre Passos ♦

The bank check recognition Alexandre mentioned is the only example I know about as well, but I wouldn't be surprised if there were more large-scale applications of convolutional networks. However, stacked auto-encoders and deep belief networks are such recent developments that they are probably not used in commercial applications yet. Deep learning still needs some time to prove itself.

answered May 18 '11 at 03:39


Philemon Brakel

Google uses convolutional nets for obfuscation of license plates and faces in Street View (paper here). I know neural networks were used for zip code recognition and stamp recognition - but that was "the early days" work, and some of it was implemented in hardware. Otherwise I'll agree with Philemon that it's probably too early for pretraining methods to be used in industry. IMHO they haven't really proved useful yet anyway ;)

answered May 18 '11 at 06:23


Andreas Mueller

One criticism I've been hearing about deep learning is that training doesn't seem to scale well: using MCMC becomes a problem when production datasets are extremely large, as they often are.

Edit: Correction above, deep learning covers a much wider field than I originally thought. There are many scalable methods as Alex points out.

answered May 18 '11 at 13:54

crdrn

edited May 18 '11 at 15:38


Most scalable deep learning methods (see Hessian-free optimization, or stacked denoising auto-encoders) don't use MCMC at all for inference.

(May 18 '11 at 15:11) Alexandre Passos ♦
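To illustrate Alexandre's point that stacked denoising auto-encoders need no sampling, here is a minimal numpy sketch of training a single denoising auto-encoder layer with plain backpropagated SGD. The data, layer sizes, learning rate, and corruption level are illustrative choices, not anything from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: 100 examples with 20 features in [0, 1).
X = rng.random((100, 20))
n_vis, n_hid = 20, 10
W = rng.normal(0.0, 0.1, (n_vis, n_hid))  # tied weights: decoder uses W.T
b = np.zeros(n_hid)   # hidden bias
c = np.zeros(n_vis)   # visible bias
lr = 0.5

losses = []
for epoch in range(100):
    # Corrupt the input with masking noise (zero out ~30% of entries).
    X_tilde = X * (rng.random(X.shape) > 0.3)
    h = sigmoid(X_tilde @ W + b)     # encode the corrupted input
    X_hat = sigmoid(h @ W.T + c)     # decode back to visible space
    err = X_hat - X                  # reconstruct the *clean* input
    losses.append(np.mean(err ** 2))
    # Plain backpropagated gradient descent -- no sampling or MCMC anywhere.
    d_out = err * X_hat * (1 - X_hat)
    d_hid = (d_out @ W) * h * (1 - h)
    W -= lr * (X_tilde.T @ d_hid + d_out.T @ h) / len(X)
    b -= lr * d_hid.mean(axis=0)
    c -= lr * d_out.mean(axis=0)
```

A deep network is then "stacked" by training a second auto-encoder on the hidden codes `h` of the first, and so on; every step is an ordinary gradient computation.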

Thanks for correcting that. Is there a simple overview of how Hessian-free works? I'm having some trouble understanding Martens' original paper. From what I've understood so far, it seems like a heavily modified version of the conjugate gradient method.

(May 18 '11 at 15:42) crdrn

It's not really conjugate gradient, but it does use linear conjugate gradient as a subroutine. It is slightly similar to LBFGS in that it can use Newton-like updates without ever keeping the full Hessian in memory (hence "Hessian-free" in the name), but the structure of the updates is very different, especially if you remember that LBFGS updates converge to an estimate of the Hessian (hand-wavingly), while Hessian-free updates can drift to track an evolving Hessian over stochastic gradients and wildly changing curvature.

The paper is indeed hard to read, especially as it makes little distinction between the essential structure of the algorithm and the many tricks required to make it work. I tried implementing it once but ended up with something non-functional because I couldn't quite figure out all the tricks.

(May 18 '11 at 15:51) Alexandre Passos ♦
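The essential structure Alexandre describes (without Martens' many tricks) can be sketched in a few lines: compute Hessian-vector products without ever forming the Hessian, and feed them to linear conjugate gradient to solve for a Newton-like step. This toy sketch uses finite differences of the gradient for the product (Martens uses the exact R-operator/Gauss-Newton product instead) and a small quadratic objective as a stand-in for a network loss; all names here are illustrative:

```python
import numpy as np

def hvp(grad_fn, w, v, eps=1e-6):
    # Hessian-vector product by finite differences of the gradient:
    # H v ~= (grad(w + eps*v) - grad(w)) / eps. The Hessian is never stored.
    return (grad_fn(w + eps * v) - grad_fn(w)) / eps

def cg(matvec, b, iters=50, tol=1e-10):
    # Linear conjugate gradient solving A x = b using only A@v products.
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy objective: f(w) = 0.5 w'Aw - b'w with a known positive-definite A.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)
b = rng.normal(size=5)
grad = lambda w: A @ w - b

# One Newton-like step from w = 0: solve H p = -grad(w) with CG,
# using only Hessian-vector products.
w = np.zeros(5)
step = cg(lambda v: hvp(grad, w, v), -grad(w))
w_new = w + step
```

For this quadratic a single step lands on the minimizer; on a real network loss, Martens' method re-runs a damped, truncated CG solve like this at every outer iteration.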

I am happy to see I'm in good company, Alexandre! I too tried to implement Martens' algorithm and got stuck due to its complexity.

Despite being a headache to implement, I believe Martens' algorithm may point the way to some future end-to-end algorithm for deep learning that can compete with SGD. Perhaps, as with SGD, the "chaff" heuristics will be separated from the "wheat" and a simplified algorithm will emerge.

(May 20 '11 at 22:12) Jacob Jensen

I also tried to implement it and stopped because of the number of tricks needed. It's really hard to get right.

(May 21 '11 at 05:07) Justin Bayer

J. Martens has made his code available for HF: http://www.cs.toronto.edu/~jmartens/research.html

(May 21 '11 at 06:53) osdf

answered Aug 11 '12 at 23:30

PaulDixon



User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.