I'm trying to do OCR for multiple languages using convolutional neural networks (CNNs). I've seen CNNs used on the MNIST dataset and for object recognition. I (sort of) understand that CNNs can be applied to large images and to images of variable size to recognize objects from their features.

I'm not sure how I would handle continuous, multi-line text with a CNN. Each MNIST image is normalized and contains exactly one character, and the same is true of its test data. Here are a few issues I have in mind:

  • Regarding training: do I have to train on single-character images as in MNIST? Would it be possible to train on word images, or even on an entire page?
  • Text varies in size and has a very rigid structure (line after line), so I'm not sure how a CNN would handle the zoom effect and adapt to lines of different sizes, for example if the entire image is just two giant "A"s.
  • Related to the second point: how would I find the right "window" to feed to the network?
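
To make the "window" question concrete, here is a rough sketch of the naive approach I have in mind: slide a square patch over the image at a few scales and hand each patch to a single-character classifier. Everything here is an assumption for illustration: `sliding_windows` is a hypothetical helper, the 28-pixel patch size just mirrors MNIST, and the stride and scales are arbitrary. I don't know whether this is how it is actually done.

```python
def sliding_windows(width, height, window, stride, scales=(1.0,)):
    """Yield (scale, x, y, w, h) boxes covering a width x height image.

    Hypothetical helper: `window` is the side of the square patch the
    classifier was trained on (28 for MNIST-style input). Each scale
    enlarges the patch so that bigger glyphs still fit inside one box.
    """
    for scale in scales:
        size = int(round(window * scale))
        if size > width or size > height:
            continue  # patch would not fit in the image at this scale
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield (scale, x, y, size, size)


# Illustrative numbers only: a 112 x 56 image, 28-pixel patches, two scales.
boxes = list(sliding_windows(width=112, height=56, window=28,
                             stride=28, scales=(1.0, 2.0)))
```

Each box would then be cropped, resized back to 28 x 28, and classified. The number of boxes grows quickly with smaller strides and more scales, which is part of why I'm unsure this is the right way to pick the window.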

asked Feb 16 '14 at 11:49

John King


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.