|
I have used Python for some specific machine learning problems, and I have found that Python is usually a lot slower than C/C++. When using Theano, how much of that speed gap does it close compared with C/C++? And if my goal is to run scientific experiments or enter competitions, in which cases should I use Python (Theano), and in which cases should I use C/C++? |
|
I would second most of what Sander has said, but would add some details, namely:
I hadn't really considered that setting up all this stuff on a Windows platform can be a little more involved, good catch. Linux is probably the better platform for this kind of work at the moment, especially when using Theano. Thanks for mentioning Anaconda, I'd heard the name before but didn't know what it was. It looks like a good alternative to EPD, and I'm going to give it a try since I can't get Canopy to work on my machine. I fixed the links in your post as well :) EDIT: it would actually be very interesting to see a more in-depth comparison between EPD and Anaconda at this point, maybe also with some performance info. All I've found so far is this: http://stackoverflow.com/questions/15762943/anaconda-vs-epd-enthought-vs-manual-installation-of-python
(May 08 '13 at 06:44)
Sander Dieleman
|
|
I never really code in C/C++ myself, so I can't compare directly, but I've been very satisfied with Theano's performance so far. I'm sure optimised C code you write yourself can't be beaten in terms of performance, but writing it does take a lot more time and effort :) Writing Theano code is only a little bit harder than using numpy (you just have to wrap your head around the symbolic variables and expressions; after that it's a breeze). Especially if you have an NVIDIA GPU, using Theano can result in huge speed gains. I can't give you any numbers, sorry. In practice, Theano might sometimes even be faster than code you write yourself, since it may discover optimisations that you missed. In general, when using Python for ML, I strongly recommend the Enthought Python Distribution (EPD), which is free for academic use. It contains an optimised version of numpy compiled against the Intel MKL BLAS library. EPD's numpy is much, much faster than a vanilla install (I've seen speed gains between 10x and 100x). It seems they're overhauling it at the moment, though: apparently it's now called 'Canopy'. I haven't been able to get the new version installed correctly, so using an older version might be preferable. And finally, if you're writing code that numpy / Theano can't speed up, consider having a look at Cython, which lets you compile your Python code to C and defines some extra syntax for things like static typing to speed up the code further. I've found this much less daunting than switching to C/C++ entirely, but I guess that could be different for you if you are more familiar with those languages. |
|
On a CPU, you should be able to get roughly the same speed from Theano as from hand-crafted mathematical C code. If you are using GPUs, life is reasonably easy with Theano, and you get an excellent speed-up at runtime. Theano lets you code in Python and makes it easy to use GPUs. Some links: http://deeplearning.net/software/theano/introduction.html http://gpuscience.com/articles/theano-a-cpu-and-gpu-math-compiler-in-python/ |
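As a sketch of why the GPU side is "reasonably easy": the device is normally chosen through configuration rather than by changing your model code. The example below assumes the `THEANO_FLAGS` environment variable with the `device` and `floatX` options, and the value `device=gpu` as used by the older CUDA backend (later Theano releases use `device=cuda` instead). The helper `theano_flags` is hypothetical and not part of Theano.

```python
import os


def theano_flags(use_gpu, float_type="float32"):
    """Build a THEANO_FLAGS value (hypothetical helper, not part of Theano)."""
    # Assumption: older CUDA backend naming; newer releases expect "cuda".
    device = "gpu" if use_gpu else "cpu"
    return "device={},floatX={}".format(device, float_type)


# Theano reads its flags when it is first imported, so set them beforehand.
# setdefault keeps any value the user has already exported in the shell.
os.environ.setdefault("THEANO_FLAGS", theano_flags(use_gpu=True))

print("THEANO_FLAGS =", os.environ["THEANO_FLAGS"])
# An `import theano` placed after this point would pick the flags up.
```

The same thing can be done from the shell by setting `THEANO_FLAGS` before launching the script. `float32` is used here because GPUs of that generation were much faster in single precision.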