It seems that some common languages in these fields are Lisp and Python.

However, I was wondering what it is about Python that makes it so consistently used in AI-related projects. From my own observations, it gets mentioned more often than C, C++, or Java. If that is because it is a high-level language, I still don't see why it's used more than Java, when Java seems to be much more pervasive in general.

So, is there a set of well-thought-out reasons why Python has supposedly been used more consistently, or is it just tradition? (By the way, is Lisp just a tradition, too?)

asked Dec 28 '11 at 20:02

Kaitlyn McMordie

edited Dec 28 '11 at 20:04

Forgot to mention, Lua seems common too!

(Dec 28 '11 at 20:05) Kaitlyn McMordie

Yes, if someone could mention something about Lua as well that would really be great

(Dec 30 '11 at 16:08) gangsta

4 Answers:

Yes, there are many lists of 'why python' out there. There are probably fewer lists of 'why python for ML' but they surely do exist as well. Here is my reasoning:

  1. Ecosystem. Over the last number of years python has built up a huge ecosystem of scientific libraries and people around it. This has to be my number one reason for reaching for python. There are many other scientific platforms (MATLAB, Mathematica, R, ...) that boast similar resources, but in the field of ML I have never found the alternatives to be quite as extensive or easy to adopt/integrate on the fly.

  2. Attitude. I have found that the python community promotes a certain attitude towards coding that tends to produce high quality code that is easy to use. One example I sometimes bring up to illustrate this point is the Insight Toolkit. It's a/the library for image processing, but I've found it suffers from a major 'flaw'. While it's a very comprehensive library and it does its job very well, the C++ templated architecture makes it extremely cumbersome to just jump into, especially for new users. So here it had the 'high quality' bit but not the 'easy to use' bit, and personally this stacks up as one of my biggest reasons to not use a library.

  3. People. Python has a somewhat unique user base. There are tons of scientific/academic people using python for their research, but there are also many, many people from a coding background working in industry. I find the mix to be just right. In fields like ML, industry tends to lag academia, but you still see some of the newest stuff showing up in nice python packages because of those academics. This stuff is often applied by industry people where it is worked into something complete and stable, something industry demands much more than academia. I've found this to be a huge advantage over the platforms mostly used in academia, like MATLAB, where you have an enormous amount of code created, but it's largely undocumented and poorly written because most people coding in academia aren't very concerned with the actual package past using it to verify their research results.

  4. Speed. Perhaps python doesn't strike you as a speedy language, but it's been shown time and time again to be very competitive performance-wise. The beautiful thing is that you can keep your high-level coding for the portions of your code that are still changing a lot, and use tools like Cython to create quick C-extensions where performance is key and you've figured out what you're doing. Additionally, the steps necessary to code on GPUs and parallel architectures have largely been abstracted for you, making it WAY easier (if you want to find out what I mean go to bare-bones CUDA programming for a while) to implement the necessary speedups. This last point can be huge, because often in ML your training procedures can take, say, a day on a GPU board and 3 weeks on a CPU. Three weeks is a long time to wait to figure out you've done something stupid.
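
As a rough illustration of the "keep the changing parts high-level, push the hot loop into compiled code" idea in point 4, here is a minimal sketch that uses only the standard library. It assumes CPython, where the built-in `sum`, `map` and `operator.mul` are implemented in C; they stand in for a Cython extension, which would need a separate compile step. The function names `dot_loop` and `dot_builtin` are made up for this example, and no particular speedup is claimed.

```python
import operator
import timeit


def dot_loop(xs, ys):
    """Dot product where every iteration runs in the interpreter."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_builtin(xs, ys):
    """Same dot product, but the loop itself runs inside C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))


if __name__ == "__main__":
    xs = list(range(100_000))
    ys = list(range(100_000))
    # Both versions must agree before any timing is worth looking at.
    assert dot_loop(xs, ys) == dot_builtin(xs, ys)
    for fn in (dot_loop, dot_builtin):
        seconds = timeit.timeit(lambda: fn(xs, ys), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s for 20 calls")
```

The point of the sketch is the workflow rather than the numbers: the readable version stays around as a reference, and only the measured hot spot is swapped for something compiled.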

Those are the main things (aside from my personal preference of coding in python) that make me reach for python. Regarding the Python vs. C/C++/Java debate, that's not something anybody is going to be able to settle here in this thread ;-)

However, regarding your comment that Java is generally more pervasive: Java is huge in enterprise applications, partly because of existing frameworks and partly out of tradition, and it is a very commonly known language since it is often taught at the introductory university/college level. That makes it very pervasive in those areas. There is a whole world of high-level programming languages out there and Java is just one option. I could go on for pages (indeed, I nearly already have) but suffice it to say that as far as "high level" goes, the way python is used is generally "higher level" than Java.

answered Dec 28 '11 at 22:57

Kyzyl Herzog

Regarding your first point, I think the fact that Matlab and Mathematica aren't free also plays a significant role. A lot of my colleagues have made or are currently making the switch from Matlab, partly because the limited number of floating licenses was becoming a problem (at least afaik - I only joined the lab last year so I started in Python straight away).

Numpy and Scipy (and to a lesser extent, matplotlib) also make this transition a little less painful. But certainly not painless :)

(Dec 29 '11 at 04:17) Sander Dieleman

Indeed, the transition is never completely painless, but in this case I think it's the right thing to do.

You're right about the licenses, I guess I left out the whole money issue in my reply. That certainly plays a big role. I can't think of a situation where one would prefer MATLAB over python if the choice were between paying for MATLAB and using python for free.

Another point I could add is that I've had some issues when programming with commercial packages (particularly matlab) where their newest versions (often the version available on a site license) break things that worked in the old version. I found the GUI toolkit in matlab particularly bad for this, and for the equivalent problem when you change platforms (e.g. windows to linux).

The unfortunate situation in research right now is that because a huge chunk of the actual programming that happens is done by grad students and undergrads, and because they are typically taught MATLAB, R, or one of many things that are not python, there is a lack of available skill. So even if there is a desire to switch to python, often the choice is to stick to what the lab knows students will be able to work with.

(Dec 29 '11 at 13:35) Kyzyl Herzog

We're still using Matlab for teaching for that reason. At least a reasonable majority of the students has some prior experience with it, which can't be said about Python. Not yet, anyway...

(Dec 29 '11 at 14:09) Sander Dieleman

Why don't you make them use Python? That would solve the problem, and numpy/scipy is easy to grasp for someone who has used matlab.

(Dec 30 '11 at 14:05) Justin Bayer

What's the difference between a package and a library? Is a library basically a specific subset of what is usually referred to by the term 'package'?

(Dec 30 '11 at 15:35) gangsta

A package (something like a directory) and a module (a single python file) can both be a library. A library might also contain additional files.

(Jan 01 '12 at 15:02) Justin Bayer
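
To make the package/module distinction from the comment above concrete, here is a small sketch using only standard-library imports. It assumes CPython's standard library layout, where `json` is a package (a directory with an `__init__.py`) and `random` is a single-file module. Packages expose a `__path__` attribute, plain modules do not. The helper name `is_package` is made up for this example.

```python
import json
import random


def is_package(module):
    """A package is a module object that carries a __path__ attribute."""
    return hasattr(module, "__path__")


print(is_package(json))    # True: json is a directory of modules
print(is_package(random))  # False: random is a single .py file
```

Either one can be "a library" in the everyday sense; the difference only matters for how the code is laid out on disk and imported.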

@justin bayer: Often forcing someone to use a technology they aren't comfortable with results in a big drop in productivity. For many research positions that are short-term that might mean the difference between getting the work to a publishable result and just sinking their teeth into it.

@S Jacinto: I don't think there is an official answer to your question, but often people refer to 'libraries' as bundles of code that you would use in your own code, and 'packages' as bundles of libraries + some sort of interface. For example, I might call libsvm a C++ library for implementing Support Vector Machines, and R a package for doing statistical analysis (as it is commonly used). That said, the words are largely interchangeable.

(Jan 04 '12 at 22:07) Kyzyl Herzog

I think Kyzyl Herzog's answer covers nearly all you can say on the subject. However, there are a few points I would add.

  1. Java (IMHO) isn't the best for ML for several reasons:

a. It is difficult to create extensions to Java. The creators of the language did this intentionally, and if you've ever had to write JNI, you will know just how difficult they made it. This is a particular issue in ML when you want to be able to rely on BLAS libraries or GPU programming for speed. While Java libraries do exist to provide such functionality, they are much more of a pain to set up, use, and maintain. I've come across many old Java bindings that were maintained for only a short time because of the hassle of dealing with either JNI or SWIG.

b. Java doesn't provide operator overloading. ML often requires a decent amount of math and anything that obscures the equations will make it that much more difficult to debug.

c. Java's insistence on being OOP-only. Even C++ is supposed to be multi-paradigm. Some algorithms just work better written functionally. It's not that you can't write them in Java, it's just more of a pain.

Comparatively, Python makes it:

  1. Easy to write extensions. You can use Boost::Python, use SWIG, use Cython, or simply write one by hand. This means that people can get mature libraries to the public fairly quickly. Numeric/Numpy have been around forever now. They use BLAS to get very fast speeds with little penalty for crossing over to C/C++.

  2. Numeric/Numpy use operator overloading to get Matlab-like syntax. Matplotlib builds on these libraries to get an environment with almost the same look and feel as Matlab. Additionally, libraries such as OpenOpt and Theano go one step further and allow you to write systems of equations whose derivatives will automatically be discovered. It's hard for me to imagine a way of easily doing this in Java.

  3. Python has a general policy of do whatever you want. It allows both OOP and functional designs to be easily implemented.
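
To give a feel for why operator overloading matters for point 2, here is a minimal sketch of automatic differentiation built on it. This is forward-mode differentiation with dual numbers, chosen because it fits in a few lines; it is not how Theano works (Theano builds and differentiates a symbolic expression graph), and the names `Dual`, `derivative` and `f` are made up for this example.

```python
class Dual:
    """A value paired with its derivative; arithmetic propagates both."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate df/dx at x by seeding the derivative slot with 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return 3 * x * x + 2 * x + 1  # reads like the equation itself


print(derivative(f, 2.0))  # f'(x) = 6x + 2, so 14.0 at x = 2
```

Because `f` is written with ordinary `+` and `*`, the same function works on plain floats and on `Dual` objects, which is the property that is awkward to get without operator overloading.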

I think the biggest reason for using Python over Matlab is that Python is a general programming language. Matlab's syntax is designed specifically for working with linear algebra. While this is nice, it means that doing things that aren't LA can sometimes be difficult. Python on the other hand was built with being general in mind.

C/C++ can be difficult due to the requirement of recompiling the code every time you make a small change. Config files can save you some pain, but then you have to deal with configuration files, allowing random parameters to be set, etc. This is where Python can be a major asset. Since it's so easy to create a Python library in C/C++, you can still keep your main project in C/C++, but then use Python to take the place of the configuration file.

I used a similar strategy for my dissertation. If I needed to run my code in N different configurations, I would just write several loops in Python and have it execute my C++ code.
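
The "loops in Python driving compiled code" strategy can be sketched as follows. This is only an assumption about what such a driver might look like: the binary name `./train_model` and the parameter names are hypothetical, and the flags are built in a generic `--name=value` style that a real program may not accept.

```python
import itertools
import subprocess


def build_commands(binary, grid):
    """Return one argv list per combination of parameter values in `grid`."""
    names = sorted(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        args = [f"--{name}={value}" for name, value in zip(names, values)]
        commands.append([binary, *args])
    return commands


def run_all(commands):
    """Run each command in turn; raises CalledProcessError on a non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Hypothetical binary and parameters, purely for illustration.
grid = {"learning-rate": [0.1, 0.01], "hidden-units": [50, 100]}
commands = build_commands("./train_model", grid)
for argv in commands:
    print(" ".join(argv))
# run_all(commands)  # uncomment once ./train_model actually exists
```

The Python file plays the role of the configuration file here: adding a new parameter sweep is a one-line change to `grid`, with no recompilation of the C++ side.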

With that said, Python is not problem free. I've found the requirement of indentation to be a major hassle -- especially when refactoring. The weak typing makes debugging that much more difficult. If you consider the power you get from IDEs such as Netbeans or Eclipse you can quickly gain an appreciation for strong-typing.

answered Dec 29 '11 at 15:55

nop

I have to disagree with "python lets you do whatever you want". It is true that python is a mixture of object oriented programming, functional programming and imperative programming. However, it is quite clear when to use which idiom. The "Zen of python" even says "There should be one-- and preferably only one --obvious way to do it."

If you have a look around at successful open source projects, you will see that most of them are written in "pythonic" code, which is the right way to do things (TM).

I don't want to start a language war (that's sooo 2004 and there is a vast amount of blog posts on this), but the points you list as Python's weaknesses are considered merits by most.

(That being said, the rest of your answer is fine. Have an upvote.)

(Dec 30 '11 at 14:04) Justin Bayer

Justin Bayer - "If you have a look around at successful open source projects, you will see that most of them are written in 'pythonic' code, which is the right way to do things (TM)." - what do you mean by this, and what is "pythonic" code?

(Dec 30 '11 at 16:04) gangsta

Pythonic code is code that follows the Zen of Python:

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
(Jan 02 '12 at 19:08) ogrisel
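
As a small and admittedly subjective illustration of what "pythonic" tends to mean in practice, the sketch below computes the same result twice: once in a style translated from C, once with the idioms most Python programmers would reach for. The variable names are made up for this example.

```python
words = ["spam", "eggs", "ham"]

# Works, but reads like a translation from C: manual indexing and appending.
lengths_c_style = []
i = 0
while i < len(words):
    lengths_c_style.append((i, len(words[i])))
    i += 1

# More idiomatic: enumerate() plus a list comprehension.
lengths_pythonic = [(i, len(word)) for i, word in enumerate(words)]

assert lengths_c_style == lengths_pythonic
print(lengths_pythonic)  # [(0, 4), (1, 4), (2, 3)]
```

Both versions are correct; the second simply leans on "Readability counts" and "Flat is better than nested".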

"If you consider the power you get from IDEs such as Netbeans or Eclipse you can quickly gain an appreciation for strong-typing."

  1. Eclipse has a very nice plugin called pydev. I've used it a lot. There are also some dedicated IDEs like Eric (http://eric-ide.python-projects.org/) that interface very nicely with the debugger python ships with.
  2. Python is strongly typed! There are no implicit type conversions: when you attempt to operate on objects with types that disagree, you will get an error. What python is not is statically typed, which means you don't need to declare your types explicitly, and that the type bound to a name can change, provided the proper procedures for doing so are followed. The interpreter does keep track of all the relevant type information.
(Jan 04 '12 at 21:56) Kyzyl Herzog
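
The strong-versus-static distinction in the comment above can be seen in a few lines at the interpreter. This is a minimal sketch; note as a caveat that Python's built-in numeric types such as `int` and `float` do mix in arithmetic, so "no implicit conversions" is best read as "no conversions between unrelated types such as numbers and strings".

```python
# Strong typing: mismatched types are not silently coerced.
try:
    result = 1 + "1"
except TypeError as exc:
    result = f"TypeError: {exc}"
print(result)

# Dynamic typing: a name can be rebound to an object of another type,
# and each object still knows exactly what it is.
x = 42
print(type(x).__name__)   # int
x = "forty-two"
print(type(x).__name__)   # str

# Conversions have to be requested explicitly.
print(1 + int("1"))       # 2
```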

@Justin: That's fair. Perhaps I should stick with: Python allows for a multi-paradigm methodology to be used. I'm very happy lambdas exist and am currently looking to use 'map' from the multiprocessing module. I also still appreciate that Python doesn't stop you from doing unpythonic things. You'll just pay a penalty at some point in your life.

(Jan 05 '12 at 10:36) nop
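
For readers wondering what "multi-paradigm" looks like in practice, here is a minimal sketch that computes the same quantity in an object-oriented and in a functional style. The names `RunningMean` and `mean_of_squares` are made up for this example. The built-in `map` is used to keep everything in one process; `multiprocessing.Pool.map` has a broadly similar call shape but is not shown here.

```python
from functools import reduce


# Object-oriented style: state and behaviour bundled in a class.
class RunningMean:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self

    @property
    def mean(self):
        return self.total / self.count


# Functional style: the same computation with map, lambda and reduce.
def mean_of_squares(values):
    squares = list(map(lambda v: v * v, values))
    return reduce(lambda acc, v: acc + v, squares, 0.0) / len(squares)


data = [1.0, 2.0, 3.0]
tracker = RunningMean()
for v in data:
    tracker.update(v * v)

print(tracker.mean, mean_of_squares(data))
```

Neither style is forced by the language; which one reads better depends on whether the problem is naturally about evolving state or about transforming data.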

@Kyzyl: I have used PyDev, and it's not bad, but it really can't do everything that Eclipse+Java can do. For example, auto-complete can't really function (for obvious reasons), and the refactoring tools are also more limited.

Yes, you are correct, I should have written 'statically typed'. While difficult, if Python could add optional static typing (http://www.artima.com/weblogs/viewpost.jsp?thread=85551), I think the gain in what an IDE could provide would be well worth it.

And curly braces. I know tons of people love the indentation thing (and I do always indent anyways), but to be able to optionally use curly braces instead would make me very happy.

(Jan 05 '12 at 10:48) nop

First, I don't think that LISP is being used in AI a lot. It was something of the AI language before the AI winter, when people thought that rule-based systems would lead to intelligent machines one day.

Here is my opinion why Python is relevant for machine learning:

  1. Numpy/Scipy/Matplotlib give you 90% of the operations you need for ML at a reasonable speed. Other dynamic languages like Ruby and Perl don't have such strong numerical libraries. Also, gnumpy gives you GPU power.
  2. The time it takes you to try out an idea is development time + running time. C/C++ (and Java, Go, C#, ...) might have drastically shorter runtimes, but much longer development times.
  3. Python is a general purpose language, which sets it aside from R and Matlab. Want to write a webservice for your colleagues? Yes. Want to attach your stuff to a production environment via zeromq? Dead easy. Want to use that new hipster logging module someone posted on reddit? Go for it.
  4. Python is easy.

Also, Python is not that good for robotics, for example. Although you might argue that ML and robotics should be similar in what they require of a programming language, roboticists are using lots of proprietary matlab software like simulink. They also have a bigger need for some data structures that Python is not good at (e.g. quadtrees), and the combination of this with real-time requirements drives people towards C.

Summary: Python is the only language capable of rapid development that also has strong enough numerical capabilities.

answered Dec 30 '11 at 14:15

Justin Bayer

Sorry but what do you mean by "dynamic language" in the context of Perl, Python, and Ruby?

(Dec 30 '11 at 14:50) gangsta

All three languages are dynamically typed as opposed to statically typed (different to the weakly/strongly typed distinction). This means that the types of variables do not have to be determined at compile time. This has several benefits (easier to program, introspection at runtime, meta programming, ...) but also several downsides (typically slower, no static code analysis).

(Dec 30 '11 at 15:15) Justin Bayer
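
Two of the benefits listed in the comment above, runtime introspection and metaprogramming, can be sketched with nothing but built-ins. The helper names `describe` and `make_record_class`, and the `Sample` class they produce, are made up for this example.

```python
def describe(obj):
    """Runtime introspection: report an object's type name and public attributes."""
    names = [name for name in dir(obj) if not name.startswith("_")]
    return type(obj).__name__, names


def make_record_class(class_name, field_names):
    """Metaprogramming: build a class at runtime with the three-argument type()."""
    def __init__(self, **kwargs):
        for field in field_names:
            setattr(self, field, kwargs.get(field))
    return type(class_name, (object,), {"__init__": __init__})


Sample = make_record_class("Sample", ["features", "label"])
s = Sample(features=[0.5, 1.5], label=1)
print(describe(s))  # ('Sample', ['features', 'label'])
```

The flip side mentioned in the same comment also shows here: nothing checks before runtime that `Sample` really has a `label` attribute, which is exactly the kind of thing a static analyser would catch in a statically typed language.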

Java's downsides can be overcome with Clojure. I am very happy with Weka and Clojure.

answered Jan 04 '12 at 20:38

Melipone Moody

Many of the technical issues one faces when using 'vanilla' Java can be overcome, and the same goes for most languages/environments. However as is stated numerous times above, the technical issues are only part of what makes a platform great to develop with. It goes all the way to the community and individuals who make use of it.

(Jan 04 '12 at 21:39) Kyzyl Herzog

Scala is another very nice language that also runs on the JVM. It has all the functional-programming bells and whistles and I think it may play nicer with vanilla Java than Clojure does.

But the biggest issue I still see is trying to use anything outside the JVM world.

(Jan 05 '12 at 10:52) nop

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.