|
It seems that Lisp and Python are common languages in these fields. What is it about Python that makes it so consistently used in AI-related projects? From my personal observations, it gets mentioned more often than C, C++, or Java. If the reason is that it is a high-level language, I still don't see why it's used more than Java, which seems much more pervasive in general. So, is there a set of well thought-out reasons why Python is supposedly used more consistently, or is it just tradition? (By the way, is Lisp just a tradition, too?)
|
Yes, there are many lists of 'why Python' out there. There are probably fewer lists of 'why Python for ML', but they surely exist as well. Here is my reasoning:
Those are the main things (aside from my personal preference for coding in Python) that make me reach for Python.

Regarding the Python vs. C/C++/Java debate, that's not something anybody is going to be able to settle here in this thread ;-) However, regarding your comment that Java is generally more pervasive: Java is huge in enterprise applications, partly because of existing frameworks and partly out of tradition, and it is very widely known since it is often taught at the introductory university/college level. That makes it very pervasive in those areas. But there is a whole world of high-level programming languages out there, and Java is just one option. I could go on for pages (indeed, I nearly already have), but suffice it to say that as far as "high level" goes, the way Python is used is generally "higher level" than Java.
Regarding your first point, I think the fact that Matlab and Mathematica aren't free also plays a significant role. A lot of my colleagues have made or are currently making the switch from Matlab, partly because the limited number of floating licenses was becoming a problem (at least afaik; I only joined the lab last year, so I started in Python straight away). NumPy and SciPy (and, to a lesser extent, matplotlib) also make this transition a little less painful. But certainly not painless :)
(Dec 29 '11 at 04:17)
Sander Dieleman
Indeed, the transition is never completely painless, but in this case I think it's the right thing to do. You're right about the licenses; I guess I left out the whole money issue in my reply. That certainly plays a big role. I can't think of a situation where one would prefer MATLAB over Python if the choice was between paying for MATLAB and using Python for free. Another point I could add is that I've had issues with commercial packages (particularly MATLAB) where the newest version (often the one available on a site license) breaks things that worked in the old version. I found the GUI toolkit in MATLAB particularly bad for this, and likewise when changing platforms (e.g. Windows to Linux). The unfortunate situation in research right now is that a huge chunk of the actual programming is done by grad students and undergrads, and because they are typically taught MATLAB, R, or one of many things that are not Python, there is a lack of available skill. So even if there is a desire to switch to Python, often the choice is to stick with what the lab knows students will be able to work with.
(Dec 29 '11 at 13:35)
Kyzyl Herzog
We're still using Matlab for teaching for that reason. At least a reasonable majority of the students have some prior experience with it, which can't be said of Python. Not yet, anyway...
(Dec 29 '11 at 14:09)
Sander Dieleman
Why don't you make them use Python? That would solve the problem, and numpy/scipy is easy to grasp for someone who has used Matlab.
(Dec 30 '11 at 14:05)
Justin Bayer
What's the difference between a package and a library? Is a library basically a specific subset of what is usually referred to by the term 'package'?
(Dec 30 '11 at 15:35)
gangsta
A package (something like a directory) and a module (a single Python file) can both be a library. A library might also contain additional files.
(Jan 01 '12 at 15:02)
Justin Bayer
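To make the package/module distinction above a bit more concrete, here is a minimal sketch. The names `single_module`, `toy_package`, and `helpers` are made up for illustration, and the files are written to a temporary directory so the snippet can be run as-is.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway directory holding one module and one package.
# All file and directory names here are hypothetical.
root = Path(tempfile.mkdtemp())

# A module is a single .py file.
(root / "single_module.py").write_text(
    "def hello():\n    return 'hello from a module'\n"
)

# A package is a directory (here with an __init__.py) that groups modules.
pkg_dir = root / "toy_package"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "helpers.py").write_text(
    "def hello():\n    return 'hello from a package'\n"
)

# Make the temporary directory importable, then import both.
sys.path.insert(0, str(root))
single_module = importlib.import_module("single_module")
toy_package = importlib.import_module("toy_package")
helpers = importlib.import_module("toy_package.helpers")

# Packages carry a __path__ attribute; plain modules do not.
print("toy_package is a package:", hasattr(toy_package, "__path__"))
print("single_module is a package:", hasattr(single_module, "__path__"))
print(single_module.hello())
print(helpers.hello())
```

Either one could be shipped and described as a "library"; the difference is only in how the code is laid out on disk.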
@justin bayer: Forcing someone to use a technology they aren't comfortable with often results in a big drop in productivity. For many short-term research positions, that might mean the difference between getting the work to a publishable result and just sinking their teeth into it. @S Jacinto: I don't think there is an official answer to your question, but people often refer to 'libraries' as bundles of code that you would use in your own code, and to 'packages' as bundles of libraries plus some sort of interface. For example, I might call libsvm a C++ library for implementing Support Vector Machines, and R a package for doing statistical analysis (as it is commonly used). That said, the words are largely interchangeable.
(Jan 04 '12 at 22:07)
Kyzyl Herzog
|
|
I think Kyzyl Herzog's answer covers nearly all you can say on the subject. However, there are a few points I would add.
a. It is difficult to create extensions to Java. The creators of the language did this intentionally, and if you've ever had to write JNI, you will know just how difficult they made it. This is a particular issue in ML when you want to be able to rely on BLAS libraries or GPU programming for speed. While Java libraries do exist to provide such functionality, they are much more of a pain to set up, use, and maintain. I've come across many old Java bindings that were maintained only briefly because of the hassle of dealing with either JNI or SWIG.
b. Java doesn't provide operator overloading. ML often requires a decent amount of math, and anything that obscures the equations makes the code that much more difficult to debug.
c. Java's insistence on being OOP-only. Even C++ is supposed to be multi-paradigm. Some algorithms just work better written functionally. It's not that you can't write them in Java, it's just more of a pain.
Comparatively, Python makes it:
I think the biggest reason for using Python over Matlab is that Python is a general programming language. Matlab's syntax is designed specifically for working with linear algebra. While this is nice, it means that doing things that aren't LA can sometimes be difficult. Python, on the other hand, was built with generality in mind.

C/C++ can be difficult due to the requirement of recompiling the code every time you make a small change. Config files can save you some pain, but then you have to deal with configuration files, allowing arbitrary parameters to be set, etc. This is where Python can be a major asset. Since it's so easy to create a Python library in C/C++, you can still keep your main project in C/C++, but then use Python to take the place of the configuration file. I used a similar strategy for my dissertation. If I needed to run my code in N different configurations, I would just write several loops in Python and have it execute my C++ code.

With that said, Python is not problem free. I've found the requirement of indentation to be a major hassle -- especially when refactoring. The weak typing makes debugging that much more difficult. If you consider the power you get from IDEs such as Netbeans or Eclipse, you can quickly gain an appreciation for strong-typing.

I have to disagree with "Python lets you do whatever you want". It is true that Python is a mixture of object-oriented, functional, and imperative programming. However, it is quite clear when to use which idiom. The "Zen of Python" even says "There should be one-- and preferably only one --obvious way to do it." If you have a look around at successful open source projects, you will see that most of them are written in "pythonic" code, which is the right way to do things (TM). I don't want to start a language war (that's sooo 2004 and there is a vast number of blog posts on this), but the points you list as Python's weaknesses are considered merits by most.
(That being said, the rest of your answer is fine. Have an upvote.)
(Dec 30 '11 at 14:04)
Justin Bayer
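The "loops in Python driving the C++ code" strategy from the answer above might look roughly like this. This is only a sketch under assumptions: the executable name `./train_model` and its flags are hypothetical, and it launches the program as a subprocess rather than through a Python extension module, which is a simpler stand-in for what the answer describes.

```python
import itertools
import subprocess

# Hypothetical executable and flags; substitute whatever your C++ program accepts.
BINARY = "./train_model"


def build_commands(learning_rates, hidden_sizes):
    """Return one argument list per configuration in the parameter grid."""
    commands = []
    for lr, hidden in itertools.product(learning_rates, hidden_sizes):
        commands.append(
            [BINARY, "--learning-rate", str(lr), "--hidden-units", str(hidden)]
        )
    return commands


def run_all(commands):
    """Run each configuration in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_commands([0.1, 0.01], [50, 100])
for cmd in commands:
    print(" ".join(cmd))

# run_all(commands)  # uncomment once BINARY points at a real executable
```

The point is that the sweep lives in a few lines of Python that can be edited and rerun without recompiling anything.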
Justin Bayer - "If you have a look around at successful open source projects, you will see that most of them are written in 'pythonic' code, which is the right way to do things (TM)." - what do you mean by this, and what is "pythonic" code?
(Dec 30 '11 at 16:04)
gangsta
Pythonic code is code that follows the Zen of Python:
(Jan 02 '12 at 19:08)
ogrisel
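For anyone who hasn't seen it, the Zen of Python ships with the interpreter in the standard-library module `this`. A small sketch of reading it from code (the variable names are mine; the module stores the text ROT13-encoded in `this.s`):

```python
import codecs
import this  # prints the Zen of Python the first time it is imported

# The text is stored ROT13-encoded in this.s; decode it to get a normal string.
zen = codecs.decode(this.s, "rot_13")

# Show only the aphorisms that mention an "obvious" way of doing things.
for line in zen.splitlines():
    if "obvious" in line:
        print(line)
```

In an interactive session, simply typing `import this` is enough.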
"If you consider the power you get from IDEs such as Netbeans or Eclipse you can quickly gain an appreciation for strong-typing."
(Jan 04 '12 at 21:56)
Kyzyl Herzog
@Justin: That's fair. Perhaps I should stick with: Python allows for a multi-paradigm methodology to be used. I'm very happy lambdas exist and am currently looking to use 'map' from the multiprocessing module. I also still appreciate that Python doesn't stop you from doing unpythonic things. You'll just pay a penalty at some point in your life.
(Jan 05 '12 at 10:36)
nop
@Kyzyl: I have used PyDev, and it's not bad, but it really can't do everything that Eclipse+Java can do. For example, auto-complete can't really function (for obvious reasons), and the refactoring tools are also more limited. Yes, you are correct, I should have written 'statically typed'. While difficult, if Python could add optional static typing (http://www.artima.com/weblogs/viewpost.jsp?thread=85551), I think the gain in what an IDE could provide would be well worth it. And curly braces. I know tons of people love the indentation thing (and I do always indent anyways), but to be able to optionally use curly braces instead would make me very happy.
(Jan 05 '12 at 10:48)
nop
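For what it's worth, optional annotations along these lines can be written with the standard `typing` module (added in Python 3.5, so after this thread). A minimal sketch, where the function `scale` is a hypothetical example; the interpreter does not enforce the hints, but external checkers such as mypy and IDEs can use them.

```python
from typing import List


def scale(values: List[float], factor: float) -> List[float]:
    """Multiply every element by factor (hypothetical helper for illustration)."""
    return [v * factor for v in values]


# The annotations are stored on the function and available to tools.
print(scale.__annotations__)

# At runtime the hints are not checked; this call simply runs.
print(scale([1.0, 2.0], 3.0))
```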
|
|
First, I don't think that LISP is being used in AI a lot. It was something like the AI language before the AI winter, when people thought that rule-based systems would one day lead to intelligent machines. Here is my opinion on why Python is relevant for machine learning:
Also, Python is, for example, not that good for robotics. Although you might argue that ML and robotics should have similar requirements for a programming language, roboticists use lots of proprietary Matlab software like Simulink. They also have a bigger need for some data structures that Python is not good at (e.g. quadtrees), and the combination of this with real-time requirements drives people towards C.

Summary: Python is the only language capable of rapid development that has strong enough numerical capabilities.

Sorry, but what do you mean by "dynamic language" in the context of Perl, Python, and Ruby?
(Dec 30 '11 at 14:50)
gangsta
All three languages are dynamically typed as opposed to statically typed (different to the weakly/strongly typed distinction). This means that the types of variables do not have to be determined at compile time. This has several benefits (easier to program, introspection at runtime, meta programming, ...) but also several downsides (typically slower, no static code analysis).
(Dec 30 '11 at 15:15)
Justin Bayer
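A tiny sketch of the distinction drawn above, using Python itself (the variable names are just for illustration): names have no declared type, so rebinding is fine, yet the interpreter still refuses to mix incompatible types silently.

```python
# Dynamic typing: a name carries no declared type and can be rebound freely.
value = 42
value = "forty-two"
print(type(value).__name__)

# Strong typing: incompatible types are not silently coerced.
try:
    result = "1" + 1
except TypeError as exc:
    result = None
    print("TypeError:", exc)

# Runtime introspection, one of the benefits mentioned above.
print(isinstance(value, str))
```

The error only shows up when the offending line runs, which is the flip side of not having a compile-time check.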
|
|
Java's downsides can be overcome with Clojure. I am very happy with Weka and Clojure.
Many of the technical issues one faces when using 'vanilla' Java can be overcome, and the same goes for most languages/environments. However, as stated numerous times above, the technical issues are only part of what makes a platform great to develop with. It goes all the way to the community and the individuals who make use of it.
(Jan 04 '12 at 21:39)
Kyzyl Herzog
Scala is another very nice language that also runs on the JVM. It has all the functional-programming bells and whistles, and I think it may play nicer with vanilla Java than Clojure does. But the biggest issue I still see is trying to use anything outside the JVM world.
(Jan 05 '12 at 10:52)
nop
|
Forgot to mention, Lua seems common too!
Yes, if someone could mention something about Lua as well that would really be great