I'm a Maths student. Sadly, I know little about programming. However, I know a lot of algebraic topology, dynamical systems and homotopy theory.

Anyway, I was wondering: what is the most useful language to learn for machine learning? I picked up a book on Java and I'm working through it. I've noticed that CS students are taught Java first, so it would make sense to start there.

The books I've seen on machine learning are usually about statistical learning. However, I really want to bring topological ideas into machine learning, in particular to create a program that can tell whether two functions are homotopic. I haven't managed to find any work on this; all I can find is something called homotopy type theory.

Sorry if this is too soft. I don't know where to start, so anything at all will help me. I suppose I could talk to a CS lecturer.

asked Feb 21 '12 at 13:57

simplicity

edited Feb 21 '12 at 14:00


9 Answers:
11

For ease of prototyping I went with Python (with numpy and scipy libraries). It can be compared to MATLAB, without the hefty cost and agonizing bloat. I also believe it is easier to learn than Java.

A good Python book to get started: http://learnpythonthehardway.org/ (don't be fooled by the title; it just means that you will learn by getting your hands in the code very early, and thus you will learn faster).

A simple machine learning library: http://scikit-learn.org/stable/

I don't know much about homotopy and topology, however :) Even if there aren't existing libraries, it'll be easy to get started from scratch and you will make quick progress.
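To give a feel for what "starting from scratch" might look like, here is a minimal sketch for one very special case: closed loops in the plane with the origin removed. There, two loops are homotopic exactly when they have the same winding number around the origin. Deciding homotopy in general is far harder, and everything named below (`winding_number`, `sample_loop`, `are_homotopic`, the example loops) is made up for illustration. It also assumes the loop is sampled finely enough that consecutive points subtend an angle of less than pi at the origin.

```python
import cmath
import math


def winding_number(points):
    """Winding number around 0 of the closed polygon through `points`.

    `points` is a list of non-zero complex numbers; the last point is
    joined back to the first one.
    """
    total_angle = 0.0
    n = len(points)
    for k in range(n):
        z = points[k]
        w = points[(k + 1) % n]
        # Signed angle from z to w as seen from the origin, in (-pi, pi].
        total_angle += cmath.phase(w / z)
    return round(total_angle / (2 * math.pi))


def sample_loop(loop, n=1000):
    """Sample a loop [0, 1] -> C \\ {0} at n evenly spaced parameters."""
    return [loop(k / n) for k in range(n)]


def are_homotopic(loop_a, loop_b):
    """Loops in the punctured plane are homotopic iff winding numbers agree."""
    return (winding_number(sample_loop(loop_a))
            == winding_number(sample_loop(loop_b)))


def circle_once(t):
    return cmath.exp(2j * math.pi * t)


def circle_twice(t):
    return cmath.exp(4j * math.pi * t)


def ellipse(t):
    return 3 * math.cos(2 * math.pi * t) + 1j * math.sin(2 * math.pi * t)


def off_centre(t):
    # A circle of radius 1 centred at 5: it does not go around the origin.
    return 5 + cmath.exp(2j * math.pi * t)


print(are_homotopic(circle_once, ellipse))
print(are_homotopic(circle_once, circle_twice))
print(are_homotopic(off_centre, circle_once))
```

Only the standard library is used here; numpy would make the sampling and the angle sum shorter once you are comfortable with it.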

This answer is marked "community wiki".

answered Feb 21 '12 at 14:03

levesque

wikified Feb 28 '12 at 13:55

Thanks for that. I do know Matlab. I was under the impression that Python was the hardest language to learn. I will start reading the book; someone on here said that most people publishing in machine learning journals use Python.

(Feb 21 '12 at 14:14) simplicity

Python is a very easy language to learn and I would say it is the language of choice for machine learning (with R, Matlab and C++ being competitors with quite different features).

Like you, I did pure math before moving to machine learning, and at the moment I am working on and with scikit-learn.

That being said, I think Python is a bad language to learn if you want to learn about programming. It is just too easy and tolerant ;)

If you want to learn to program, I would stick to C++. Java is probably a good choice, too. I guess it is easier in some sense. One could argue that it is unnecessary to learn C++ and that everybody is using Java now. In my opinion, if you understand C++, Java will not shock you too much, and C++ can be both pretty abstract and pretty close to the hardware level. But this is not really the point here.

So I think you should really think about what your goal is. Do you want to do machine learning or do you want to learn programming?

Also: do you want to code things that are really fast? If you use Python or Matlab, you have no chance (in most cases) of writing algorithms that are as fast as their C or C++ counterparts. Therefore the heavy lifting is often done in C while the rest is done in Python or Matlab.
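A small sketch of that "heavy lifting in C" idea, using only the standard library: the same dot product written once as an explicit Python loop and once by handing the loop to built-ins (`sum`, `map`, `operator.mul`) that CPython implements in C. The function names are made up for illustration, and in practice numpy plays this role for numerical work. On a typical CPython install the second version tends to be faster, but the timings vary by machine, so measure for yourself.

```python
import operator
import timeit


def dot_pure_python(xs, ys):
    """Dot product with the loop written out in Python."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_builtin(xs, ys):
    """Same dot product, with the loop pushed into C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))


xs = list(range(100_000))
ys = list(range(100_000))

# Both versions must agree before comparing their speed.
assert dot_pure_python(xs, ys) == dot_builtin(xs, ys)

for fn in (dot_pure_python, dot_builtin):
    seconds = timeit.timeit(lambda: fn(xs, ys), number=20)
    print(f"{fn.__name__}: {seconds:.3f} s for 20 runs")
```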

By the way, I'm at the IST in Austria now and there are quite a lot of people here working on computational topology and connections between topology and machine learning....

cheers, Andy

This answer is marked "community wiki".

answered Feb 21 '12 at 17:32

Andreas Mueller

1

I would add that you would probably want to avoid C++ as a first language for machine learning. Python, Matlab, and R are the 3 easiest to get into machine learning with. Additionally, you might want to look at Knime and Weka. They both provide a good platform to try out some of the algorithms without having to implement them yourself and get a feel for them.

(Feb 21 '12 at 23:32) Brian Vandenberg

Regarding performance: it's true that pure Python/numpy (or any other high level language for that matter) is rarely as fast as a C implementation, but libraries like Theano (Python) and Torch7 (Lua) are working hard to change that. They are a better option for me because my knowledge of C is rusty at best, and it is probably my least favourite language to work with. If you don't know C at all they might be worth looking into as well.

(Feb 24 '12 at 07:22) Sander Dieleman

To add a bit on Andreas' excellent answer:

Machine learning itself may be a difficult subject, especially the math. Since you are a math student, I think you can come to terms with the probability theory rather quickly.

I would recommend Python, because if you've never done any programming it is quite frustrating to fight the program to fix an implementation issue. You know the intuition is right, but the implementation is the part that is failing. For example, in C++ you can get multiple problems just by declaring an object in the wrong way, a problem which is in no way related to your algorithm.

As Andreas mentioned, Python is quite forgiving, and as a beginner that's perhaps what you want for your first experience of programming. You want to write things that run on the first try.

Once you've come to grips with Python, and if you want to implement something larger and faster, you can scale up, perhaps to Java or C++ (even C# might be a good option).

With Python you can learn the basics of for loops, if statements, structures and objects without worrying too much about memory management and code optimization.
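As a rough illustration of those basics on something a topologist will recognise, here is a small sketch that stores a simplicial complex in an object and computes its Euler characteristic with plain for loops and an if statement. The class and method names are made up for this example; there is no memory management anywhere in it.

```python
from itertools import combinations


class SimplicialComplex:
    """A finite simplicial complex stored as a set of sorted vertex tuples."""

    def __init__(self):
        self.simplices = set()

    def add_simplex(self, vertices):
        """Add a simplex together with all of its faces."""
        vertices = tuple(sorted(vertices))
        for size in range(1, len(vertices) + 1):
            for face in combinations(vertices, size):
                self.simplices.add(face)

    def euler_characteristic(self):
        """Alternating sum of the number of simplices in each dimension."""
        chi = 0
        for simplex in self.simplices:
            dimension = len(simplex) - 1
            if dimension % 2 == 0:
                chi += 1
            else:
                chi -= 1
        return chi


# A hollow triangle (a circle): three edges, no filled face.
circle = SimplicialComplex()
for edge in [(0, 1), (1, 2), (0, 2)]:
    circle.add_simplex(edge)

# The boundary of a tetrahedron (a 2-sphere): four triangles.
sphere = SimplicialComplex()
for triangle in combinations(range(4), 3):
    sphere.add_simplex(triangle)

print(circle.euler_characteristic())
print(sphere.euler_characteristic())
```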

answered Feb 22 '12 at 04:17

Leon Palafox ♦

I'd suggest Python too. Here are some links to get you going:

Some excellent lectures that use Python.

How to Think Like a Computer Scientist - PDF book (goes with the lectures)

Python tutorial - good tutorial

codingbat - basic programming puzzles

Eclipse with pydev for your IDE?

answered Feb 22 '12 at 07:30

amair

1

To learn scientific computing with Python, there is also http://scipy-lectures.github.com

(Mar 01 '12 at 18:52) Gael Varoquaux

I think Octave is worth mentioning here. I personally use it for machine learning. Matlab is a great choice, but OMG that price. Octave is quite similar to Matlab, though.

answered Feb 22 '12 at 14:18

Michał Szczygieł

2

Just a fair warning: some people seem to believe Octave code and Matlab code are fully compatible. That is not the case.

(Feb 22 '12 at 14:51) levesque

Python and Java are both strong choices. Python, because it's easy to do moderately complex things, very easy to read, fast to type, and, with libraries, can act as a nice Matlab replacement. Java, because its strong typing and awesome set of highly scalable libraries (Python wrappers for Hadoop don't prevent you from needing to know Java or Hadoop) make it great for building complex systems.

However, what I'd like to do is present another choice: JavaScript. First, it's clearly the champion of visualization (see d3 if you have any doubt), and it allows really interactive presentations. Second, JavaScript is literally everywhere. It exists in all web browsers, so you're going to deal with it sooner or later anyway. With the proliferation of node.js and V8, JavaScript has also become very fast; some recent benchmarks show JavaScript to be competitive with specialized numeric libraries. JS is also easy and fun to program, being extremely simple. Maybe not as "pretty" as Python, but not that bad either.

answered Feb 23 '12 at 19:51

downer

@simplicity, everyone above has covered the best options. I would just not suggest C++ unless you intend to develop very large applications for a living.

Regarding statistical learning: well, yes. Unless you work in purely symbolic artificial intelligence, machine learning is large-scale applied statistics.

@simplicity, would you care to explain your ideas about applying topology and homotopy theory to machine learning? I'm a math major long removed from my college days, and my eyes popped when you said that.

answered Feb 27 '12 at 09:23

Lucas Gallindo

Python is the ideal language for ML for all the reasons already mentioned: it is free and portable, has mature numeric/scientific libraries, and is relatively fast.

What had prevented me from switching completely to Python was the lack of a strong scientific development environment, which R and Matlab both have. But that has really started to change since I found Spyder. It's a Matlab/RStudio-like IDE for scientific Python, complete with a variable explorer and IPython support.

answered Mar 01 '12 at 02:08

crdrn

edited Mar 01 '12 at 02:11

The new developments of the IPython web notebook are also pretty stunning.

(Mar 01 '12 at 18:52) Gael Varoquaux

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.