|
I am not sure if this question is correct, but I am asking to resolve the doubts I have.
I observed that
This is a very open-ended question, but I am sure the advice will help me and others who have the same doubt. |
|
I think you have some concepts a bit mixed up. Hadoop, as far as I know, is a useful tool if you want to work in a distributed environment, that is, if you want to run your algorithm in parallel across several computers. MapReduce is likewise a good tool for working with parallel cores. That said, if you are choosing Java for Hadoop, I recommend you think twice, since using Hadoop for ML algorithms (if you are starting from zero) is pretty daunting. In my opinion, Java is not the best tool for ML implementations, since Java has a problem with native support for large float numbers. Python, on the other hand, is pretty good with calculations. I would recommend working in Python, since it is easy to get good implementations quite fast. Matlab is another tool you might use, which also offers good results and a gentle learning curve. |
|
In my opinion, there is no single right tool for ML/DM. For example, if you know Java you can call the Weka API, and you can also add your own classifier, regression model, clusterer, etc. to Weka. This makes sense if you want to compare various algorithms against your model using the Weka Experimenter (a great tool for statistical comparisons). Another example is the R statistical package: see DATA MINING Desktop Survival Guide and Togo's book. As @Leon says, Python is widely used, and I believe Python is a better approach for rapid prototyping of different ML models.
The main problem with Weka is that it gets slow really quickly. It's hard to use for real-world implementations, but it's really good for small-scale problems.
(Jun 23 '11 at 05:39)
Leon Palafox
@Leon: good to know...
(Jun 23 '11 at 06:35)
Lucian Sasu
|