To follow up on the previous question: I'd like to install Hadoop on my office machine and play with it. It is a relatively old machine; you can read about its specs in this question that I posted on OR-exchange. What is the easiest way to get some hands-on exposure to Hadoop? Can I install it on an Ubuntu machine (a single CPU with 2 cores), or do I need a cluster or EC2 machines?
The simplest possible way is to run Hadoop in standalone mode, which requires no configuration at all. This is an extended version of the script in the Hadoop quick start tutorial; it assumes only that Java is available:
To run your own MapReduce jobs, simply substitute your own MapReduce JAR file for the examples JAR. (Note that Hadoop will not overwrite an existing output directory, so move or delete old results before rerunning a job.) Once this works for you, try running the same job in pseudo-distributed mode, and only after that in fully distributed mode.
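If you would rather not write Java for a first job, one option (my assumption; the answer above only talks about JARs) is Hadoop Streaming, which runs any executable that reads records on stdin and writes tab-separated key/value lines to stdout. Below is a minimal word-count sketch in Python; the file name `wordcount.py` and the `map`/`reduce` arguments are hypothetical, and `run_locally` simulates the map, sort, and reduce phases in one process so the logic can be checked before Hadoop is involved.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit one 'word<TAB>1' record per lowercased whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reduce step: sum the counts for each word.

    Relies on all records for one key arriving together, which Hadoop's
    sort/shuffle phase guarantees for streaming reducers.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> sort -> reduce in one process, for testing without Hadoop."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical invocation: `python wordcount.py map` or `python wordcount.py reduce`,
    # reading records from stdin and writing them to stdout, one per line.
    step = sys.argv[1] if len(sys.argv) > 1 else ""
    if step == "map":
        sys.stdout.writelines(record + "\n" for record in mapper(sys.stdin))
    elif step == "reduce":
        sys.stdout.writelines(record + "\n" for record in reducer(sys.stdin))
```

For example, `run_locally(["the cat", "The dog"])` returns `["cat\t1", "dog\t1", "the\t2"]`. With Hadoop Streaming, the two steps would be passed to the streaming JAR through its `-mapper` and `-reducer` options (e.g. `-mapper "python wordcount.py map"`), together with `-input` and `-output`; the JAR's location differs between Hadoop releases, and on a real cluster the script also has to be shipped with the job.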
Yes, you can set it up on a single machine. If you are using Windows, you can use Cloudera's virtual machine: http://www.cloudera.com/developers/downloads/virtual-machine/ You can easily work with Hadoop even on a single-core machine. On a multi-core machine you can use pseudo-distributed mode, in which each Hadoop daemon runs as a separate Java process on the same machine, simulating a small cluster so that tasks can run in parallel on the available cores. You can develop, test, and run jobs this way without any issues. You need EC2 or a cluster only when you actually want scalability and have a large dataset. This is also a good resource: http://www.umiacs.umd.edu/~jimmylin/Cloud9/docs/index.html
(DirectedGraph, Jul 05 '10 at 22:53)
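For pseudo-distributed mode, the Hadoop 0.20-era quick start sets three properties: `fs.default.name` to `hdfs://localhost:9000` in `core-site.xml`, `dfs.replication` to `1` in `hdfs-site.xml`, and `mapred.job.tracker` to `localhost:9001` in `mapred-site.xml`. The Python sketch below writes those files; it assumes a 0.20-style `conf/` layout, and the helper names are hypothetical. Later releases renamed some of these properties (e.g. `fs.defaultFS`) and replaced the JobTracker with YARN, so check the documentation for your version.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict

# Pseudo-distributed settings as given in the Hadoop 0.20-era quick start.
# Assumption: later releases renamed or dropped some of these
# (e.g. fs.default.name became fs.defaultFS; mapred.job.tracker is pre-YARN).
PSEUDO_DISTRIBUTED: Dict[str, Dict[str, str]] = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},  # one DataNode, so keep one replica
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def build_config(props: Dict[str, str]) -> ET.Element:
    """Build a Hadoop-style <configuration> element from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def write_pseudo_distributed_conf(conf_dir: str) -> None:
    """Write the three site files into conf_dir, overwriting any existing copies."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, props in PSEUDO_DISTRIBUTED.items():
        ET.ElementTree(build_config(props)).write(
            os.path.join(conf_dir, filename), encoding="utf-8", xml_declaration=True
        )


def read_property(path: str, name: str) -> str:
    """Return the value of property `name` in a site file, or '' if it is absent."""
    for prop in ET.parse(path).getroot().iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value") or ""
    return ""


if __name__ == "__main__":
    # Demo in a throwaway directory; point it at your Hadoop conf/ dir for real use.
    with tempfile.TemporaryDirectory() as demo_dir:
        write_pseudo_distributed_conf(demo_dir)
        print(read_property(os.path.join(demo_dir, "core-site.xml"), "fs.default.name"))
```

In the 0.20 quick start, the next steps after editing these files are to format HDFS with `bin/hadoop namenode -format` and start the daemons with `bin/start-all.sh`, which also requires passphraseless SSH to localhost.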
If you're starting from scratch and using the Apache distribution (recommended), the following tutorial should cover everything you need: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
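Once a single-node setup like the one in that tutorial is running, a quick sanity check is whether the daemons' web interfaces accept connections. This is a small Python sketch, assuming the Hadoop 0.20/1.x default web UI ports (50070 for the NameNode, 50030 for the JobTracker); other releases or customized configurations may use different ports, and the function names are hypothetical.

```python
import socket
from typing import Dict, Optional

# Default web UI ports for Hadoop 0.20/1.x daemons (assumption: later
# releases and custom configurations may use different ports).
DEFAULT_UI_PORTS: Dict[str, int] = {
    "NameNode": 50070,
    "JobTracker": 50030,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_daemons(host: str = "localhost",
                  ports: Optional[Dict[str, int]] = None) -> Dict[str, bool]:
    """Map each daemon name to whether its web UI port accepts connections."""
    ports = DEFAULT_UI_PORTS if ports is None else ports
    return {name: is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_daemons().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

A port that accepts connections only shows that something is listening there, not that HDFS or MapReduce is healthy; the web pages themselves show node and job status.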