Has anyone used Bradford's Crane library to launch elastic map-reduce jobs from S3? I've figured out most of what is needed to launch a cluster, but it isn't clear to me what the natural way is to get input/output data to and from S3. Is that handled natively by the Hadoop infrastructure, or do I need to write my own file-push support?

Basically, what I feel I need is a simple example of how to use the Crane library to replicate what I'm already doing with the default hadoop-ec2 scripts and a clojure-hadoop jar. Beyond that, I'm interested in being able to leave the cluster running so I can run Cascalog or other interactive operations on the output data, and in customizing how I use Clojure + Hadoop on EC2 over time.
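
To make the question concrete, here is roughly the shape of the clojure-hadoop job I'm running now. The word-count body and the s3n:// bucket paths are just placeholders standing in for my real job, not anything Crane-specific:

(ns example.wordcount
  "Placeholder clojure-hadoop job -- the map/reduce bodies and the
   s3n:// bucket paths are illustrative only."
  (:require [clojure-hadoop.defjob :as defjob]
            [clojure-hadoop.wrap :as wrap])
  (:import (java.util StringTokenizer)))

(defn my-map [key value]
  ;; emit [word 1] for every token in the input line
  (map (fn [token] [token 1])
       (enumeration-seq (StringTokenizer. value))))

(defn my-reduce [key values-fn]
  ;; sum the counts for each word
  [[key (reduce + (values-fn))]])

(defjob/defjob job
  :map my-map
  :map-reader wrap/int-string-map-reader   ; plain text in: [offset line] pairs
  :reduce my-reduce
  :input-format :text
  :output-format :text
  :compress-output false
  ;; placeholder bucket paths -- today these get picked up by Hadoop's
  ;; S3 filesystem support when I run the jar on the hadoop-ec2 cluster
  :input "s3n://my-bucket/input"
  :output "s3n://my-bucket/output"
  :replace true)

What I can't tell is whether Crane expects a job definition like this and leaves the S3 I/O to Hadoop's filesystem layer, or whether it wants the data staged onto the cluster some other way.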