Has anyone used Bradford's Crane library to launch elastic map-reduce jobs from S3?

I've figured out most of what is needed to launch a cluster, but it is unclear to me what the natural process is to get input/output data to/from S3. Is that handled natively by the hadoop infrastructure or do I need to write support for file-push?
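For context, here is a sketch of the kind of invocation I'm hoping to replicate. Hadoop's S3 filesystem support does let job input/output paths point directly at `s3n://` URIs, so the data transfer itself needn't be a separate push step. All the bucket, jar, and namespace names below are placeholders, and the command is only echoed as a dry run:

```shell
# Sketch only -- bucket, jar, and job namespace are placeholders.
# Hadoop's S3 filesystem support lets a job read/write S3 directly by URI,
# so the job's input and output paths can simply be s3n:// locations.
INPUT="s3n://my-bucket/input"
OUTPUT="s3n://my-bucket/output"

# The real invocation on the cluster would be:
#   hadoop jar my-clojure-job.jar my.job.main "$INPUT" "$OUTPUT"
# Echo it here as a dry run:
echo "hadoop jar my-clojure-job.jar my.job.main $INPUT $OUTPUT"
```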

Basically, what I feel I need is a simple example of how to use the crane library to replicate what I'm already doing with the default hadoop-ec2 and a clojure-hadoop jar. I'm interested in the additional ability to leave the cluster running so I can run cascalog or other interactive operations on the output data, as well as to customize how I use clojure + hadoop on EC2 over time.

asked Sep 26 '10 at 19:49

Ian Eslick

edited Sep 27 '10 at 14:46

Joseph Turian ♦♦


One Answer:

It doesn't precisely follow the crane model, but you might try out Lemur. It's built specifically for EMR, though the abstraction is good enough that you could probably extend it to boot raw EC2 hosts as well.

answered Jan 28 '13 at 14:52

ndimiduk


powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.