I use EMR to create new instances, process jobs, and then shut the instances down.
My requirement is to schedule jobs periodically. One easy implementation would be to use Quartz to trigger EMR jobs. But in the longer run I am interested in using an out-of-the-box MapReduce scheduling solution. My question is: is there any out-of-the-box scheduling feature provided by EMR or the AWS SDK that I can use for this? I can see there is scheduling in Auto Scaling, but I want to schedule an EMR job flow instead.
Recommended answer
There is the Apache Oozie Workflow Scheduler for Hadoop for this.
Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.
Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
Oozie is a scalable, reliable and extensible system.
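To make the time-triggered Coordinator idea more concrete, here is a minimal sketch that renders a coordinator definition with Python's standard library. The function name `build_coordinator`, the job name, the HDFS path, and the dates are all hypothetical. The schema namespace (`uri:oozie:coordinator:0.4`) and the `${coord:days(1)}` frequency expression are written as I understand the Oozie documentation, so check them against the Oozie version you actually install.

```python
import xml.etree.ElementTree as ET


def build_coordinator(name, workflow_path, start, end,
                      frequency="${coord:days(1)}"):
    """Return coordinator XML text for a time-triggered Oozie Coordinator.

    Assumption: element and attribute names follow the Oozie coordinator
    schema (version 0.4); verify against your Oozie release.
    """
    app = ET.Element("coordinator-app", {
        "name": name,
        "frequency": frequency,   # how often the workflow is materialized
        "start": start,           # first nominal time (UTC)
        "end": end,               # last nominal time (UTC)
        "timezone": "UTC",
        "xmlns": "uri:oozie:coordinator:0.4",
    })
    # The coordinator's action points at the workflow application to run.
    action = ET.SubElement(app, "action")
    workflow = ET.SubElement(action, "workflow")
    ET.SubElement(workflow, "app-path").text = workflow_path
    return ET.tostring(app, encoding="unicode")


# Hypothetical values, for illustration only.
coordinator_xml = build_coordinator(
    name="daily-job",
    workflow_path="hdfs:///apps/daily-wf",
    start="2024-01-01T00:00Z",
    end="2024-12-31T00:00Z",
)
```

The resulting text would be saved as coordinator.xml next to the workflow definition and submitted through the Oozie client. A data-availability trigger would additionally need dataset and input-event elements, which this sketch leaves out.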
Here is a simple example of Elastic MapReduce bootstrap actions for configuring Apache Oozie: github/lila/emr-oozie-sample
Be aware, though, that Oozie is a bit complicated. It is worth adopting only if you have a lot of jobs to schedule, monitor, and maintain. If you have, say, just 2 or 3 jobs to schedule periodically, simply create a few cron jobs instead.
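For the cron route, one possible shape is a small script that cron runs on a schedule and that starts a transient EMR cluster, which terminates itself once its step finishes. This is only a sketch under assumptions: it uses boto3's `run_job_flow` call, and the release label, instance types, default role names, bucket paths, and the script location in the crontab comment are all hypothetical placeholders to replace with your own.

```python
# Hypothetical crontab entry (runs every day at 02:00):
#   0 2 * * * /usr/bin/python3 /opt/jobs/launch_emr.py


def build_job_flow_request(name, jar_path, log_uri):
    """Build the keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",          # assumption: pick your release
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption
            "SlaveInstanceType": "m5.xlarge",   # assumption
            "InstanceCount": 3,
            # False makes the cluster shut down when all steps are done.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "periodic-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {"Jar": jar_path, "Args": []},
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",   # assumption: default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request):
    """Start the cluster and return its job flow id (needs AWS credentials)."""
    import boto3  # imported here so the request can be built without boto3
    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]
```

The script that cron invokes would call `launch(build_job_flow_request("nightly", "s3://my-bucket/job.jar", "s3://my-bucket/logs/"))` and log the returned id. Keeping the request builder separate from the AWS call makes it easy to check the configuration without starting a cluster.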
You can also look into and explore Simple Workflow from Amazon.