Tools/ways to schedule Amazon Elastic MapReduce jobs

Updated: 2024-10-26 14:31:28
Problem description

I use EMR to create new instances, process the jobs and then shut down the instances.

My requirement is to schedule jobs periodically. One easy implementation would be to use Quartz to trigger EMR jobs. But in the longer run I am interested in an out-of-the-box MapReduce scheduling solution. My question is: is there any out-of-the-box scheduling feature provided by EMR or the AWS SDK that I can use for this requirement? I can see there is scheduling in Auto Scaling, but I want to schedule an EMR job flow instead.

Recommended answer

Apache Oozie, the workflow scheduler for Hadoop, exists for exactly this purpose.

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
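To make the DAG idea concrete, here is a minimal Python sketch. It is not Oozie itself (Oozie defines workflows in XML); it only treats a workflow as a mapping from each action to the actions it depends on, and derives a valid execution order, plus batches of actions that could run concurrently, with the standard library's `graphlib` module (Python 3.9+). The action names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
# The names are illustrative only; they are not Oozie action types.
WORKFLOW = {
    "import-logs": set(),
    "clean-logs": {"import-logs"},
    "daily-report": {"clean-logs"},
    "export-metrics": {"clean-logs"},
    "notify": {"daily-report", "export-metrics"},
}


def execution_order(dag):
    """Return one valid run order; raises graphlib.CycleError if dag has a cycle."""
    return list(TopologicalSorter(dag).static_order())


def parallel_batches(dag):
    """Group actions into batches whose members have all dependencies satisfied."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every action whose parents are done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so its children unlock
    return batches


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
    for batch in parallel_batches(WORKFLOW):
        print(batch)
```

Because the graph is acyclic, every action eventually becomes ready; a cycle would make the workflow impossible to run, which is why `graphlib` rejects it.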

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
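The two coordinator triggers, a time step and the presence of that step's input data, can be illustrated in plain Python. This is a sketch under stated assumptions, not Oozie's coordinator engine: the hourly S3 layout in `dataset_path`, the bucket name and the `due_runs` helper are all hypothetical, and a real coordinator also handles waiting, timeouts and retries.

```python
from datetime import datetime, timedelta


# Hypothetical layout: the input for each hour lives under its own S3 prefix.
def dataset_path(nominal_time):
    """Map a nominal run time to the input location it depends on."""
    return f"s3://example-bucket/logs/{nominal_time:%Y/%m/%d/%H}/"


def due_runs(start, now, frequency, available_paths, already_run=frozenset()):
    """Return nominal times in [start, now] that have input data and have not run.

    Time trigger: one nominal time every `frequency`, beginning at `start`.
    Data trigger: that nominal time's input path is in `available_paths`.
    """
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    runs = []
    t = start
    while t <= now:
        if t not in already_run and dataset_path(t) in available_paths:
            runs.append(t)
        t += frequency
    return runs


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0)
    now = datetime(2024, 1, 1, 3, 30)
    landed = {dataset_path(datetime(2024, 1, 1, 0)), dataset_path(datetime(2024, 1, 1, 2))}
    for t in due_runs(start, now, timedelta(hours=1), landed):
        print("launch workflow for", t.isoformat())
```

In the usage example the 01:00 and 03:00 instances are not launched because their input has not landed yet; calling `due_runs` again later, with the launched times passed as `already_run`, would pick them up once their data appears.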

Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java MapReduce, Streaming MapReduce, Pig, Hive, Sqoop and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

Here is a simple example of Elastic MapReduce bootstrap actions for configuring Apache Oozie: github/lila/emr-oozie-sample
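To show what a bootstrap action looks like from the AWS SDK side, the sketch below builds keyword arguments for the boto3 EMR client's `run_job_flow` call with a `ScriptBootstrapAction`. The S3 script path, bucket, release label and instance types are hypothetical placeholders, and the setup script itself (for instance one adapted from the sample repository) is assumed to exist. The cluster is kept alive because Oozie runs on the cluster and schedules jobs there. Newer EMR releases may also offer Oozie as an installable application, so check the EMR Release Guide before writing your own setup script.

```python
# Hypothetical S3 location of an Oozie setup script; replace with your own object.
OOZIE_SETUP_SCRIPT = "s3://example-bucket/bootstrap/install-oozie.sh"


def bootstrap_actions(script_uri, args=()):
    """Return a BootstrapActions list in the shape run_job_flow accepts."""
    return [
        {
            "Name": "configure-oozie",
            "ScriptBootstrapAction": {"Path": script_uri, "Args": list(args)},
        }
    ]


def long_running_cluster_request(script_uri):
    """Build run_job_flow kwargs for a cluster that stays up so Oozie can schedule jobs."""
    return {
        "Name": "oozie-scheduler-cluster",
        "ReleaseLabel": "emr-6.15.0",  # assumption: use a release available to you
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption: size for your workload
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Oozie lives on the cluster, so it must not terminate when idle.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "BootstrapActions": bootstrap_actions(script_uri),
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed to exist in the account
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, region="us-east-1"):
    """Start the cluster; needs boto3 and AWS credentials when actually called."""
    import boto3  # lazy import so the builders above work without boto3

    return boto3.client("emr", region_name=region).run_job_flow(**request)["JobFlowId"]
```

Keeping the request as a plain dictionary lets you inspect or unit-test the configuration before anything is sent to AWS.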

Be aware, though, that Oozie is fairly complicated. It is worth adopting only if you have a lot of jobs to schedule, monitor and maintain; if you only need to run, say, 2 or 3 jobs periodically, just create a few cron jobs instead.
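For the cron route, one option is a small script that cron runs on a schedule and that launches a transient cluster through the AWS SDK for Python (boto3). It matches the workflow in the question: the cluster runs its step and then shuts itself down because `KeepJobFlowAliveWhenNoSteps` is `False`. The jar, bucket, paths, release label and instance types are hypothetical, and the default roles `EMR_EC2_DefaultRole` and `EMR_DefaultRole` are assumed to exist in the account. A crontab entry such as `0 2 * * * /usr/bin/python3 /opt/jobs/launch_emr_job.py --submit` (hypothetical path) would run it daily at 02:00.

```python
import json
import sys
from datetime import date

# Assumptions: hypothetical values, adjust for your account and job.
RELEASE_LABEL = "emr-6.15.0"
JOB_JAR = "s3://example-bucket/jars/log-aggregation.jar"
INPUT_URI = "s3://example-bucket/input/"
OUTPUT_URI = "s3://example-bucket/output/"


def transient_cluster_request(run_date):
    """Build run_job_flow kwargs for a cluster that runs one step, then terminates."""
    return {
        "Name": f"nightly-job-{run_date}",
        "ReleaseLabel": RELEASE_LABEL,
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # False: the cluster shuts itself down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "log-aggregation",
                # On failure, terminate rather than leave billed instances running.
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": JOB_JAR,
                    "Args": [INPUT_URI, f"{OUTPUT_URI}{run_date}/"],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def submit(request, region="us-east-1"):
    """Send the request to EMR; requires boto3 and AWS credentials."""
    import boto3  # lazy import so the dry run works without boto3 installed

    return boto3.client("emr", region_name=region).run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    request = transient_cluster_request(date.today().isoformat())
    if "--submit" in sys.argv:
        print("Started cluster", submit(request))
    else:
        print(json.dumps(request, indent=2))  # dry run: show what would be sent
```

Without `--submit` the script only prints the request, which makes it easy to check the configuration before letting cron start real, billed clusters.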

You can also research and explore Amazon Simple Workflow (SWF).

Published: 2023-11-24 08:46:06
Source: https://www.elefans.com/category/jswz/34/1624523.html