I currently have a task at hand to terminate a long-running EMR cluster after a set period of time (based on some metric). Google Dataproc has this capability in something called "Cluster Scheduled Deletion", listed here: cloud.google/dataproc/docs/concepts/configuring-clusters/scheduled-deletion
Is this something that is possible on EMR natively? Maybe using CloudWatch metrics? Or can I write a long-running jar that sits on the EMR master node, polls YARN for some idle-time metric, and then shuts down the cluster after a set period of time?
For more clarification: I would like some functionality wherein the cluster is terminated after it has been idle for some x amount of time. E.g., if the cluster has been up for a while but no jobs have been run for, say, 1 hour and the cluster is just sitting there doing nothing, then I'd like the ability to terminate the cluster.
Recommended answer:
The easiest method would be to use Amazon EMR Metrics and Dimensions for Amazon CloudWatch. There is an IsIdle metric that "indicates that a cluster is no longer performing work".
You could create a CloudWatch alarm that triggers if IsIdle stays true for more than x minutes. The alarm would send a message to Amazon SNS, which can trigger a Lambda function to shut down the cluster.
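As a rough illustration of the alarm described above, the sketch below builds the keyword arguments for CloudWatch's `put_metric_alarm` call. It assumes EMR publishes `IsIdle` in the `AWS/ElasticMapReduce` namespace at five-minute intervals, keyed by a `JobFlowId` dimension. The function name, the alarm naming scheme, and the choice of the `Minimum` statistic are illustrative assumptions, not something the answer specifies.

```python
import math

# Assumption: EMR pushes its CloudWatch metrics every five minutes.
EMR_METRIC_PERIOD_SECONDS = 300


def idle_alarm_params(cluster_id, idle_minutes, topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires once IsIdle has been 1 for enough consecutive
    five-minute periods to cover `idle_minutes`.
    """
    periods = max(1, math.ceil(idle_minutes * 60 / EMR_METRIC_PERIOD_SECONDS))
    return {
        "AlarmName": f"emr-idle-{cluster_id}",  # hypothetical naming scheme
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "IsIdle",
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Minimum",  # every datapoint in a period must be idle
        "Period": EMR_METRIC_PERIOD_SECONDS,
        "EvaluationPeriods": periods,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that triggers the Lambda
    }


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **idle_alarm_params("j-EXAMPLE", 60, "arn:aws:sns:...:emr-idle"))
```

With `idle_minutes=60` this requests 12 consecutive idle periods, which matches the "no jobs for 1 hour" case in the question.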
Components:
- Amazon CloudWatch alarm
- Amazon SNS topic
- AWS Lambda function
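The Lambda piece of this pipeline might look roughly like the sketch below. It assumes the alarm notification arrives through SNS as a JSON message whose `Trigger.Dimensions` list carries the cluster ID under a `JobFlowId` entry with lowercase `name`/`value` keys; that message layout is an assumption worth checking against a real notification. The optional `emr` parameter is a hypothetical addition so the handler can be exercised without AWS access.

```python
import json


def cluster_id_from_alarm(event):
    """Return the JobFlowId from an SNS-delivered CloudWatch alarm, or None."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    # Ignore OK / INSUFFICIENT_DATA transitions; act only on ALARM.
    if message.get("NewStateValue") != "ALARM":
        return None
    for dim in message.get("Trigger", {}).get("Dimensions", []):
        if dim.get("name") == "JobFlowId":
            return dim.get("value")
    return None


def handler(event, context, emr=None):
    """Lambda entry point: terminate the cluster named in the alarm."""
    cluster_id = cluster_id_from_alarm(event)
    if cluster_id is None:
        return None
    if emr is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        emr = boto3.client("emr")
    emr.terminate_job_flows(JobFlowIds=[cluster_id])
    return cluster_id
```

The Lambda's execution role would need permission to call `elasticmapreduce:TerminateJobFlows` for this to succeed.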
Update: This apparently isn't suitable (see comments below).
An alternate approach would be:
- Use Amazon CloudWatch Events to schedule a Lambda function every x minutes
- The Lambda function looks for any clusters with a particular tag that indicates how long to wait until shutdown (e.g. 40 minutes). If the tag is not present, the cluster remains untouched.
- The Lambda function queries the cluster state (somehow, probably via a Hadoop API call), then:
  - If the cluster is idle and there is no IdleSince tag, add an IdleSince tag with the current timestamp
  - If the cluster is idle and it has been more than x minutes since the timestamp in the IdleSince tag, terminate the cluster
  - If the cluster is not idle, remove the IdleSince tag (if present)
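The tag-based steps above can be sketched as a small decision function plus a thin wrapper around the EMR client. How idleness is detected is left open in the answer, so `is_idle` is simply passed in here. The tag name `IdleTimeoutMinutes`, the epoch-seconds timestamp format, and the function names are hypothetical choices for illustration. The `emr` argument is expected to behave like a boto3 EMR client, and `tags` is a plain dict built from the cluster's tag list.

```python
from datetime import datetime, timezone

TIMEOUT_TAG = "IdleTimeoutMinutes"  # hypothetical name for the opt-in tag
IDLE_SINCE_TAG = "IdleSince"        # stores epoch seconds as a string


def decide(tags, is_idle, now):
    """Return one of: 'skip', 'mark_idle', 'wait', 'terminate', 'clear_idle'."""
    if TIMEOUT_TAG not in tags:
        return "skip"  # cluster has not opted in; leave it untouched
    if not is_idle:
        return "clear_idle" if IDLE_SINCE_TAG in tags else "skip"
    if IDLE_SINCE_TAG not in tags:
        return "mark_idle"
    idle_seconds = now.timestamp() - int(tags[IDLE_SINCE_TAG])
    timeout_seconds = int(tags[TIMEOUT_TAG]) * 60
    return "terminate" if idle_seconds >= timeout_seconds else "wait"


def apply_decision(emr, cluster_id, tags, is_idle, now=None):
    """Carry out the decision using an EMR client (e.g. boto3.client('emr'))."""
    now = now or datetime.now(timezone.utc)
    action = decide(tags, is_idle, now)
    if action == "mark_idle":
        stamp = str(int(now.timestamp()))
        emr.add_tags(ResourceId=cluster_id,
                     Tags=[{"Key": IDLE_SINCE_TAG, "Value": stamp}])
    elif action == "clear_idle":
        emr.remove_tags(ResourceId=cluster_id, TagKeys=[IDLE_SINCE_TAG])
    elif action == "terminate":
        emr.terminate_job_flows(JobFlowIds=[cluster_id])
    return action
```

Keeping the idle timestamp in a tag means the Lambda itself stays stateless between scheduled runs, which seems to be the point of the design in the answer.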