Number of executors and cores

Updated: 2024-10-12 01:21:44

Question

I am new to Spark and would like to know how many cores and executors to use for a Spark job on AWS with 2 c4.8xlarge slave nodes and 1 c4.8xlarge master node. I have tried different combinations but have not been able to understand the concept.

Thanks.

Recommended answer

The Cloudera guys gave a good explanation of this:

www.youtube/watch?v=vfiJQ7wg81Y

Say you have 16 cores on your node (I think this is exactly your case). Leave 1 core for YARN to manage the node, then divide the remaining 15 among 3 executors, so each executor gets 5 cores. There is also a Java overhead of Max(384M, 0.07*spark.executor.memory) per executor. So with 3 executors per node you have 3*Max(384M, 0.07*spark.executor.memory) of overhead for the JVMs, and the rest of the node's memory can be used for the executor containers.
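The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb in this answer, not an official Spark utility: the function name `plan_node` and its parameters are made up for the example, memory is expressed in MB, and the 18 GB executor size is taken from the suggestion later in this answer.

```python
def plan_node(cores_per_node, executor_memory_mb,
              executors_per_node=3, reserved_cores=1):
    """Apply the rule of thumb from the answer to a single worker node.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Leave one core for YARN, split the rest evenly across executors.
    usable_cores = cores_per_node - reserved_cores
    cores_per_executor = usable_cores // executors_per_node

    # Per-executor JVM overhead: Max(384M, 0.07 * spark.executor.memory).
    overhead_mb = max(384, int(0.07 * executor_memory_mb))

    # Memory each executor needs in total, and the total for the node.
    per_executor_mb = executor_memory_mb + overhead_mb
    return {
        "cores_per_executor": cores_per_executor,
        "overhead_mb": overhead_mb,
        "per_executor_mb": per_executor_mb,
        "node_total_mb": per_executor_mb * executors_per_node,
    }


if __name__ == "__main__":
    # 16 cores per node and 18 GB executors, as in the answer.
    print(plan_node(cores_per_node=16, executor_memory_mb=18 * 1024))
```

Comparing `node_total_mb` with the memory actually available on the node shows whether a given executor size fits.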

However, on a cluster with many users working simultaneously, YARN can push your Spark session out of some containers. Spark then has to go all the way back through the DAG and recompute the lost RDDs to bring them to their present state, which is bad. That is why you should set --num-executors, --executor-memory and --executor-cores slightly lower, to leave some room for other users in advance. This doesn't apply on AWS, where you are the only user.

By the way, --executor-memory 18Gb should work for you.
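As a rough sketch of how these settings might end up on a command line, the helper below assembles a `spark-submit` argument list from the three flags mentioned above. Several things here are assumptions rather than part of the answer: the helper name `spark_submit_args`, the script name `my_job.py`, the `--master yarn` setting, the `18g` spelling of the memory size, and the executor count of 6 (3 executors on each of the 2 worker nodes).

```python
def spark_submit_args(num_executors, executor_cores, executor_memory,
                      app="my_job.py"):
    """Build a spark-submit argument list (hypothetical helper).

    `app` is a placeholder script name, not something from the answer.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        app,
    ]


if __name__ == "__main__":
    # Assumed sizing: 2 worker nodes * 3 executors, 5 cores and 18g each.
    args = spark_submit_args(num_executors=6, executor_cores=5,
                             executor_memory="18g")
    print(" ".join(args))
```

On a shared cluster, the same helper could be called with slightly smaller values to leave room for other users.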

More details on tuning your cluster parameters: blog.cloudera/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

Published: 2023-11-25 13:08:04
Article link: https://www.elefans.com/category/jswz/34/1629878.html