How can we get the overall memory used for a Spark job? I am not able to find the exact parameter we can refer to in order to retrieve it. I have looked at the Spark UI but am not sure which field to use. In Ganglia we have the following options: a) Memory Buffer b) Cache Memory c) Free Memory d) Shared Memory e) Free Swap Space
None of these seems to correspond to "memory used". Does anyone have an idea how to get this?
Accepted answer:
If you persist your RDDs, you can see how big they are in memory via the UI (on the Storage tab).
It's hard to get an idea of how much memory is being used for intermediate work (e.g. shuffles). Basically, Spark will use as much memory as it needs, given what's available. This means that if your RDDs take up more than 50% of the available resources, your application may slow down because fewer resources are left for execution.
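To get a single "overall memory used" number programmatically rather than from the UI, one option is Spark's monitoring REST API: `GET http://<driver>:4040/api/v1/applications/<app-id>/executors` returns one record per executor, including `memoryUsed` and `maxMemory` (storage memory, in bytes). A sketch that sums those fields across executors, using a hardcoded sample payload in place of a live response (the executor IDs and byte counts are made up):

```python
import json

# Sample payload shaped like the response from
#   GET http://<driver>:4040/api/v1/applications/<app-id>/executors
# Field names follow Spark's ExecutorSummary; the values are invented.
SAMPLE_EXECUTORS = json.loads("""
[
  {"id": "driver", "memoryUsed": 52428800,  "maxMemory": 434031820},
  {"id": "1",      "memoryUsed": 104857600, "maxMemory": 434031820},
  {"id": "2",      "memoryUsed": 78643200,  "maxMemory": 434031820}
]
""")

def total_memory_used(executors):
    """Sum storage memory currently in use (bytes) across all executors."""
    return sum(e["memoryUsed"] for e in executors)

def total_memory_available(executors):
    """Sum storage memory available (bytes) across all executors."""
    return sum(e["maxMemory"] for e in executors)

if __name__ == "__main__":
    used = total_memory_used(SAMPLE_EXECUTORS)
    avail = total_memory_available(SAMPLE_EXECUTORS)
    print(f"storage memory used: {used / 1024**2:.1f} MiB of {avail / 1024**2:.1f} MiB")
```

In a real job you would fetch the JSON from the running driver's REST endpoint instead of the sample above. Keep in mind this reports *storage* memory (cached RDDs/blocks), not transient execution memory used by shuffles.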