Question
I have a Spark/Scala job in which I do this:
1: Compute a big DataFrame df1 + cache it into memory
2: Use df1 to compute dfA
3: Read raw data into df2 (again, it's big) + cache it
When performing (3), I no longer need df1. I want to make sure its space gets freed. I cached it at (1) because this DataFrame gets used in (2), and that's the only way to make sure I compute it only once instead of recomputing it each time.
I need to free its space and make sure it gets freed. What are my options?
I thought of these, but they don't seem to be sufficient:
df=null
df.unpersist()
Can you document your answer with a proper Spark documentation link?
Answer
df.unpersist should be sufficient, but it won't necessarily free the memory right away. It merely marks the DataFrame for removal.
You can use df.unpersist(blocking = true), which will block until the DataFrame is removed before continuing on.
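A minimal sketch of the flow described in the question, using a blocking unpersist between steps (2) and (3). Here `spark` is an existing SparkSession, and `computeBig`, `computeA`, and `loadRaw` are hypothetical placeholders for the job's own logic:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

def run(spark: SparkSession): Unit = {
  // Step 1: compute the big DataFrame and cache it in memory
  val df1: DataFrame = computeBig(spark)
  df1.persist(StorageLevel.MEMORY_ONLY)

  // Step 2: reuse the cached df1 (no recomputation of df1's lineage)
  val dfA: DataFrame = computeA(df1)

  // df1 is no longer needed: block until its cached blocks are
  // actually dropped before reading more big data
  df1.unpersist(blocking = true)

  // Step 3: read the raw data and cache it
  val df2: DataFrame = loadRaw(spark).cache()
}

// Hypothetical stand-ins for the job's own computations
def computeBig(spark: SparkSession): DataFrame = ???
def computeA(df: DataFrame): DataFrame = ???
def loadRaw(spark: SparkSession): DataFrame = ???
```

Note that setting df1 = null does not help by itself: the cached blocks are tracked by Spark's block manager, not by the JVM reference, so only unpersist (or eventual LRU eviction) releases them.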