Question
I have a Spark/Scala job in which I do this:
1: Compute a big DataFrame df1 + cache it into memory
2: Use df1 to compute dfA
3: Read raw data into df2 (again, it's big) + cache it
When performing (3), I no longer need df1. I want to make sure its space gets freed. I cached it at (1) because this DataFrame gets used in (2), and that's the only way to make sure I compute it only once instead of recomputing it each time.
I need to free its space and make sure it gets freed. What are my options?
I thought of these, but they don't seem to be sufficient:
df=null
df.unpersist()
Can you document your answer with a proper Spark documentation link?
Answer
df.unpersist should be sufficient, but it won't necessarily free the memory right away. It merely marks the DataFrame for removal.
You can use df.unpersist(blocking = true), which will block until the DataFrame is removed before continuing on.
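A minimal sketch of the flow described in the question, using a blocking unpersist between steps (2) and (3). Here `spark` is an existing SparkSession, and `computeBig`, `computeA`, and `loadRaw` are hypothetical placeholders for the job's own logic:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

def run(spark: SparkSession): Unit = {
  // Step 1: compute the big DataFrame and cache it in memory
  val df1: DataFrame = computeBig(spark)
  df1.persist(StorageLevel.MEMORY_ONLY)

  // Step 2: reuse the cached df1 (no recomputation of df1's lineage)
  val dfA: DataFrame = computeA(df1)

  // df1 is no longer needed: block until its cached blocks are
  // actually dropped before reading more big data
  df1.unpersist(blocking = true)

  // Step 3: read the raw data and cache it
  val df2: DataFrame = loadRaw(spark).cache()
}

// Hypothetical stand-ins for the job's own computations
def computeBig(spark: SparkSession): DataFrame = ???
def computeA(df: DataFrame): DataFrame = ???
def loadRaw(spark: SparkSession): DataFrame = ???
```

Note that setting df1 = null does not help by itself: the cached blocks are tracked by Spark's block manager, not by the JVM reference, so only unpersist (or eventual LRU eviction) releases them.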