I am running multiple parallel tasks on a multi-node distributed Dask cluster. However, once the tasks finish, the workers still hold on to large amounts of memory and the cluster fills up quickly.
I have tried client.restart() after every task, and also client.cancel(df). The first kills the workers and sends a CancelledError to the other running tasks, which is troublesome; the second did not help much because we use a lot of custom objects and functions inside Dask's map functions. Adding del for the known variables and calling gc.collect() also doesn't help much.
I am sure most of the memory being held is due to the custom Python functions and objects called with client.map(..).
My question is:
If there are no references to the futures, then Dask should delete any references to the Python objects that you've created with them. See www.youtube.com/watch?v=MsnzpzFZAoQ for more information on how to investigate this.
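The mechanism here is ordinary Python reference counting: once the last reference to a future is dropped, the scheduler can release the data it pins on a worker. A minimal local sketch of that principle (no Dask required; `FakeResult` is a hypothetical stand-in for worker-held data) using `weakref` to observe the release:

```python
import weakref

class FakeResult:
    """Stand-in for data pinned in worker memory by a live future."""
    pass

result = FakeResult()
probe = weakref.ref(result)   # watches the object without keeping it alive

assert probe() is not None    # the reference keeps the data alive
del result                    # drop the last reference, like `del futures`
assert probe() is None        # with no references left, the object is freed
```

In a real cluster the same pattern applies: hold futures only as long as you need their results, then delete them (or let them fall out of scope) so the workers can release the memory.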
If your custom Python code does have some memory leak of its own, then yes, you can ask Dask workers to periodically restart themselves. See the dask-worker --help man page and look for the keywords that start with --lifetime.
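For example, a worker can be told to restart itself periodically (a sketch; `scheduler:8786` is a placeholder for your scheduler address):

```shell
# Restart each worker after roughly one hour of life; the stagger
# spreads restarts out so workers don't all restart at the same moment.
dask-worker scheduler:8786 --lifetime 1h --lifetime-stagger 5m --lifetime-restart
```

This bounds how long any leak in custom code can accumulate, without manually calling client.restart() between tasks.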