Writing to shared memory in Python is very slow

Problem description

I use multiprocessing.sharedctypes.RawArray to share large numpy arrays between multiple processes. I've noticed that when the array is large (more than 1 or 2 GB) it becomes very slow to initialize and also much slower to read from and write to (and the read/write time is unpredictable: sometimes fairly fast, sometimes very slow).
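For context, here is a minimal sketch of the sharing pattern being described (not the asker's actual code): a RawArray is allocated in the parent, wrapped as a numpy array with np.ctypeslib.as_array, and a child process writes into the same underlying buffer. The shapes and values are made up for illustration.

import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy as np

def worker(raw, shape):
    # Re-wrap the shared buffer in the child; no copy is made.
    view = np.ctypeslib.as_array(raw).reshape(shape)
    view[0, :] = 42  # this write is visible to the parent

if __name__ == '__main__':
    shape = (1000, 1000)
    raw = mpsc.RawArray(ctypes.c_uint16, shape[0] * shape[1])
    arr = np.ctypeslib.as_array(raw).reshape(shape)

    p = mp.Process(target=worker, args=(raw, shape))
    p.start()
    p.join()

    print(arr[0, :5])  # -> [42 42 42 42 42]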

I made a small sample script that uses just one process, initializes a shared array, and writes to it several times, measuring how long these operations take.

import argparse
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy as np
import time

def main():
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('-c', '--block-count', type=int, default=1,
                        help='Number of blocks to write')
    parser.add_argument('-w', '--block-width', type=int, default=20000,
                        help='Block width')
    parser.add_argument('-d', '--block-depth', type=int, default=15000,
                        help='Block depth')
    args = parser.parse_args()
    blocks = args.block_count
    blockwidth = args.block_width
    depth = args.block_depth

    start = time.perf_counter()
    shared_array = mpsc.RawArray(ctypes.c_uint16, blocks*blockwidth*depth)
    finish = time.perf_counter()
    print('Init shared array of size {:.2f} Gb: {:.2f} s'.format(
        blocks*blockwidth*depth*ctypes.sizeof(ctypes.c_uint16)/1024/1024/1024,
        (finish-start)))

    numpy_array = np.ctypeslib.as_array(shared_array).reshape(blocks*blockwidth, depth)
    start = time.perf_counter()
    for i in range(blocks):
        begin = time.perf_counter()
        numpy_array[i*blockwidth:(i+1)*blockwidth, :] = np.ones((blockwidth, depth), dtype=np.uint16)
        end = time.perf_counter()
        print('Write = %.2f s' % (end-begin))
    finish = time.perf_counter()
    print('Total time = %.2f s' % (finish-start))

if __name__ == '__main__':
    main()

When I run this code I get the following on my PC:

$ python shared-minimal.py -c 1
Init shared array of size 0.56 Gb: 0.36 s
Write = 0.13 s
Total time = 0.13 s
$ python shared-minimal.py -c 2
Init shared array of size 1.12 Gb: 0.72 s
Write = 0.12 s
Write = 0.13 s
Total time = 0.25 s
$ python shared-minimal.py -c 4
Init shared array of size 2.24 Gb: 5.40 s
Write = 1.17 s
Write = 1.17 s
Write = 1.17 s
Write = 1.57 s
Total time = 5.08 s

In the last case, when the array size exceeds 2 GB, the initialization time no longer grows linearly with the array size, and assigning same-size slices to the array is more than 5 times slower.

I wonder why that happens. I'm running the script on Ubuntu 16.04 with Python 3.5. Using iotop I also noticed that while initializing and writing to the array there is disk write activity of the same size as the shared array, but I'm not sure whether a real file is created or whether it's an in-memory-only operation (I suppose it should be). In general my system also becomes less responsive when the shared array is large. There is no swapping, as checked with top, ipcs -mu and vmstat.

Answer

After more research I found that Python actually creates folders in /tmp whose names start with pymp-, and although no files are visible inside them with file viewers, it looks exactly as if /tmp is used by Python for shared memory. Performance seems to drop when the file caches are flushed.
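One quick way to check this from Python itself is shown below. Note that get_temp_dir() is an internal, undocumented CPython helper, so treat this as a diagnostic sketch rather than a stable API; on this Python/platform combination the RawArray allocations appear to be backed by files created under that directory.

import tempfile
import multiprocessing.util as mpu

print(tempfile.gettempdir())   # usually /tmp; controlled by the TMPDIR env var
print(mpu.get_temp_dir())      # e.g. /tmp/pymp-xxxxxx, where backing files end up

Because the location comes from tempfile.gettempdir(), setting TMPDIR to a tmpfs path (for example TMPDIR=/dev/shm) before starting Python should redirect these backing files to RAM-backed storage without remounting /tmp, though I have only verified the remount approach below.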

The working solution in the end was to mount /tmp as tmpfs:

sudo mount -t tmpfs tmpfs /tmp

And, if using a recent Docker, to pass the --tmpfs /tmp argument to the docker run command.

After doing this, read/write operations take place in RAM, and performance is fast and stable.

I still wonder why /tmp is used for shared memory rather than /dev/shm, which is already mounted as tmpfs and is supposed to be used for shared memory.
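As an aside that was not available to the answerer (the question targets Python 3.5): newer Pythons (3.8+) provide multiprocessing.shared_memory, which allocates segments via the POSIX shared-memory API and on Linux is backed by /dev/shm, so the /tmp issue does not arise. A minimal sketch, with sizes chosen to match one block of the sample script rather than the original RawArray code:

from multiprocessing import shared_memory
import numpy as np

shm = shared_memory.SharedMemory(create=True, size=20000 * 15000 * 2)  # bytes for uint16
arr = np.ndarray((20000, 15000), dtype=np.uint16, buffer=shm.buf)
arr[:1000, :] = 1          # writes go straight to the RAM-backed segment

# Another process can attach with shared_memory.SharedMemory(name=shm.name).
shm.close()
shm.unlink()               # free the segment when done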
