I am trying to use a worker Pool in Python using Process objects. Each worker (a Process) does some initialization (which takes a non-trivial amount of time), gets passed a series of jobs (ideally using map()), and returns something. No communication is necessary beyond that. However, I can't seem to figure out how to get map() to call my worker's compute() method.
```python
from multiprocessing import Pool, Process

class Worker(Process):
    def __init__(self):
        print('Worker started')
        # do some initialization here
        super(Worker, self).__init__()

    def compute(self, data):
        print('Computing things!')
        return data * data

if __name__ == '__main__':
    # This works fine
    worker = Worker()
    print(worker.compute(3))

    # workers get initialized fine
    pool = Pool(processes=4, initializer=Worker)
    data = range(10)
    # How to use my worker pool?
    result = pool.map(compute, data)
```
Is a job queue the way to go instead, or can I use map()?
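For reference, the standard-library route this question is reaching for is usually written without subclassing Process at all: Pool's initializer runs once in each worker process, and the per-process state lives in a module-level global that the mapped function reads. A minimal sketch (the names init_worker, _state, and compute here are illustrative, not from the question):

```python
from multiprocessing import Pool

_state = None  # per-process state, populated by the initializer

def init_worker():
    # Runs once in each worker process when the pool starts.
    global _state
    print('Worker started')
    _state = {}  # expensive one-time setup would go here

def compute(data):
    # Runs in a worker process; may consult _state.
    return data * data

if __name__ == '__main__':
    with Pool(processes=4, initializer=init_worker) as pool:
        result = pool.map(compute, range(10))
    print(result)  # squares of 0..9, in order
```

pool.map() returns results in input order, so no extra bookkeeping is needed to match outputs to inputs.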
Recommended answer:
I would suggest that you use a Queue for this.
```python
from multiprocessing import Process, Queue

class Worker(Process):
    def __init__(self, queue):
        super(Worker, self).__init__()
        self.queue = queue

    def run(self):
        print('Worker started')
        # do some initialization here

        print('Computing things!')
        for data in iter(self.queue.get, None):
            # Use data
            pass
```
Now you can start a pile of these, all getting work from a single queue
```python
request_queue = Queue()
for i in range(4):
    Worker(request_queue).start()
for data in the_real_source:
    request_queue.put(data)
# Sentinel objects to allow clean shutdown: 1 per worker.
for i in range(4):
    request_queue.put(None)
```
That kind of thing should allow you to amortize the expensive startup cost across multiple workers.
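The snippets above don't show how the workers hand results back, which the question asks for. One common extension is a second Queue that carries results; the result_queue and the collection step below are additions for illustration, not part of the original answer:

```python
from multiprocessing import Process, Queue

class Worker(Process):
    def __init__(self, task_queue, result_queue):
        super(Worker, self).__init__()
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        # expensive one-time initialization would go here
        # iter(get, None): keep pulling tasks until the None sentinel arrives
        for data in iter(self.task_queue.get, None):
            self.result_queue.put(data * data)

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    workers = [Worker(tasks, results) for _ in range(4)]
    for w in workers:
        w.start()
    for data in range(10):
        tasks.put(data)
    for _ in workers:      # one sentinel per worker for clean shutdown
        tasks.put(None)
    # results arrive in completion order, so sort if order matters
    collected = sorted(results.get() for _ in range(10))
    for w in workers:
        w.join()
    print(collected)
```

Unlike pool.map(), results come off the queue in completion order rather than submission order, so you must sort (or tag each result with its input) if ordering matters.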