Code:
    import multiprocessing as mp
    import time

    def seq(count):
        print "runing seq"
        start_time = time.time()
        result = []
        for i in range(count):
            result.append(cube(i))
        print "seq --- time:{0:.4f}".format(time.time() - start_time)
        #print "seq --- time:{0:.4f}, result:{1}".format(time.time() - start_time, result)

    def par(count):
        print "runing par"
        start_time = time.time()
        result = mp.Pool(processes=2).map(cube,range(count))
        print "par --- time:{0:.4f}".format(time.time() - start_time)
        #print "par --- time:{0:.4f}, result:{1}".format(time.time() - start_time, result)

    def cube(x):
        return x*x*x

    count = 4000000
    seq(count)
    par(count)

Output:
seq --- time:1.7011
par --- time:2.3112
My Mac has one processor with two physical cores, and each physical core has two virtual cores, so I expected the parallel version to gain some speedup. However, the output shows that the parallel version is slower than the sequential one. Why is this the case?
spec:
Best answer
The problem is that the operation you're parallelizing isn't very expensive, which negates the benefits of multiprocessing. Using multiprocessing carries some overhead; starting up child processes and moving data from your parent process to those children takes a non-trivial amount of time (especially compared to a threaded solution). If the actual work you're doing in the background processes is very small, the overhead of moving the data between processes can actually end up being greater than the amount of time you save by parallelizing the work.
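To get a rough feel for the overhead described above, the following sketch (written for Python 3, unlike the Python 2 code in the question) compares the time needed to compute the cubes with the time needed to pickle and unpickle the same values. Pickling is how multiprocessing serializes arguments and results that cross process boundaries, but this is only a proxy: it ignores process startup and the actual transfer between processes, and the helper names time_compute and time_pickle_round_trip are made up for illustration.

```python
import pickle
import time


def cube(x):
    return x * x * x


def time_compute(n):
    """Return (elapsed seconds, results) for cubing 0..n-1 in this process."""
    start = time.perf_counter()
    results = [cube(i) for i in range(n)]
    return time.perf_counter() - start, results


def time_pickle_round_trip(values):
    """Return (elapsed seconds, restored copy) for serializing and deserializing values.

    This only approximates the cost of moving data between processes.
    """
    start = time.perf_counter()
    restored = pickle.loads(pickle.dumps(values))
    return time.perf_counter() - start, restored


if __name__ == "__main__":
    compute_time, results = time_compute(1000000)
    pickle_time, _ = time_pickle_round_trip(results)
    print("compute: {0:.4f}s".format(compute_time))
    print("pickle round trip: {0:.4f}s".format(pickle_time))
```

If the two numbers come out in the same ballpark on your machine, that suggests the per-item work is too small for the data movement to pay off. The exact figures will vary by machine and Python version.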
You can see this more clearly if you add a short time.sleep into your test code (and reduce the number of runs, so you're not waiting around forever):
    import multiprocessing as mp
    import time

    def seq(count):
        print "runing seq"
        start_time = time.time()
        result = []
        for i in range(count):
            result.append(cube(i))
        print "seq --- time:{0:.4f}".format(time.time() - start_time)
        #print "seq --- time:{0:.4f}, result:{1}".format(time.time() - start_time, result)

    def par(count):
        print "runing par"
        start_time = time.time()
        result = mp.Pool(processes=2).map(cube,range(count))
        print "par --- time:{0:.4f}".format(time.time() - start_time)
        #print "par --- time:{0:.4f}, result:{1}".format(time.time() - start_time, result)

    def cube(x):
        time.sleep(.01)
        return x*x*x

    if __name__ == "__main__":
        count = 400
        seq(count)
        par(count)

Output:
runing seq
seq --- time:4.0488
runing par
par --- time:2.0408

The additional time spent inside of cube makes the parallel version twice as fast as the sequential one, which is right about the expected performance improvement.
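The same experiment can be repeated for several per-call delays to see roughly where parallelizing starts to pay off. This is a minimal Python 3 sketch, not the code from the answer: slow_cube, run_seq and run_par are hypothetical names, the delay values are arbitrary, and the pool is created inside a with block so the worker processes are cleaned up.

```python
import multiprocessing as mp
import time
from functools import partial


def slow_cube(x, delay):
    """Cube x after sleeping for `delay` seconds to simulate heavier work."""
    time.sleep(delay)
    return x * x * x


def run_seq(count, delay):
    """Return (elapsed seconds, results) for a plain sequential loop."""
    start = time.perf_counter()
    result = [slow_cube(i, delay) for i in range(count)]
    return time.perf_counter() - start, result


def run_par(count, delay, processes=2):
    """Return (elapsed seconds, results) using a pool of worker processes."""
    start = time.perf_counter()
    with mp.Pool(processes=processes) as pool:
        result = pool.map(partial(slow_cube, delay=delay), range(count))
    return time.perf_counter() - start, result


if __name__ == "__main__":
    # Arbitrary delays: no extra work, a little extra work, the answer's 0.01 s.
    for delay in (0.0, 0.001, 0.01):
        seq_time, seq_result = run_seq(200, delay)
        par_time, par_result = run_par(200, delay)
        assert seq_result == par_result
        print("delay={0}: seq {1:.4f}s, par {2:.4f}s".format(delay, seq_time, par_time))
```

With no delay the pool's overhead will likely dominate, and as the delay grows the parallel time should approach half the sequential time with two processes. The timings depend on the machine, so treat them as indicative only.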