Is mclapply guaranteed to return its results in order?

Updated: 2024-10-27 00:30:15

Question

I'm working with mclapply from the multicore package (on Ubuntu), and I'm writing a function that requires the results of mclapply(x, f) to be returned in order (that is, f(x[1]), f(x[2]), ..., f(x[n])).

    # multicore doesn't work on Windows
    # (in current R, mclapply is provided by the bundled 'parallel' package)
    require(multicore)
    unlist(mclapply(
      1:10,
      function(x) {
        Sys.sleep(sample(1:5, size = 1))
        identity(x)
      },
      mc.cores = 2))
    [1]  1  2  3  4  5  6  7  8  9 10

The code above seems to imply that mclapply returns its results in the same order as lapply.

However, if this assumption is wrong I'll have to spend a long time refactoring my code, so I'm hoping that someone more familiar with this package and with parallel computing can confirm that it holds.

Is it safe to assume that mclapply always returns its results in order, regardless of the optional arguments it is given?

Recommended answer

Short answer: yes, it does return the results in the correct order.

But of course you should read the code yourself (mclapply is an ordinary R function, so its source is easy to inspect).

The man page for collect gives some more hints:

Note: If expr uses low-level multicore functions such as sendMaster a single job can deliver results multiple times and it is the responsibility of the user to interpret them correctly.

However, if you don't mess with the low-level functions,

collect returns any results that are available in a list. The results will have the same order as the specified jobs. If there are multiple jobs and a job has a name it will be used to name the result, otherwise its process ID will be used.

(emphasis mine)
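The quoted behaviour can be tried out with a small sketch. It assumes the bundled parallel package, where the counterparts of multicore's parallel and collect are called mcparallel and mccollect, and a Unix-like system, since forking is not available on Windows. The job names "a" and "b" are made up for illustration.

```r
library(parallel)  # assumption: used here in place of the older multicore package

# Start two forked jobs; the first one deliberately finishes last.
jobs <- list(
  mcparallel({ Sys.sleep(1); "slow" }, name = "a"),
  mcparallel("fast", name = "b")
)

# Wait for both jobs and gather their results.
res <- mccollect(jobs)
print(res)
```

If the documentation holds, the list follows the order in which the jobs were passed to mccollect and carries their names, even though job "b" finishes first.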

Now for mclapply. A quick glance over the source code yields:

If !mc.preschedule and there are no more jobs than cores (length(X) <= cores), parallel and collect are used; see above.
If mc.preschedule is set, or there are more jobs than cores, mclapply itself takes care of the order; see the code.
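To make the second case more concrete, here is a rough sketch of how results can be put back in input order when each worker is handed a fixed set of indices up front. It is not the actual mclapply source: preschedule_sketch is a hypothetical helper, the round-robin split is an assumption about how the work is divided, and the "workers" run sequentially so that the example needs no forking.

```r
# Hypothetical helper, not the actual mclapply source: a sequential
# imitation of how prescheduled chunks can be put back in input order.
preschedule_sketch <- function(X, FUN, cores = 2L) {
  cores <- min(cores, length(X))
  # Worker i is assigned elements i, i + cores, i + 2 * cores, ...
  index <- lapply(seq_len(cores), function(i) seq(i, length(X), by = cores))
  # Each "worker" processes its own chunk (one after another here).
  chunks <- lapply(index, function(idx) lapply(X[idx], FUN))
  # Store the chunks in reverse worker order to imitate workers finishing
  # out of order; every value still lands in its original slot.
  res <- vector("list", length(X))
  for (i in rev(seq_len(cores))) res[index[[i]]] <- chunks[[i]]
  res
}

out <- unlist(preschedule_sketch(1:10, function(x) x^2, cores = 3L))
print(out)
```

Because each result is written back by its original index, the order in which the chunks arrive does not affect the order of the final list.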

However, here's a slightly modified version of your experiment:

    > unlist(mclapply(1:10, function(x) {
    +     Sys.sleep(sample(1:5, size = 1)); cat(x, " "); identity(x)},
    +   mc.cores = 2, mc.preschedule = FALSE))
    1 2 4 3 6 5 7 8 9 10 [1] 1 2 3 4 5 6 7 8 9 10
    > unlist(mclapply(1:10, function(x) {
    +     Sys.sleep(sample(1:5, size = 1)); cat(x, " "); identity(x)},
    +   mc.cores = 2, mc.preschedule = TRUE))
    1 3 2 5 4 6 7 8 10 9 [1] 1 2 3 4 5 6 7 8 9 10

This shows that the child jobs deliver their results in a different order (more precisely, the child jobs finish in a different order), but the final result is assembled in the original order.
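If you want a quick regression check in your own code, something along these lines may help. It is only a sketch: it assumes mclapply from the bundled parallel package rather than multicore, f is a made-up test function, and it falls back to one core on Windows, where forking is unavailable. An empirical check like this is not a proof, but it would catch an ordering problem on your setup.

```r
library(parallel)  # assumption: mclapply from 'parallel' instead of multicore

# Forking is not available on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Made-up test function: sleep for a random short time, then return a
# value that depends only on the input.
f <- function(x) { Sys.sleep(runif(1, 0, 0.2)); x * 10 }

expected <- lapply(1:10, f)
for (pre in c(TRUE, FALSE)) {
  got <- mclapply(1:10, f, mc.cores = cores, mc.preschedule = pre)
  # Stops with an error if the parallel result differs from lapply's.
  stopifnot(identical(got, expected))
}
```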