I'm working with mclapply from the multicore package (on Ubuntu), and I'm writing a function that requires the results of mclapply(x, f) to be returned in order (that is, f(x[1]), f(x[2]), ..., f(x[n])).
    # multicore doesn't work on Windows
    require(multicore)
    unlist(mclapply(
      1:10,
      function(x) {
        Sys.sleep(sample(1:5, size = 1))
        identity(x)
      },
      mc.cores = 2))
    [1] 1 2 3 4 5 6 7 8 9 10
The above code seems to imply that mclapply returns results in the same order as lapply.
However, if this assumption is wrong I'll have to spend a long time refactoring my code, so I'm hoping to get assurance from someone more familiar with this package/parallel computing that this assumption is correct.
Is it safe to assume that mclapply always returns its results in order, regardless of the optional arguments it is given?
Recommended answer
Short answer: it does return the results in the correct order.
But of course, you should read the code yourself (mclapply is an R function...)
The man page for collect gives some more hints:
Note: If expr uses low-level multicore functions such as sendMaster a single job can deliver results multiple times and it is the responsibility of the user to interpret them correctly.
However, if you don't mess with the low-level functions,
collect returns any results that are available in a list. The results will have the same order as the specified jobs. If there are multiple jobs and a job has a name it will be used to name the result, otherwise its process ID will be used.
(emphasis mine)
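To see that ordering rule in action, here is a minimal sketch. It assumes the parallel package that ships with current R, where the old multicore functions parallel and collect are available as mcparallel and mccollect; the job names and sleep times are made up for illustration. Like multicore itself, it relies on forking and so will not run on Windows.

```r
library(parallel)  # assumption: using parallel, the successor of multicore

# Start three forked jobs; later jobs sleep less, so they finish first.
jobs <- lapply(1:3, function(i) {
  mcparallel({
    Sys.sleep((4 - i) * 0.2)
    i
  }, name = paste0("job", i))  # hypothetical job names
})

# mccollect waits for all jobs and returns one list element per job.
res <- mccollect(jobs)
print(names(res))
print(unname(unlist(res)))
```

Although job3 finishes first, the list comes back in the order in which the jobs were passed to mccollect.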
Now for mclapply. A quick glance over the source code yields:
- if !mc.preschedule and there are no more jobs than cores (length(X) <= cores), parallel and collect are used; see above.
- if mc.preschedule is set, or there are more jobs than cores, mclapply itself takes care of the order; see the code.
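The second case can be pictured with a simplified sketch. This is not the actual mclapply source: reassemble_in_order is a hypothetical helper, it runs sequentially without any forking, and the round-robin split is an assumption made for illustration. It only shows why writing each chunk back by index keeps the original order no matter which worker finishes first.

```r
# Hypothetical illustration only; not the code used by mclapply.
reassemble_in_order <- function(X, FUN, cores = 2L) {
  n <- length(X)
  if (n == 0L) return(list())
  cores <- min(cores, n)

  # Assign element indices to workers in round-robin fashion.
  chunks <- lapply(seq_len(cores), function(i) seq(i, n, by = cores))

  # Pre-allocate the result so every value has a fixed slot.
  res <- vector("list", n)

  # Pretend the workers finish in reverse order.
  for (i in rev(seq_len(cores))) {
    idx <- chunks[[i]]
    res[idx] <- lapply(X[idx], FUN)  # write back by original index
  }
  res
}

out <- reassemble_in_order(1:10, function(x) x^2, cores = 3L)
print(unlist(out))
```

The completion order of the chunks never matters here, because each result is stored at the index of the input it came from.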
However, here's a slightly modified version of your experiment:
    > unlist (mclapply(1:10, function(x){ Sys.sleep(sample(1:5, size = 1)); cat (x, " "); identity(x)}, mc.cores = 2, mc.preschedule = FALSE))
    1 2 4 3 6 5 7 8 9 10 [1] 1 2 3 4 5 6 7 8 9 10
    > unlist (mclapply(1:10, function(x){ Sys.sleep(sample(1:5, size = 1)); cat (x, " "); identity(x)}, mc.cores = 2, mc.preschedule = TRUE))
    1 3 2 5 4 6 7 8 10 9 [1] 1 2 3 4 5 6 7 8 9 10
This shows that the child jobs return their results in a different order (more precisely: the child jobs finish in a different order), but the result is assembled in the original order.
(This works on the console, but not in RStudio: the cat output does not show up there.)
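If you want a cheap safeguard in your own code, you can compare the parallel result against plain lapply for both scheduling modes. The sketch below assumes the parallel package bundled with current R, which provides mclapply as the successor of multicore; slow_identity is a made-up helper, and the fallback to one core is only there because forking is unavailable on Windows.

```r
library(parallel)  # assumption: using parallel, the successor of multicore

# Hypothetical worker: sleeps a random short time, then returns its input.
slow_identity <- function(x) {
  Sys.sleep(runif(1, min = 0, max = 0.2))
  x
}

x <- 1:10
cores <- if (.Platform$OS.type == "windows") 1L else 2L
expected <- lapply(x, slow_identity)

# Check both scheduling modes against the sequential result.
checks <- vapply(c(TRUE, FALSE), function(presched) {
  par_res <- mclapply(x, slow_identity,
                      mc.cores = cores, mc.preschedule = presched)
  identical(par_res, expected)
}, logical(1))

print(checks)
```

A check like this does not prove anything about the implementation, but it would fail immediately if the order ever differed for your inputs.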