Converting nested for loops to foreach in R?

I have written a function with 2 for loops nested within a foreach loop. I have a list of 5 data frames, each with 10 rows and 30 columns, that I am trying to loop through, i.e. a structure of: [[5]][10,30].

However, I am trying to run this function with 1,000,000 paths (i.e. for (i in 1:1000000)), and obviously, the performance is terrible.

I'd really like to run this in parallel with foreach loops. I have found that using any form of apply function together with foreach does not work properly either. Of course, if there are better ways to do this, I'd love to see those too:

library(foreach)
library(doParallel)

# input: matr is a list of 5 matrices
cum_returns <- function(matr) {
  time_horizon <- 30
  paths <- 10
  asset <- 5
  foreach (x = matr) %dopar% {
    for (i in 1:paths) {
      x[i,] <- append(x[i,],100,0)
      for (m in 2:(time_horizon + 1)) {
        # loop through each row of matrix to apply function
        x[i,m] <- x[i,m-1] + x[i,m]
      }
    }
    return(x)
  }
}

The goal of the function is to convert data frames from this format:

                   V1          V2         V3            V4         V5         V6
result.4   -0.3937681  0.42309970 -0.2283395 -0.8331735437  0.7874238 -0.1453797
result.9   -1.5680301  0.41994580 -2.1580822  1.6118210199 -1.1626008  1.7275690
result.4.1 -0.5495332 -0.82372187  0.3571042  1.0774779108 -0.7305624  0.6109353
result.9.1 -0.6323561  1.70637893  0.6652303  0.7848319013 -1.0563251  0.8036310
result.4.2 -0.3242765 -0.75415454  0.7407225 -1.7877216475  1.5852460  0.1917951
result.9.2 -0.5348290 -0.05270434  1.5113037  0.8491153876 -2.0715359 -2.0216315
result.4.3 -0.7013342 -0.89451784 -0.2683157 -0.2759993796  0.2709717  1.3437261
result.9.3  1.6187813 -1.53112097  0.6938031 -1.4157996794 -0.6058584  0.4324761
result.4.4 -0.6069532  0.07735158  0.7632158  1.0759685528 -0.3157746 -1.1726851
result.9.4 -0.4945204  1.20070722 -0.1619356 -0.0009728659 -2.0367133  1.4713883

into this format, by prepending 100 to each row and then accumulating the values along the row, so that each field holds the cumulative sum up to that point:

            V1        V2        V3        V4        V5        V6
result.4   100  99.60623 100.02933  99.80099  98.96782  99.75524
result.9   100  98.43197  98.85192  96.69383  98.30565  97.14305
result.4.1 100  99.45047  98.62674  98.98385 100.06133  99.33076
result.9.1 100  99.36764 101.07402 101.73925 102.52408 101.46776
result.4.2 100  99.67572  98.92157  99.66229  97.87457  99.45982
result.9.2 100  99.46517  99.41247 100.92377 101.77289  99.70135
result.4.3 100  99.29867  98.40415  98.13583  97.85983  98.13080
result.9.3 100 101.61878 100.08766 100.78146  99.36566  98.75981
result.4.4 100  99.39305  99.47040 100.23361 101.30958 100.99381
result.9.4 100  99.50548 100.70619 100.54425 100.54328  98.50657

Best answer

There's no need to loop over the rows and columns. You can use R's ability to do vectorized calculations to add whole columns together, and replace the repeated calls to append() with a single call to cbind().

foreach (x = matr) %dopar% {
  x <- cbind(100,x)
  for (m in 2:(time_horizon + 1)) {
    # add the previous column to the current one, for all rows at once
    x[,m] <- x[,m-1] + x[,m]
  }
  x
}

Even without using multiple cores on my computer this is pretty quick with 1,000,000 rows in each matrix.
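
The fragment above assumes that time_horizon is already defined and that a parallel backend has been registered. As a rough, self-contained sketch of the same column-wise idea, the following uses only base R, with lapply standing in for foreach so that it runs without extra packages. The names cum_returns_vec and start_value are hypothetical and do not come from the original post, and the random input is made up purely for illustration.

```r
# Hypothetical helper (not from the original post): prepend a starting
# value to every row, then accumulate across the columns.
cum_returns_vec <- function(x, start_value = 100) {
  x <- cbind(start_value, as.matrix(x), deparse.level = 0)
  # Each pass adds one whole column to the next, so every row is
  # updated at once instead of element by element.
  for (m in seq_len(ncol(x))[-1]) {
    x[, m] <- x[, m - 1] + x[, m]
  }
  x
}

set.seed(1)  # arbitrary seed, for reproducibility only
# Five simulated inputs, each 10 paths by 30 periods (sizes from the question).
matr <- replicate(5, matrix(rnorm(10 * 30), nrow = 10), simplify = FALSE)

# lapply stands in for foreach (x = matr) %dopar% { ... } here.
result <- lapply(matr, cum_returns_vec)

# Cross-check the first matrix against a row-wise cumsum of the same data.
check <- t(apply(cbind(100, matr[[1]]), 1, cumsum))
```

If a parallel backend is available (for example via doParallel's registerDoParallel()), the lapply call could presumably be swapped for foreach (x = matr) %dopar% cum_returns_vec(x). Whether that helps depends on the data size, since copying large matrices to the workers has its own cost.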
