Create several new derived variables from existing variables in data.frame

In R I have a data.frame with several variables that have been measured monthly over several years. I would like to derive the monthly average (using all years) for each variable. Ideally these new variables would all be together in a new data.frame (carrying over the ID); below I am simply adding the new variables to the existing data.frame. The only way I know how to do this at the moment (below) seems quite laborious, and I was hoping there might be a smarter way to do this in R that would not require typing out each month and variable as I did below.

```r
# Example data.frame with only two years, two months, and two variables
# In the real data set there are always 12 months per year
# and there are at least four variables
df <- structure(list(ID = 1:4,
    ABC.M1Y2001 = c(10, 12.3, 45, 89),
    ABC.M2Y2001 = c(11.1, 34, 67.7, -15.6),
    ABC.M1Y2002 = c(-11.1, 9, 34, 56.5),
    ABC.M2Y2002 = c(12L, 13L, 11L, 21L),
    DEF.M1Y2001 = c(14L, 14L, 14L, 16L),
    DEF.M2Y2001 = c(15L, 15L, 15L, 12L),
    DEF.M1Y2002 = c(5, 12, 23.5, 34),
    DEF.M2Y2002 = c(6L, 34L, 61L, 56L)),
  .Names = c("ID", "ABC.M1Y2001", "ABC.M2Y2001", "ABC.M1Y2002", "ABC.M2Y2002",
             "DEF.M1Y2001", "DEF.M2Y2001", "DEF.M1Y2002", "DEF.M2Y2002"),
  class = "data.frame", row.names = c(NA, -4L))

# list variables to average for ABC Month 1 across years
ABC.M1.names <- c("ABC.M1Y2001", "ABC.M1Y2002")
df <- transform(df, ABC.M1 = rowMeans(df[, ABC.M1.names], na.rm = TRUE))

# list variables to average for ABC Month 2 across years
ABC.M2.names <- c("ABC.M2Y2001", "ABC.M2Y2002")
df <- transform(df, ABC.M2 = rowMeans(df[, ABC.M2.names], na.rm = TRUE))

# and so forth for ABC
# ...

# list variables to average for DEF Month 1 across years
DEF.M1.names <- c("DEF.M1Y2001", "DEF.M1Y2002")
df <- transform(df, DEF.M1 = rowMeans(df[, DEF.M1.names], na.rm = TRUE))

# and so forth for DEF
# ...
```

Best answer

Here is a solution using reshape2 that is more automated when you have lots of data: it uses regular expressions to extract the variable name and the month from each column name, so nothing has to be typed out per month or per variable. It gives you a compact summary table.

```r
# Load required package
library(reshape2)

# Melt your wide data into long format
mdf <- melt(df, id.vars = "ID")

# Extract the month and the variable name from the variable column
mdf$Month <- gsub("^.*\\.(M[0-9]{1,2}).*$", "\\1", mdf$variable)
mdf$Var <- gsub("^(.*)\\..*", "\\1", mdf$variable)

# Aggregate by month and variable
dcast(mdf, Var ~ Month, fun.aggregate = mean)
#   Var      M1     M2
# 1 ABC 30.5875 19.275
# 2 DEF 16.5625 26.750
```
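To see what the two regular expressions pull out of a column name, they can be tried in isolation. This is only a small sketch; the `cols` vector below is made up for illustration (it follows the question's naming pattern and adds a two-digit month to show that `M12` is handled).

```r
# Hypothetical column names following the question's VAR.M<month>Y<year> pattern
cols <- c("ABC.M1Y2001", "ABC.M12Y2002", "DEF.M2Y2001")

# Keep only the month token: "M" followed by one or two digits
month <- gsub("^.*\\.(M[0-9]{1,2}).*$", "\\1", cols)

# Keep only the part before the last dot: the variable name
var <- gsub("^(.*)\\..*", "\\1", cols)

print(month)  # "M1"  "M12" "M2"
print(var)    # "ABC" "ABC" "DEF"
```

Both patterns rely on the dot separating the variable name from the month, so a variable name that itself contains a dot would need a different pattern.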

Alternatively, to be compatible with the other solutions and return the table by ID as well:

```r
dcast(mdf, ID ~ Var + Month, mean)
#   ID ABC_M1 ABC_M2 DEF_M1 DEF_M2
# 1  1  -0.55  11.55   9.50   10.5
# 2  2  10.65  23.50  13.00   24.5
# 3  3  39.50  39.35  18.75   38.0
# 4  4  72.75   2.70  25.00   34.0
```
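If you would rather not depend on reshape2, the same per-ID table can be built in base R. The following is a minimal sketch and not part of the original answer; it assumes every column other than `ID` ends in `Y` plus a four-digit year, as in the question's example, and the names `value_cols`, `keys`, `groups`, and `monthly` are introduced here for illustration.

```r
# Example data from the question
df <- data.frame(
  ID = 1:4,
  ABC.M1Y2001 = c(10, 12.3, 45, 89),
  ABC.M2Y2001 = c(11.1, 34, 67.7, -15.6),
  ABC.M1Y2002 = c(-11.1, 9, 34, 56.5),
  ABC.M2Y2002 = c(12L, 13L, 11L, 21L),
  DEF.M1Y2001 = c(14L, 14L, 14L, 16L),
  DEF.M2Y2001 = c(15L, 15L, 15L, 12L),
  DEF.M1Y2002 = c(5, 12, 23.5, 34),
  DEF.M2Y2002 = c(6L, 34L, 61L, 56L)
)

value_cols <- setdiff(names(df), "ID")

# Drop the trailing year so "ABC.M1Y2001" and "ABC.M1Y2002" share the key "ABC.M1"
keys <- sub("Y[0-9]{4}$", "", value_cols)

# Group the column names by key, then average each group row-wise
groups <- split(value_cols, keys)
means <- sapply(groups, function(cols) {
  rowMeans(df[, cols, drop = FALSE], na.rm = TRUE)
})

# New data.frame carrying over the ID
monthly <- data.frame(ID = df$ID, means)
print(monthly)
```

The result has one column per variable and month (`ABC.M1`, `ABC.M2`, `DEF.M1`, `DEF.M2`). Note that `split()` orders the groups alphabetically, so with 12 months `ABC.M10` would come before `ABC.M2`; reorder the columns afterwards if that matters.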
