Multiple statistics of multiple columns by factor(s)

Suppose I'd like to calculate the mean, standard deviation, and n (the number of non-NA values) for columns "dat_1" to "dat_3" of the following data frame, grouped by the factors "fac_1" and "fac_2", such that a separate data frame for each statistic (or function) can be accessed from the result:

    set.seed(1)
    df <- data.frame(
      "fac_1" = c(rep("a", 5), rep("b", 4)),
      "fac_2" = c("x", "x", "y", "y", "y", "y", "x", "x", "x"),
      "dat_1" = c(floor(runif(3, 0, 10)), NA, floor(runif(5, 0, 10))),
      "dat_2" = floor(runif(9, 10, 20)),
      "dat_3" = floor(runif(9, 20, 30))
    )

This can be achieved one function at a time using plyr:

    library(plyr)

    # mean
    ddply(.data = df, .variables = .(fac_1, fac_2),
          .fun = function(x) colMeans(x[, 3:5], na.rm = TRUE))

    # standard deviation (note: uses SD from the 'psych' package)
    ddply(.data = df, .variables = .(fac_1, fac_2),
          .fun = function(x) psych::SD(x[, 3:5], na.rm = TRUE))

    # number of non-NA values
    ddply(.data = df, .variables = .(fac_1, fac_2),
          .fun = function(x) colSums(!is.na(x[, 3:5])))

However, this becomes cumbersome with multiple functions, especially when the factors and columns of interest must be changed. I'm wondering if there's an alternative (a one-liner, perhaps).
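One way to cut down the repetition while staying with plyr is to wrap the ddply() call in a small helper that takes the factor names, the column names, and the function as arguments. This is only a sketch: stat_by() is a hypothetical name that is not part of plyr, and it relies on ddply() accepting a character vector for .variables.

```r
library(plyr)

set.seed(1)
df <- data.frame(
  fac_1 = c(rep("a", 5), rep("b", 4)),
  fac_2 = c("x", "x", "y", "y", "y", "y", "x", "x", "x"),
  dat_1 = c(floor(runif(3, 0, 10)), NA, floor(runif(5, 0, 10))),
  dat_2 = floor(runif(9, 10, 20)),
  dat_3 = floor(runif(9, 20, 30))
)

# stat_by() is a hypothetical helper (not a plyr function):
# it applies `fun` to each of `columns` within every combination of `factors`.
stat_by <- function(data, factors, columns, fun) {
  ddply(data, factors, function(x) sapply(x[, columns, drop = FALSE], fun))
}

factors <- c("fac_1", "fac_2")
columns <- c("dat_1", "dat_2", "dat_3")

means  <- stat_by(df, factors, columns, function(v) mean(v, na.rm = TRUE))
sds    <- stat_by(df, factors, columns, function(v) sd(v, na.rm = TRUE))
counts <- stat_by(df, factors, columns, function(v) sum(!is.na(v)))
```

Each call still computes a single statistic, so this shortens the code rather than turning it into a one-liner.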

aggregate() works:

    aggregate(
      x = df[, 3:5],
      by = df[, c(1, 2)],
      FUN = function(x) c(
        n = sum(!is.na(x)),  # counts non-NA values; length(!is.na(x)) would count every value
        mean = mean(x, na.rm = TRUE),
        sd = sd(x, na.rm = TRUE)
      )
    )

but 'disaggregating' the result (into separate data frames for each statistic) becomes awkward.
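For reference, the awkwardness comes from aggregate() storing the output for each data column as a matrix column, with one matrix column per statistic. A rough base R sketch of the disaggregation step follows. The names agg, stats, and by_stat are chosen here for illustration, and n is computed with sum(!is.na(x)) so that it counts non-NA values.

```r
set.seed(1)
df <- data.frame(
  fac_1 = c(rep("a", 5), rep("b", 4)),
  fac_2 = c("x", "x", "y", "y", "y", "y", "x", "x", "x"),
  dat_1 = c(floor(runif(3, 0, 10)), NA, floor(runif(5, 0, 10))),
  dat_2 = floor(runif(9, 10, 20)),
  dat_3 = floor(runif(9, 20, 30))
)

factors <- c("fac_1", "fac_2")
columns <- c("dat_1", "dat_2", "dat_3")

agg <- aggregate(
  x = df[, columns],
  by = df[, factors],
  FUN = function(x) c(
    n = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE)
  )
)

# Each of agg$dat_1, agg$dat_2, agg$dat_3 is a matrix with columns n, mean, sd.
# Build one data frame per statistic by pulling the matching matrix column.
stats <- c("n", "mean", "sd")
by_stat <- lapply(setNames(stats, stats), function(s) {
  out <- agg[, factors]
  for (col in columns) out[[col]] <- agg[[col]][, s]
  out
})

by_stat$mean  # group means of dat_1..dat_3
```

The result is a named list, so by_stat$n, by_stat$mean, and by_stat$sd each hold one statistic alongside the grouping factors.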

Recently I've come across dplyr. The following seems to work:

    library(dplyr)

    # using dplyr
    df %>%
      group_by(fac_1, fac_2) %>%
      summarise_each(funs(
        n = sum(!is.na(.)),  # counts non-NA values; length(!is.na(.)) would count every value
        mean(., na.rm = TRUE),
        sd(., na.rm = TRUE)
      ))

However, I'd like to be able to pass the factors to group_by() as a character vector, and I've not found a way to do so.

Any help or ideas? Thanks

Best answer

Passing vectors or lists to dplyr functions can be tricky (see this vignette). In short, it involves adding an extra underscore to the function name, which selects the standard-evaluation version of the function, and then passing a vector or list to the .dots argument.

    factorsToSummarise <- c('fac_1', 'fac_2')

    # using dplyr
    #  extra underscore
    #         |
    df %>% #  v
      group_by_(.dots = factorsToSummarise) %>%
      summarise_each(funs(
        n = sum(!is.na(.)),  # counts non-NA values
        mean(., na.rm = TRUE),
        sd(., na.rm = TRUE)
      ))
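The underscore-suffixed verbs, summarise_each(), and funs() are deprecated in current dplyr releases. Assuming dplyr 1.0.0 or later is installed, roughly the same result can be sketched with across() and all_of(), which accept character vectors directly. The names columnsToSummarise, res, stats, and by_stat are introduced here for illustration. The last step splits the wide result into one data frame per statistic, as the question asks.

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  fac_1 = c(rep("a", 5), rep("b", 4)),
  fac_2 = c("x", "x", "y", "y", "y", "y", "x", "x", "x"),
  dat_1 = c(floor(runif(3, 0, 10)), NA, floor(runif(5, 0, 10))),
  dat_2 = floor(runif(9, 10, 20)),
  dat_3 = floor(runif(9, 20, 30))
)

factorsToSummarise <- c("fac_1", "fac_2")
columnsToSummarise <- c("dat_1", "dat_2", "dat_3")  # illustrative name

res <- df %>%
  group_by(across(all_of(factorsToSummarise))) %>%
  summarise(
    across(
      all_of(columnsToSummarise),
      list(
        n = ~ sum(!is.na(.x)),
        mean = ~ mean(.x, na.rm = TRUE),
        sd = ~ sd(.x, na.rm = TRUE)
      )
    ),
    .groups = "drop"
  )

# Output columns are named <column>_<statistic>, e.g. dat_1_mean.
# Select by suffix to get one data frame per statistic.
stats <- c("n", "mean", "sd")
by_stat <- lapply(setNames(stats, stats), function(s) {
  select(res, all_of(factorsToSummarise), ends_with(paste0("_", s)))
})

by_stat$sd  # standard deviations only
```

Changing the grouping factors or the summarised columns then only requires editing the two character vectors.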

