Using n() while computing other summary statistics

Updated: 2024-10-09 16:22:58

Problem description

I am having trouble preparing a summary table with dplyr from the data set below:

    set.seed(1)
    df <- data.frame(rep(sample(c(2012, 2016), 10, replace = T)),
                     sample(c('Treat', 'Control'), 10, replace = T),
                     runif(10, 0, 1),
                     runif(10, 0, 1),
                     runif(10, 0, 1))
    colnames(df) <- c('Year', 'Group', 'V1', 'V2', 'V3')

I want to calculate the mean, median, and standard deviation, and count the number of observations, for each combination of Year and Group.

I have successfully used this code to get the mean, median, and sd:

    summary.table = df %>%
      group_by(Year, Group) %>%
      summarise_all(funs(n(), sd, median, mean))

However, I do not know how to include n() properly inside funs(). It gives me a separate count for each of V1, V2 and V3, which is redundant, since I only want the group size once. I have tried adding

    mutate(N = n()) %>%

before and after the group_by() line, but it did not give me what I wanted.

Any help?

I had not made my question clear enough. The problem is that the code gives me columns that I do not need, since the number of observations for V1 alone is sufficient for me.

Recommended answer

Add the N column as an extra grouping column before summarizing:

    library(dplyr)

    set.seed(1)
    df <- data.frame(Year = rep(sample(c(2012, 2016), 10, replace = TRUE)),
                     Group = sample(c('Treat', 'Control'), 10, replace = TRUE),
                     V1 = runif(10, 0, 1),
                     V2 = runif(10, 0, 1),
                     V3 = runif(10, 0, 1))

    df2 <- df %>%
      group_by(Year, Group) %>%
      group_by(N = n(), add = TRUE) %>%
      summarise_all(funs(sd, median, mean))

    df2
    #> # A tibble: 4 x 12
    #> # Groups:   Year, Group [?]
    #>    Year   Group     N      V1_sd      V2_sd     V3_sd V1_median V2_median
    #>   <dbl>  <fctr> <int>      <dbl>      <dbl>     <dbl>     <dbl>     <dbl>
    #> 1  2012 Control     2 0.05170954 0.29422635 0.1152669 0.3037848 0.6193239
    #> 2  2012   Treat     2 0.51092899 0.08307494 0.1229560 0.5734239 0.5408230
    #> 3  2016 Control     3 0.32043716 0.34402222 0.3822026 0.3823880 0.4935413
    #> 4  2016   Treat     3 0.37759667 0.29566739 0.1233162 0.3861141 0.6684667
    #> # ... with 4 more variables: V3_median <dbl>, V1_mean <dbl>,
    #> #   V2_mean <dbl>, V3_mean <dbl>
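The answer relies on funs() and on the add argument of group_by(), both of which are deprecated in recent dplyr releases. As a possible alternative, and not part of the original answer, the sketch below assumes dplyr 1.0.0 or later and computes the group size once with n() inside summarise(), applying the other statistics to V1:V3 through across(). The name summary_table is only illustrative.

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  Year  = sample(c(2012, 2016), 10, replace = TRUE),
  Group = sample(c("Treat", "Control"), 10, replace = TRUE),
  V1 = runif(10),
  V2 = runif(10),
  V3 = runif(10)
)

# Assumes dplyr >= 1.0.0, where across() is available.
summary_table <- df %>%
  group_by(Year, Group) %>%
  summarise(
    N = n(),  # one count per Year/Group combination
    # sd, median and mean for each of V1, V2, V3;
    # result columns are named like V1_sd, V1_median, V1_mean
    across(V1:V3, list(sd = sd, median = median, mean = mean)),
    .groups = "drop"
  )

summary_table
```

Selecting V1:V3 explicitly keeps the newly created N column out of across(). The columns come out grouped by variable (V1_sd, V1_median, V1_mean, ...) rather than by statistic as in the output above, and the exact values depend on the R version's random number generator.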


Published: 2023-11-28 20:00:55

Article link: https://www.elefans.com/category/jswz/34/1643770.html