I am attempting a basic dplyr::summarize_if on a df with the built-in n function:
```r
### Seems like this should work
df %>%
  summarise_if(is.numeric, funs(n, mean, sd, min, max), na.rm = TRUE)
# Error in summarise_impl(.data, dots) :
#   `n()` does not take arguments

### Works fine without the n
df %>%
  summarise_if(is.numeric, funs(mean, sd, min, max), na.rm = TRUE)
# A tibble: 1 x 104
```
I've also tried n() and n(.), which I didn't really expect to work, and they don't.
What am I missing about using funs(n) inside summarise_if?
Recommended answer
I don't think summarizing in these two different ways is a single-pass operation. You want to summarize (1) the number of rows (perhaps per group) and (2) specific functions of certain columns. The n() helper takes no arguments and is meant to count the rows of the whole (possibly grouped) data frame, whereas each function named in funs(...) is called on one column vector at a time, together with any extra arguments such as na.rm = TRUE. That is why funs(n, ...) fails with "`n()` does not take arguments".
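Because each summary function receives a single column vector, a vector-level counter can stand in for n() if a per-column count is acceptable. The sketch below is an assumption on my part, not part of the original answer: it uses base R's length() for the number of rows and a hypothetical helper, n_nonmissing(), for the number of non-NA values. It is written with across() and where() (dplyr 1.0 or later), the current replacements for summarise_if() and funs(). Note that passing na.rm = TRUE through summarise_if() to length() would fail, since length() accepts only one argument, so the na.rm handling is written into each lambda instead.

```r
library(dplyr)

# Hypothetical helper: counts the non-missing values in one column vector.
n_nonmissing <- function(x) sum(!is.na(x))

per_column <- mtcars %>%
  select(cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise(across(
    where(is.numeric),                     # grouping column cyl is excluded
    list(
      n    = length,                       # rows in this group
      nobs = n_nonmissing,                 # non-NA values in this column
      mean = ~ mean(.x, na.rm = TRUE),
      sd   = ~ sd(.x, na.rm = TRUE)
    )
  ))

per_column
```

Each numeric column gets its own count (mpg_n, wt_n, ...), so the row count is repeated per column; the join approach below keeps a single n column instead.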
One approach is to compute each summary separately and then join the results. Since you didn't provide data, I'll use mtcars. You don't mention grouping, but I suspect there may be groups (they don't complicate things), so I'll include grouping as well:
```r
library(dplyr)
counts <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  count()
counts
# # A tibble: 3 × 2
#     cyl     n
#   <dbl> <int>
# 1     4    11
# 2     6     7
# 3     8    14
```
(count() is essentially a shortcut for summarize(n = n()). This could have been done with select(mtcars, cyl, mpg, wt) %>% count(cyl) just as easily, but I wanted the grouping to be explicit for this answer.)
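To make the count() shortcut concrete, here is a small sketch that computes the per-cyl row counts three ways: count() on a grouped data frame, count() with the grouping column as an argument, and summarise(n = n()). It compares only the n columns, because the results may carry different grouping metadata.

```r
library(dplyr)

# Three ways to get the number of rows per cyl group.
via_count_grouped <- mtcars %>% group_by(cyl) %>% count()
via_count_arg     <- mtcars %>% count(cyl)
via_summarise     <- mtcars %>% group_by(cyl) %>% summarise(n = n())

# Compare the counts themselves rather than the whole objects.
identical(via_count_grouped$n, via_summarise$n)
identical(via_count_arg$n, via_summarise$n)
```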
```r
others <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise_if(is.numeric, funs(mean, sd))
others
# # A tibble: 3 × 5
#     cyl mpg_mean  wt_mean   mpg_sd     wt_sd
#   <dbl>    <dbl>    <dbl>    <dbl>     <dbl>
# 1     4 26.66364 2.285727 4.509828 0.5695637
# 2     6 19.74286 3.117143 1.453567 0.3563455
# 3     8 15.10000 3.999214 2.560048 0.7594047

left_join(counts, others, by = "cyl")
# # A tibble: 3 × 6
#     cyl     n mpg_mean  wt_mean   mpg_sd     wt_sd
#   <dbl> <int>    <dbl>    <dbl>    <dbl>     <dbl>
# 1     4    11 26.66364 2.285727 4.509828 0.5695637
# 2     6     7 19.74286 3.117143 1.453567 0.3563455
# 3     8    14 15.10000 3.999214 2.560048 0.7594047
```
This could of course be done in one fell swoop instead of creating the intermediate variables counts and others, but (1) I thought breaking them out would be more instructive, and (2) clarity in code is sometimes preferable to compactness. One could, though, append %>% left_join(counts, by = "cyl") to the end of the others pipeline with no loss of clarity.
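A sketch of that variant, with the join appended to the end of the others pipeline. It assumes dplyr 1.0 or later and swaps summarise_if(is.numeric, funs(mean, sd)) for the equivalent across() call, since funs() is deprecated in current dplyr. One side effect of that swap: across() orders the output columns by source column (mpg_mean, mpg_sd, wt_mean, wt_sd), and the joined n column lands last.

```r
library(dplyr)

# Row counts per cyl group, as in the answer.
counts <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  count()

# The "others" pipeline with the join appended at the end.
combined <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, wt), list(mean = mean, sd = sd))) %>%
  left_join(counts, by = "cyl")

combined
```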