Using the built-in function n with summarise_if

Updated: 2024-10-12 14:24:49

Problem description

I am attempting a basic dplyr::summarize_if on a df with the built-in n function:

### Seems like this should work
df %>% summarise_if(is.numeric, funs(n, mean, sd, min, max), na.rm = TRUE)
Error in summarise_impl(.data, dots) : `n()` does not take arguments

### Works fine without the n
df %>% summarise_if(is.numeric, funs(mean, sd, min, max), na.rm = TRUE)
# A tibble: 1 x 104

I've tried n() and n(.) (which of course I wouldn't expect to work, and they don't).

Is there a way to use funs(n) in summarise_if that I am missing?

Recommended answer

I don't think it's a single-pass operation to summarize in two different ways. You want to summarize (1) the number of rows (perhaps per-group); and (2) specific functions for certain columns. The n() helper takes no arguments and reports the number of rows in the current group of the data frame, whereas each function named within funs(...) is called on one column vector at a time, along with any extra arguments such as na.rm = TRUE. That is why the error says `n()` does not take arguments.
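To make the distinction concrete, here is a minimal sketch that is not part of the original answer. It assumes dplyr 0.8.0 or later (for the named-list form in place of funs()) and swaps in base R's length() as a vector-based count; the names dat, row_counts, and col_counts are only illustrative.

```r
library(dplyr)

dat <- select(mtcars, cyl, mpg, wt)

# n() is called with no arguments and returns the number of rows
# in the current group.
row_counts <- dat %>%
  group_by(cyl) %>%
  summarise(n = n())

# Functions handed to summarise_if() are each called on one column vector,
# so a vector-based count such as length() is accepted where n is not.
# (Assumption: no extra arguments like na.rm are passed, because length()
# would reject them.)
col_counts <- dat %>%
  group_by(cyl) %>%
  summarise_if(is.numeric, list(n = length, mean = mean))

row_counts
col_counts
```

Note that length() counts every element, including NA values, so it only matches n() row for row; it is not a count of non-missing observations.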

One method would be to merge/join in what you need. Since you didn't provide data, I'll use mtcars. Though you don't mention grouping, I'm guessing that there may be groups (though it doesn't complicate things), so I'll inject that, too:

library(dplyr)
counts <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  count()
counts
# # A tibble: 3 × 2
#     cyl     n
#   <dbl> <int>
# 1     4    11
# 2     6     7
# 3     8    14

(count() is essentially a shortcut for summarize(n = n()). This could have been done with select(mtcars, cyl, mpg, wt) %>% count(cyl) just as easily, but I wanted the grouping to be explicit for this answer.)
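The equivalence mentioned in the parenthetical can be checked with a short sketch. The variable names via_count and via_summarise are hypothetical, and the comparison converts both results to plain data frames because the two calls may return different classes.

```r
library(dplyr)

dat <- select(mtcars, cyl, mpg, wt)

# Shortcut form: count() groups by cyl and tallies rows in one call.
via_count <- dat %>% count(cyl)

# Explicit form: group, then summarise with n().
via_summarise <- dat %>%
  group_by(cyl) %>%
  summarise(n = n())

# Compare the contents, ignoring the data.frame vs. tibble class difference.
same <- isTRUE(all.equal(as.data.frame(via_count),
                         as.data.frame(via_summarise)))
same
```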

others <- select(mtcars, cyl, mpg, wt) %>%
  group_by(cyl) %>%
  summarise_if(is.numeric, funs(mean, sd))
others
# # A tibble: 3 × 5
#     cyl mpg_mean  wt_mean   mpg_sd     wt_sd
#   <dbl>    <dbl>    <dbl>    <dbl>     <dbl>
# 1     4 26.66364 2.285727 4.509828 0.5695637
# 2     6 19.74286 3.117143 1.453567 0.3563455
# 3     8 15.10000 3.999214 2.560048 0.7594047

left_join(counts, others, by = "cyl")
# # A tibble: 3 × 6
#     cyl     n mpg_mean  wt_mean   mpg_sd     wt_sd
#   <dbl> <int>    <dbl>    <dbl>    <dbl>     <dbl>
# 1     4    11 26.66364 2.285727 4.509828 0.5695637
# 2     6     7 19.74286 3.117143 1.453567 0.3563455
# 3     8    14 15.10000 3.999214 2.560048 0.7594047

This could of course be done in one fell swoop instead of creating the intermediate variables counts and others, but (1) I thought it would be more demonstrative to break them out; and (2) sometimes clarity in code is preferred to compactness. One could add %>% left_join(counts, by = "cyl") to the end of the others pipeline, though, with no loss of clarity.
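A sketch of that more compact variant, with the join appended to the summary pipeline, might look like the following. It assumes dplyr 0.8.0 or later and uses a named list in place of funs(), which later dplyr releases deprecate; dat and combined are illustrative names, not from the original answer.

```r
library(dplyr)

dat <- select(mtcars, cyl, mpg, wt)

# Per-group row counts, built the same way as `counts` in the answer.
counts <- dat %>%
  group_by(cyl) %>%
  count()

# Per-group summaries with the counts joined on at the end of the pipeline.
# list(mean = mean, sd = sd) stands in for funs(mean, sd) (assumption:
# dplyr >= 0.8.0).
combined <- dat %>%
  group_by(cyl) %>%
  summarise_if(is.numeric, list(mean = mean, sd = sd)) %>%
  left_join(counts, by = "cyl")

combined
```

In this form the n column lands after the summary columns rather than right after cyl, because the counts are joined onto the summaries instead of the other way around.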

Published: 2023-10-28 04:46:40

Article link: https://www.elefans.com/category/jswz/34/1535565.html