I want to parameterise the following computation using dplyr that finds which values of Sepal.Length are associated with more than one value of Sepal.Width:
```r
library(dplyr)

iris %>%
  group_by(Sepal.Length) %>%
  summarise(n.uniq = n_distinct(Sepal.Width)) %>%
  filter(n.uniq > 1)
```
Normally I would write something like this:
```r
not.uniq.per.group <- function(data, group.var, uniq.var) {
  data %>%
    group_by(group.var) %>%
    summarise(n.uniq = n_distinct(uniq.var)) %>%
    filter(n.uniq > 1)
}
```
However, this approach throws errors because dplyr uses non-standard evaluation. How should this function be written?
Recommended answer
You need to use the standard evaluation versions of the dplyr functions (just append `_` to the function names, i.e. `group_by_` and `summarise_`) and pass strings to your function, which you then need to turn into symbols. To parameterise the argument of `summarise_`, you will need `interp()`, which is defined in the lazyeval package. Concretely:
```r
library(dplyr)
library(lazyeval)

# grp.var and uniq.var are column names passed as strings
not.uniq.per.group <- function(df, grp.var, uniq.var) {
  df %>%
    group_by_(grp.var) %>%
    # interp() substitutes the symbol named by uniq.var for v in ~n_distinct(v)
    summarise_(n_uniq = interp(~n_distinct(v), v = as.name(uniq.var))) %>%
    filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
```
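The underscore verbs and lazyeval used above were deprecated in dplyr 0.7.0 in favour of tidy evaluation. Here is a minimal sketch of the same string-based function for current dplyr, assuming dplyr 1.0 or later. It uses the `.data` pronoun to look columns up by name instead of building a formula with `interp()`:

```r
library(dplyr)

# Sketch assuming dplyr >= 1.0: grp.var and uniq.var are column names given as strings.
not.uniq.per.group <- function(df, grp.var, uniq.var) {
  df %>%
    # .data[[name]] refers to the column whose name is stored in the string
    group_by(.data[[grp.var]]) %>%
    summarise(n_uniq = n_distinct(.data[[uniq.var]])) %>%
    filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
```

Because `.data[[...]]` only looks inside the data frame, a misspelled column name fails with an error. It does not silently pick up a variable from the calling environment.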
See the dplyr vignette on non-standard evaluation for more details.
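If you would rather call the function with bare (unquoted) column names, you can use rlang's embracing operator `{{ }}`. This sketch assumes rlang 0.4.0 or later, which dplyr installs as a dependency. The function name `not.uniq.per.group.bare` is only illustrative:

```r
library(dplyr)

# Hypothetical name; grp.var and uniq.var are bare (unquoted) column names.
not.uniq.per.group.bare <- function(df, grp.var, uniq.var) {
  df %>%
    # {{ }} forwards the caller's column expression into dplyr's data masking
    group_by({{ grp.var }}) %>%
    summarise(n_uniq = n_distinct({{ uniq.var }})) %>%
    filter(n_uniq > 1)
}

not.uniq.per.group.bare(iris, Sepal.Length, Sepal.Width)
```

Here the grouping column in the result keeps the name that was passed in (`Sepal.Length`), just as it does in the non-parameterised pipeline.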