I am trying to write a function that will convert this data frame
    library(dplyr)
    library(rlang)
    library(purrr)

    df <- data.frame(obj = c(1,1,2,2,3,3,3,4,4,4),
                     S1 = rep(c("a","b"), length.out = 10),
                     PR1 = rep(c(3,7), length.out = 10),
                     S2 = rep(c("c","d"), length.out = 10),
                     PR2 = rep(c(7,3), length.out = 10))

       obj S1 PR1 S2 PR2
    1    1  a   3  c   7
    2    1  b   7  d   3
    3    2  a   3  c   7
    4    2  b   7  d   3
    5    3  a   3  c   7
    6    3  b   7  d   3
    7    3  a   3  c   7
    8    4  b   7  d   3
    9    4  a   3  c   7
    10   4  b   7  d   3

into this data frame:
    df %>%
      {bind_rows(select(., obj, S = S1, PR = PR1),
                 select(., obj, S = S2, PR = PR2))}

       obj S PR
    1    1 a  3
    2    1 b  7
    3    2 a  3
    4    2 b  7
    5    3 a  3
    6    3 b  7
    7    3 a  3
    8    4 b  7
    9    4 a  3
    10   4 b  7
    11   1 c  7
    12   1 d  3
    13   2 c  7
    14   2 d  3
    15   3 c  7
    16   3 d  3
    17   3 c  7
    18   4 d  3
    19   4 c  7
    20   4 d  3
But I would like the function to be able to work with any number of columns. So it would also work if I had S1, S2, S3, S4 or if there was an additional category ie DS1, DS2. Ideally the function would take as arguments the patterns that determine which columns are stacked on top of each other, the number of sets of each column, the names of the output columns and the names of any variables that should also be kept.
Here is my attempt at this function:
    stack_col <- function(df, patterns, nums, cnames, keep){
      keep <- enquo(keep)
      build_exp <- function(x){
        paste0("!!sym(cnames[[", x, "]]) := paste0(patterns[[", x, "]],num)") %>%
          parse_expr()
      }
      exps <- map(1:length(patterns), ~expr(!!build_exp(.)))
      sel_fun <- function(num){
        df %>% select(!!keep, !!!exps)
      }
      map(nums, sel_fun) %>% bind_rows()
    }
I can get the sel_fun part to work for a fixed number of patterns like this
    patterns <- c("S", "PR")
    cnames <- c("Species", "PR")
    keep <- quo(obj)
    sel_fun <- function(num){
      df %>% select(!!keep,
                    !!sym(cnames[[1]]) := paste0(patterns[[1]], num),
                    !!sym(cnames[[2]]) := paste0(patterns[[2]], num))
    }
    sel_fun(1)
But the dynamic version that I have tried does not work and gives this error:
    Error: `:=` can only be used within a quasiquoted argument

Recommended answer
Here is a function to get the expected output. Loop through the 'patterns' and the corresponding new column names ('cnames') using map2, gather into 'long' format, rename the 'val' column to the 'cnames' passed into the function, bind the columns (bind_cols) and select the columns of interest
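    stack_col <- function(dat, pat, cname, keep) {
      # For each pattern / output-name pair: keep the `keep` column plus the
      # matching columns, stack the matching columns into a single `val`
      # column, drop the `key` column, and rename `val` to the requested name.
      purrr::map2(pat, cname, ~
                    dat %>%
                      dplyr::select(keep, matches(.x)) %>%
                      tidyr::gather(key, val, matches(.x)) %>%
                      dplyr::select(-key) %>%
                      dplyr::rename(!! .y := val)) %>%
        # Bind the per-pattern results side by side, then keep only the
        # requested columns.
        dplyr::bind_cols(.) %>%
        dplyr::select(keep, cname)
    }

    # keep = 1 selects the first column (obj) by position
    stack_col(df, patterns, cnames, 1)
    #   obj Species PR
    #1    1       a  3
    #2    1       b  7
    #3    2       a  3
    #4    2       b  7
    #5    3       a  3
    #6    3       b  7
    #7    3       a  3
    #8    4       b  7
    #9    4       a  3
    #10   4       b  7
    #11   1       c  7
    #12   1       d  3
    #13   2       c  7
    #14   2       d  3
    #15   3       c  7
    #16   3       d  3
    #17   3       c  7
    #18   4       d  3
    #19   4       c  7
    #20   4       d  3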
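As a side note on the error in the question: `:=` is only recognised when it is written directly as an argument, so expressions containing it that are built with parse_expr() and spliced in with `!!!` are likely evaluated as ordinary calls to `:=`, which fails. The sketch below is one possible way to keep the question's select-based design without `:=`, by passing a named character vector to all_of(). The function name stack_col_select is hypothetical, and stringsAsFactors = FALSE is added here only so the example behaves the same across R versions.

```r
library(dplyr)
library(rlang)
library(purrr)

# Same data as in the question; stringsAsFactors = FALSE is an addition
# made for this sketch so that S1/S2 are character on any R version.
df <- data.frame(obj = c(1,1,2,2,3,3,3,4,4,4),
                 S1  = rep(c("a","b"), length.out = 10),
                 PR1 = rep(c(3,7), length.out = 10),
                 S2  = rep(c("c","d"), length.out = 10),
                 PR2 = rep(c(7,3), length.out = 10),
                 stringsAsFactors = FALSE)

# Hypothetical variant of the question's stack_col(); not the answer's method.
stack_col_select <- function(df, patterns, nums, cnames, keep) {
  keep <- enquo(keep)
  sel_fun <- function(num) {
    # Named vector: names are the output columns, values the source columns,
    # e.g. c(Species = "S1", PR = "PR1") when num is 1.
    cols <- setNames(paste0(patterns, num), cnames)
    # Selecting with a named vector renames the columns as they are selected.
    select(df, !!keep, all_of(cols))
  }
  # One data frame per set number, stacked on top of each other.
  bind_rows(map(nums, sel_fun))
}

res <- stack_col_select(df, c("S", "PR"), 1:2, c("Species", "PR"), obj)
```

Because the column names are computed as plain strings, the same call should extend to more sets (nums = 1:4) or more categories (adding "DS" to patterns), provided the corresponding columns exist in the data.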
Alternatively, data.table::melt can be used:
    library(data.table)
    melt(setDT(df), measure = patterns("^S\\d+", "^PR\\d+"),
         value.name = c("Species", "PR"))[, variable := NULL][]
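A similar reshaping can probably also be written with tidyr::pivot_longer, which is not part of the answer above and assumes tidyr 1.0.0 or later. In this sketch the special ".value" entry in names_to tells pivot_longer to use the letter prefix (S, PR) as the output column name and the trailing digits as a set index. The regular expression and the column name "set" are choices made for this example.

```r
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE is an addition
# made for this sketch so that S1/S2 are character on any R version.
df <- data.frame(obj = c(1,1,2,2,3,3,3,4,4,4),
                 S1  = rep(c("a","b"), length.out = 10),
                 PR1 = rep(c(3,7), length.out = 10),
                 S2  = rep(c("c","d"), length.out = 10),
                 PR2 = rep(c(7,3), length.out = 10),
                 stringsAsFactors = FALSE)

# Split each column name into a letter prefix and a numeric suffix.
# The prefix becomes the output column (".value"); the suffix goes to "set".
long <- pivot_longer(df,
                     cols = -obj,
                     names_to = c(".value", "set"),
                     names_pattern = "^([A-Za-z]+)(\\d+)$")
```

Unlike the stacked output shown earlier, the rows here come out interleaved (set 1 and set 2 of each original row are adjacent), and the output columns keep the prefixes S and PR, so a rename step would still be needed to get Species.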