Mutate multiple variables to create multiple new variables

Updated: 2024-10-27 06:25:59

Problem description

Let's say I have a tibble where I need to take multiple variables and mutate them into multiple new variables.

As an example, here is a simple tibble:

library(tidyverse)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

I want to subtract variable z from every variable with a name starting with "y", and mutate the results as new variables of tb. Also, suppose I don't know how many "y" variables I have. I want the solution to fit nicely within tidyverse / dplyr workflow.

In essence, I don't understand how to mutate multiple variables into multiple new variables. I'm not sure if you can use mutate in this instance? I've tried mutate_if, but I don't think I'm using it right (and I get an error):

tb %>% mutate_if(starts_with("y"), funs(. - z))
#> Error: No tidyselect variables were registered

Thanks in advance!

Solution

Because you are selecting columns by name, you need mutate_at() rather than mutate_if(), which selects columns with a predicate applied to the values within each column (for example is.numeric).

tb %>% mutate_at(vars(starts_with("y")), funs(. - z))
#> # A tibble: 3 x 5
#>       x    y1    y2    y3     z
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     2     4     2
#> 2     2    -2    -1     0     3
#> 3     3     5     3     1     1
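To make the difference between the two scoped verbs concrete, here is a minimal sketch that re-creates the question's tb. The names by_type and by_name are only illustrative, and the "multiply by 10" transformation is an assumption chosen for the example, not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

# mutate_if() picks columns with a predicate applied to each column's values
by_type <- tb %>% mutate_if(is.numeric, ~ . * 10)

# mutate_at() picks columns by name, with the selection wrapped in vars()
by_name <- tb %>% mutate_at(vars(starts_with("y")), ~ . - z)
```

Passing a name-based helper such as starts_with() to mutate_if() fails because that verb expects a predicate function (or a logical vector), not a column selection.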

To create new columns instead of overwriting the existing ones, we can name the function inside funs(); the name is appended to each column name as a suffix.

# add suffix
tb %>% mutate_at(vars(starts_with("y")), funs(mod = . - z))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z y1_mod y2_mod y3_mod
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

# remove suffix, add prefix
tb %>%
  mutate_at(vars(starts_with("y")), funs(mod = . - z)) %>%
  rename_at(vars(ends_with("_mod")), funs(paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

Edit: In dplyr 0.8.0 and later, funs() is deprecated (source1 & source2); use list() with formulas instead.

tb %>% mutate_at(vars(starts_with("y")), list(~ . - z))
#> # A tibble: 3 x 5
#>       x    y1    y2    y3     z
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     2     4     2
#> 2     2    -2    -1     0     3
#> 3     3     5     3     1     1

tb %>% mutate_at(vars(starts_with("y")), list(mod = ~ . - z))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z y1_mod y2_mod y3_mod
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

tb %>%
  mutate_at(vars(starts_with("y")), list(mod = ~ . - z)) %>%
  rename_at(vars(ends_with("_mod")), list(~ paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

Edit 2: dplyr 1.0.0+ provides the across() function, which simplifies this task even further.

Basic usage

across() has two primary arguments:

The first argument, .cols, selects the columns you want to operate on. It uses tidy selection (like select()) so you can pick variables by position, name, and type.

The second argument, .fns, is a function or list of functions to apply to each column. This can also be a purrr style formula (or list of formulas) like ~ .x / 2. (This argument is optional, and you can omit it if you just want to get the underlying data; you'll see that technique used in vignette("rowwise").)
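The three ways of filling .cols mentioned above (by name, position, and type) can be sketched as follows. This assumes dplyr 1.0.0 or later and re-creates the question's tb; the result names by_name, by_pos, and by_type are only illustrative, and the ~ .x / 2 formula is borrowed from the documentation excerpt rather than from the question.

```r
library(dplyr)

# Same example data as in the question
tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

# By name: columns whose names start with "y"
by_name <- tb %>% mutate(across(starts_with("y"), ~ .x / 2))

# By position: the second to fourth columns (here y1, y2, y3)
by_pos <- tb %>% mutate(across(2:4, ~ .x / 2))

# By type: every numeric column, wrapped in where()
by_type <- tb %>% mutate(across(where(is.numeric), ~ .x / 2))
```

Without a .names argument, across() overwrites the selected columns in place, so by_name and by_pos hold the same values here, while by_type also halves x and z.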

# Control how the names are created with the `.names` argument which
# takes a [glue](glue.tidyverse/) spec:
tb %>%
  mutate(
    across(starts_with("y"), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

tb %>%
  mutate(
    across(num_range(prefix = "y", range = 1:3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

### Multiple functions
tb %>%
  mutate(
    across(c(matches("x"), contains("z")), ~ max(.x, na.rm = TRUE), .names = "max_{col}"),
    across(c(y1:y3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 10
#>       x    y1    y2    y3     z max_x max_z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2     3     3      0      2      4
#> 2     2     1     2     3     3     3     3     -2     -1      0
#> 3     3     6     4     2     1     3     3      5      3      1
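As a further sketch, .fns can also take a named list of functions, in which case the glue spec can combine the function name with the column name. This assumes dplyr 1.0.0 or later, where the documented placeholders are {.col} and {.fn}; the ratio function and the name result are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

# Apply two named functions to every "y" column;
# {.fn} is the name from the list, {.col} is the source column name
result <- tb %>%
  mutate(
    across(
      starts_with("y"),
      list(mod = ~ .x - z, ratio = ~ .x / z),
      .names = "{.fn}_{.col}"
    )
  )

names(result)
```

Each selected column yields one new column per function, so the three "y" columns produce six new columns alongside the five original ones.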

Created on 2018-10-29 by the reprex package (v0.2.1)

Published: 2023-10-29 07:41:42

Article link: https://www.elefans.com/category/jswz/34/1539112.html