How do I efficiently append datasets using a dictionary (with R/dplyr)? / How do I coalesce 'all columns with duplicate names'?

Updated: 2024-10-13 02:16:36


Problem description


I have a series of data sets and a dictionary to bring these together. But I'm struggling to figure out how to automate this.

Suppose I have this data and dictionary (the actual ones are much longer, which is why I want to automate this):

library(dplyr)

mtcarsA <- mtcars[1:5,] %>% rename(mpgA = mpg, cyl_A = cyl) %>% as_tibble()
mtcarsB <- mtcars[6:10,] %>% rename(mpg_B = mpg, B_cyl = cyl) %>% as_tibble()

dic <- tibble(true_name  = c("mpg_true", "cyl_true"), 
              nameA = c("mpgA", "cyl_A"), 
              nameB = c("mpg_B", "B_cyl")
)

I want these datasets (from years A and B) appended to one another, and then to have the names changed or coalesced to the 'true_name' values.

I can bring the data sets together into mtcars_all, and then I tried recoding the column names with the dictionary as follows:


mtcars_all <- bind_rows(mtcarsA, mtcarsB)

recode_colname <- function(df, tn=dic$true_name, fname){
  colnames(df) <-  dplyr::recode(colnames(df),
                                !!!setNames(as.character(tn), fname))
  return(df)
  }

mtcars_all <- mtcars_all %>%
  recode_colname(fname=dic$nameA) %>%
  recode_colname(fname=dic$nameB)

But then I get duplicate columns. Of course I could coalesce each of these duplicate columns by name, but there will be many of these in my real case, so I want to automate 'coalesce all columns with duplicate names'.

I'm giving the entire problem here because perhaps someone also has a better solution for 'using a data dictionary'.

Solution

You can create a named vector to replace column names.

library(tidyverse)

pmap(dic, ~setNames(..1, paste0(c(..2, ..3), collapse = '|'))) %>%
  flatten_chr() -> val

val
# mpgA|mpg_B cyl_A|B_cyl 
# "mpg_true"  "cyl_true"
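
One caveat with the named vector above: the names of val are used as regular expressions, so a pattern such as mpgA|mpg_B also matches inside longer column names. The sketch below is a variation, not part of the original answer; it anchors each pattern with ^ and $ so that only whole column names are replaced. The name val_anchored and the extra column name mpgA_old are made up for illustration, and the sketch assumes the dictionary entries contain no regex metacharacters.

```r
library(tibble)
library(stringr)

# Same dictionary as in the question.
dic <- tibble(true_name = c("mpg_true", "cyl_true"),
              nameA = c("mpgA", "cyl_A"),
              nameB = c("mpg_B", "B_cyl"))

# Unanchored patterns, equivalent to `val` in the answer.
val <- setNames(dic$true_name, paste0(dic$nameA, "|", dic$nameB))

# Anchored patterns (hypothetical variant): match whole column names only.
val_anchored <- setNames(dic$true_name,
                         paste0("^(", dic$nameA, "|", dic$nameB, ")$"))

# "mpgA_old" is an invented column name that merely contains "mpgA".
cols <- c("mpgA", "mpgA_old", "B_cyl")

renamed_loose  <- str_replace_all(cols, val)           # also rewrites part of "mpgA_old"
renamed_strict <- str_replace_all(cols, val_anchored)  # leaves "mpgA_old" untouched
```

If the real dictionary is long and column names overlap, the anchored form is probably the safer default; with the toy data here both versions give the same result.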

Then apply it to a list of data frames and combine them.

list(mtcarsA,mtcarsB) %>%
  map_df(function(x) x %>% rename_with(~str_replace_all(.x, val)))

#   mpg_true cyl_true  disp    hp  drat    wt  qsec    vs    am  gear  carb
#      <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     21          6  160    110  3.9   2.62  16.5     0     1     4     4
# 2     21          6  160    110  3.9   2.88  17.0     0     1     4     4
# 3     22.8        4  108     93  3.85  2.32  18.6     1     1     4     1
# 4     21.4        6  258    110  3.08  3.22  19.4     1     0     3     1
# 5     18.7        8  360    175  3.15  3.44  17.0     0     0     3     2
# 6     18.1        6  225    105  2.76  3.46  20.2     1     0     3     1
# 7     14.3        8  360    245  3.21  3.57  15.8     0     0     3     4
# 8     24.4        4  147.    62  3.69  3.19  20       1     0     4     2
# 9     22.8        4  141.    95  3.92  3.15  22.9     1     0     4     2
#10     19.2        6  168.   123  3.92  3.44  18.3     1     0     4     4
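
The question also asks how to coalesce the year-specific columns after bind_rows() instead of renaming before binding. A minimal sketch of that route follows, assuming the dictionary has one column of target names (true_name) and every other column holds source names. The helper coalesce_by_dictionary is hypothetical, not a dplyr function; it avoids creating duplicate column names at all by coalescing each group of source columns directly into its target.

```r
library(dplyr)

# Same toy data and dictionary as in the question.
mtcarsA <- mtcars[1:5, ] %>% rename(mpgA = mpg, cyl_A = cyl) %>% as_tibble()
mtcarsB <- mtcars[6:10, ] %>% rename(mpg_B = mpg, B_cyl = cyl) %>% as_tibble()
dic <- tibble(true_name = c("mpg_true", "cyl_true"),
              nameA = c("mpgA", "cyl_A"),
              nameB = c("mpg_B", "B_cyl"))

# Hypothetical helper: for each dictionary row, coalesce the source columns
# that exist in `df` into one column named after the target, then drop them.
coalesce_by_dictionary <- function(df, dic, target = "true_name") {
  source_cols <- setdiff(names(dic), target)
  for (i in seq_len(nrow(dic))) {
    old <- unlist(dic[i, source_cols], use.names = FALSE)
    old <- intersect(old, names(df))      # ignore names absent from df
    if (length(old) == 0) next
    merged <- do.call(coalesce, unname(as.list(df[old])))
    df <- df[setdiff(names(df), old)]     # drop the year-specific columns
    df[[dic[[target]][i]]] <- merged      # add the coalesced column
  }
  # Move the coalesced columns to the front.
  relocate(df, any_of(dic[[target]]))
}

mtcars_all <- bind_rows(mtcarsA, mtcarsB) %>%
  coalesce_by_dictionary(dic)
```

Compared with renaming each data frame first, this version needs only the combined table, but it relies on the source columns in each group having compatible types, since coalesce() will not mix, for example, character and numeric vectors.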

Published: 2023-04-30 13:24:29

Article link: https://www.elefans.com/category/jswz/34/1394443.html