How to compute multiple new columns in an R data frame with dynamic names

Updated: 2024-10-24 10:26:31
Problem description

I'm trying to generate multiple new columns/variables in an R data frame, with dynamic new names taken from a vector. The new variables are computed from groups/levels of a single column. The data frame contains measurements (counts) of different chemical elements (element) along depth (z). The new variables are computed by dividing the counts of each element at a certain depth by the respective counts of proxy elements (proxies) at the same depth.

There is already a solution using mutate that works if I only want to create one new column or name the columns explicitly (see code below). I'm looking for a generalised solution to use in a Shiny web app, where proxies is not a single string but a vector of strings that changes dynamically according to user input.

    # Working code for just one new column at a time (here Ti_ratio)
    library(tidyverse)

    proxies <- "Ti"

    df <- tibble(z = rep(1:10, 4),
                 element = rep(c("Ag", "Fe", "Ca", "Ti"), each = 10),
                 counts = rnorm(40))

    df_Ti <- df %>%
      group_by(z) %>%
      mutate(Ti_ratio = counts/counts[element %in% proxies])

    # Not working code for multiple columns at a time
    proxies <- c("Ca", "Fe", "Ti")
    varname <- paste(proxies, "ratio", sep = "_")

    df_ratios <- df %>%
      group_by(z) %>%
      map(~ mutate(!!varname = .x$counts/.x$counts[element %in% proxies]))

Output of the working code:

    > head(df_Ti)
    # A tibble: 6 x 4
    # Groups:   z [6]
          z element counts Ti_ratio
      <int> <chr>    <dbl>    <dbl>
    1     1 Ag       2.41     4.10
    2     2 Ag      -1.06    -0.970
    3     3 Ag      -0.312   -0.458
    4     4 Ag      -0.186    0.570
    5     5 Ag       1.12    -1.38
    6     6 Ag      -1.68    -2.84

Expected output of the non-working code:

    > head(df_ratios)
    # A tibble: 6 x 6
    # Groups:   z [6]
          z element counts Ca_ratio Fe_ratio Ti_ratio
      <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
    1     1 Ag       2.41     4.78   -10.1      4.10
    2     2 Ag      -1.06     3.19     0.506   -0.970
    3     3 Ag      -0.312   -0.479   -0.621   -0.458
    4     4 Ag      -0.186   -0.296   -0.145    0.570
    5     5 Ag       1.12     0.353    3.19    -1.38
    6     6 Ag      -1.68    -2.81    -0.927   -2.84
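A note on why the multi-column attempt above fails, as far as can be told from the code itself: R's parser rejects `!!varname = ...` because the left-hand side of `=` inside a call must be a plain name; varname holds three strings, while a dynamic column name must be a single string; and map() applied to a data frame iterates over its columns, so .x is a column vector rather than a group of rows. The sketch below shows only the dynamic-naming part for one proxy, using the `:=` operator. The counts are made-up fixed values (the post uses rnorm), and df_one is a name chosen here for illustration.

```r
library(dplyr)

# Made-up, deterministic counts so the result is reproducible.
df <- tibble(
  z = rep(1:3, 4),
  element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
  counts = as.numeric(1:12)
)

proxy <- "Ti"                                # one proxy at a time
varname <- paste(proxy, "ratio", sep = "_")  # "Ti_ratio"

df_one <- df %>%
  group_by(z) %>%
  # `:=` accepts a column name injected with `!!`; a plain `=` does not.
  mutate(!!varname := counts / counts[element == proxy]) %>%
  ungroup()

print(df_one)
```

This covers a single name only; handling the whole proxies vector still needs some form of iteration over the proxies.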

Edit: I found a general solution to my problem with base R using two nested for-loops, similar to the answer posted by @fra (the difference being that here I loop over both the depth and the proxies):

    library(tidyverse)

    df <- tibble(z = rep(1:3, 4),
                 element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
                 counts = runif(12)) %>%
      arrange(z, element)

    proxies <- c("Ca", "Fe", "Ti")

    for (f in seq_along(proxies)) {
      proxy <- proxies[f]
      tmp2 <- NULL
      for (i in unique(df$z)) {
        tmp <- df[df$z == i,]
        tmp <- as.data.frame(tmp$counts/tmp$counts[tmp$element %in% proxy])
        names(tmp) <- paste(proxy, "ratio", sep = "_")
        tmp2 <- rbind(tmp2, tmp)
      }
      df[, 3 + f] <- tmp2
    }

And the correct output:

    > head(df)
    # A tibble: 6 x 6
          z element counts Ca_ratio Fe_ratio Ti_ratio
      <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
    1     1 Ag      0.690    0.864      9.21    1.13
    2     1 Ca      0.798    1         10.7     1.30
    3     1 Fe      0.0749   0.0938     1       0.122
    4     1 Ti      0.612    0.767      8.17    1
    5     2 Ag      0.687    0.807      3.76    0.730
    6     2 Ca      0.851    1          4.66    0.904

I made the data frame smaller so that it is clearly visible why this solution is correct (the ratio of each element with itself is 1). I'm still interested in a more elegant solution that I could use with pipes.
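The correctness argument above (each proxy divided by itself gives 1) can be turned into an explicit check. The following is a minimal base R sketch, not the code from the post: it swaps the nested loops for a lookup vector of proxy counts named by depth, and it uses made-up fixed counts instead of runif so the check is reproducible. It assumes every proxy occurs exactly once per depth.

```r
# Small deterministic data set: 3 depths, 4 elements (values are made up).
df <- data.frame(
  z = rep(1:3, 4),
  element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
  counts = c(2, 4, 6, 1, 2, 3, 4, 8, 12, 5, 10, 20),
  stringsAsFactors = FALSE
)
proxies <- c("Ca", "Fe", "Ti")

for (proxy in proxies) {
  is_proxy <- df$element == proxy
  # Lookup vector: the proxy's counts, named by depth.
  denom <- setNames(df$counts[is_proxy], df$z[is_proxy])
  # Divide every row by the proxy count at the same depth.
  df[[paste(proxy, "ratio", sep = "_")]] <-
    unname(df$counts / denom[as.character(df$z)])
}

# Each proxy divided by itself must give exactly 1 at every depth.
self_ratio_ok <- vapply(proxies, function(proxy) {
  all(df[df$element == proxy, paste(proxy, "ratio", sep = "_")] == 1)
}, logical(1))
stopifnot(all(self_ratio_ok))
```

If a proxy were missing at some depth, the lookup would return NA for that depth, which makes the gap visible rather than silently recycling values.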

Recommended answer

A tidyverse option could be to create a function, similar to your original code, and then pass it through map_dfc to create the new columns.

    library(tidyverse)

    proxies <- c("Ca", "Fe", "Ti")

    # Returns a one-column tibble "<x>_ratio".
    # Note: reads df (the data frame from the question) from the calling environment.
    your_func <- function(x){
      df %>%
        group_by(z) %>%
        mutate(!!paste(x, "ratio", sep = "_") := counts/counts[element %in% !!x]) %>%
        ungroup() %>%
        select(!!paste(x, "ratio", sep = "_") )
    }

    # One ratio column per proxy, bound back onto the original columns
    df %>%
      group_modify(~map_dfc(proxies, your_func)) %>%
      bind_cols(df, .) %>%
      arrange(z, element)

    #       z element  counts Ca_ratio Fe_ratio Ti_ratio
    #   <int> <chr>     <dbl>    <dbl>    <dbl>    <dbl>
    # 1     1 Ag      -0.112   -0.733   -0.197    -1.51
    # 2     1 Ca       0.153    1        0.269     2.06
    # 3     1 Fe       0.570    3.72     1         7.66
    # 4     1 Ti       0.0743   0.485    0.130     1
    # 5     2 Ag       0.881    0.406   -6.52     -1.49
    # 6     2 Ca       2.17     1      -16.1      -3.69
    # 7     2 Fe      -0.135   -0.0622   1         0.229
    # 8     2 Ti      -0.590   -0.271    4.37      1
    # 9     3 Ag       0.398    0.837    0.166    -0.700
    #10     3 Ca       0.476    1        0.198    -0.836
    # ... with 30 more rows
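Because your_func above reads df from the calling environment, one possible variation is to pass the data frame in explicitly and fold over the proxies with purrr::reduce, adding one column per step. This is a sketch of an alternative, not the answer's method: add_ratio is a hypothetical helper name, and the counts are made-up fixed values rather than rnorm output.

```r
library(dplyr)
library(purrr)

# Made-up, deterministic counts so the result is reproducible.
df <- tibble(
  z = rep(1:3, 4),
  element = rep(c("Ag", "Ca", "Fe", "Ti"), each = 3),
  counts = as.numeric(1:12)
)
proxies <- c("Ca", "Fe", "Ti")

# Hypothetical helper: adds "<proxy>_ratio" to `data` and returns the result.
add_ratio <- function(data, proxy) {
  varname <- paste(proxy, "ratio", sep = "_")
  data %>%
    group_by(z) %>%
    # .data refers to columns, .env to the function argument `proxy`.
    mutate(!!varname := .data$counts /
             .data$counts[.data$element == .env$proxy]) %>%
    ungroup()
}

# Start from df and apply add_ratio once per proxy.
df_ratios <- reduce(proxies, add_ratio, .init = df)

print(df_ratios)
```

Since add_ratio takes the data frame as its first argument, it can also be used directly in a pipe for a single proxy, for example `df %>% add_ratio("Ti")`. In a Shiny app, proxies could then come from user input without changing the helper.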

Published: 2023-07-20 17:46:42
Link to this article: https://www.elefans.com/category/jswz/34/1169680.html