Collapse / concatenate / aggregate multiple columns to a single comma separated string within each group

Updated: 2024-10-25 20:24:51

Problem description


This is an extension of the post "Collapse / concatenate / aggregate a column to a single comma separated string within each group".

Goal: aggregate multiple columns according to one grouping variable and separate the individual values by a separator of choice.

Reproducible example:

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c(rep(c(100), 3), rep(200,3)), C = rep(c(1,2,NA),2), D = c(15:20), E = rep(c(1,NA,NA),2))
data
    A   B  C  D  E
1 111 100  1 15  1
2 111 100  2 16 NA
3 111 100 NA 17 NA
4 222 200  1 18  1
5 222 200  2 19 NA
6 222 200 NA 20 NA

A is the grouping variable, but B is still displayed in the overall result (B depends on A in my application), and C, D and E are the variables to be collapsed into separated character strings.

Desired Output

    A   B  C    D         E
1 111 100  1,2  15,16,17  1
2 222 200  1,2  18,19,20  1

I don't have a ton of experience with R. I did try to expand upon the solutions posted by G. Grothendieck to the linked post to meet my requirements but can't quite get it right for multiple columns.

What would be a proper implementation to get the desired output?

I focused specifically on group_by and summarise_all and aggregate in my attempts. They are a complete mess so I don't believe it would even be helpful to display.

EDIT: The solutions posted work great for producing the desired result! To keep improving the value of this post for those who find it, one follow-up question:

How can users select their own separator characters, e.g. '-' or '\n'? The current solutions by @akrun and @tmfmnk both result in lists instead of a concatenated character string. Please correct me if I am wrong.

> data$D
[1] 15 16 17 18 19 20
> data$A
[1] 111 111 111 222 222 222
> data$B
[1] 100 100 100 200 200 200
> data$C
[1]  1  2 NA  1  2 NA
> data$D
[1] 15 16 17 18 19 20
> data$E
[1]  1 NA NA  1 NA NA

Solution

We can group by 'A' and 'B', and use summarise_at to paste together all the non-NA elements of each remaining column:

library(dplyr)
data %>% 
    group_by(A, B) %>%
    summarise_at(vars(-group_cols()), ~ toString(.[!is.na(.)]))
# A tibble: 2 x 5
# Groups:   A [2]
#      A     B C     D          E    
#  <dbl> <dbl> <chr> <chr>      <chr>
#1   111   100 1, 2  15, 16, 17 1    
#2   222   200 1, 2  18, 19, 20 1   
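
In dplyr 1.0.0 and later, summarise_at is superseded by across(). The following is a minimal sketch of the same aggregation in that style, assuming dplyr >= 1.0.0 is installed and using the data frame from the question; the name `collapsed` is only illustrative.

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  A = c(rep(111, 3), rep(222, 3)),
  B = c(rep(100, 3), rep(200, 3)),
  C = rep(c(1, 2, NA), 2),
  D = 15:20,
  E = rep(c(1, NA, NA), 2)
)

# Inside a grouped summarise(), everything() covers only the
# non-grouping columns (C, D and E here)
collapsed <- data %>%
  group_by(A, B) %>%
  summarise(across(everything(), ~ toString(.x[!is.na(.x)])),
            .groups = "drop")

# Each collapsed column is a plain character vector, not a list
is.character(collapsed$D)
collapsed$D
```

Checking the column with is.character() is a quick way to confirm that the result holds concatenated strings rather than list columns.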

If we need to pass a custom delimiter, use paste or str_c with the collapse argument:

library(stringr)
data %>% 
    group_by(A, B) %>%
    summarise_at(vars(-group_cols()), ~ str_c(.[!is.na(.)], collapse="_"))
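
The same pattern works with base paste(), whose collapse argument accepts any string, including '-' or '\n'. Below is a minimal sketch in which collapse_non_na is a hypothetical helper name (it is not part of any package) and the separator is a parameter.

```r
# Hypothetical helper: drop NAs, then join what is left with `sep`
collapse_non_na <- function(x, sep = ",") {
  paste(x[!is.na(x)], collapse = sep)
}

collapse_non_na(c(1, 2, NA))         # "1,2"
collapse_non_na(c(15, 16, 17), "-")  # "15-16-17"

# With "\n" the result is still one string; cat() shows the line breaks
cat(collapse_non_na(c(15, 16, 17), "\n"), "\n")
```

Such a helper could be passed to summarise_at in place of the str_c call, for example as ~ collapse_non_na(., sep = "-"). Note that a group containing only NA values yields an empty string.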


Or, using base R with aggregate:

aggregate(. ~ A + B, data, FUN = function(x) 
      toString(x[!is.na(x)]), na.action = NULL)
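
The na.action = NULL argument matters here: the formula method of aggregate defaults to na.omit, which drops every row containing an NA before FUN is called. The sketch below contrasts the two calls on the question's data; the object names with_default and with_null are illustrative only.

```r
# Example data from the question
data <- data.frame(
  A = c(rep(111, 3), rep(222, 3)),
  B = c(rep(100, 3), rep(200, 3)),
  C = rep(c(1, 2, NA), 2),
  D = 15:20,
  E = rep(c(1, NA, NA), 2)
)

# Default na.action (na.omit): rows 2, 3, 5 and 6 are dropped first,
# because E is NA there, so D loses most of its values
with_default <- aggregate(. ~ A + B, data, FUN = toString)

# na.action = NULL keeps all rows; NAs are removed per column inside FUN
with_null <- aggregate(. ~ A + B, data,
                       FUN = function(x) toString(x[!is.na(x)]),
                       na.action = NULL)

with_default$D  # "15" "18"
with_null$D     # "15, 16, 17" "18, 19, 20"
```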

Published: 2023-04-23 07:51:20
Article link: https://www.elefans.com/category/jswz/34/1039135.html