Problem description
This is an extension of the post "Collapse / concatenate / aggregate a column to a single comma separated string within each group".
Goal: aggregate multiple columns according to one grouping variable and separate the individual values by a separator of choice.
Reproducible example:
data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c(rep(c(100), 3), rep(200,3)), C = rep(c(1,2,NA),2), D = c(15:20), E = rep(c(1,NA,NA),2))
data
    A   B  C  D  E
1 111 100  1 15  1
2 111 100  2 16 NA
3 111 100 NA 17 NA
4 222 200  1 18  1
5 222 200  2 19 NA
6 222 200 NA 20 NA
A is the grouping variable, but B is still displayed in the overall result (B depends on A in my application), and C, D and E are the variables to be collapsed into delimited character strings.
Desired Output
    A   B   C        D E
1 111 100 1,2 15,16,17 1
2 222 200 1,2 18,19,20 1
I don't have a ton of experience with R. I did try to expand upon the solutions posted by G. Grothendieck to the linked post to meet my requirements but can't quite get it right for multiple columns.
What would be a proper implementation to get the desired output?
I focused specifically on group_by, summarise_all and aggregate in my attempts. They are a complete mess, so I don't believe it would even be helpful to display them.
EDIT: The solutions posted work great at displaying the desired result! To keep improving the value of this post for those who find it: how can users select their own separator, e.g. '-' or '\n'?
The current solutions by @akrun and @tmfmnk both appear to result in lists instead of a concatenated character string. Please correct me if I have described this incorrectly.
> data$D
[1] 15 16 17 18 19 20
> data$A
[1] 111 111 111 222 222 222
> data$B
[1] 100 100 100 200 200 200
> data$C
[1] 1 2 NA 1 2 NA
> data$D
[1] 15 16 17 18 19 20
> data$E
[1] 1 NA NA 1 NA NA
Solution
We can group by 'A' and 'B', and use summarise_at to paste all the non-NA elements together.
library(dplyr)
data %>%
  group_by(A, B) %>%
  summarise_at(vars(-group_cols()), ~ toString(.[!is.na(.)]))
# A tibble: 2 x 5
# Groups:   A [2]
#      A     B C     D          E
#  <dbl> <dbl> <chr> <chr>      <chr>
#1   111   100 1, 2  15, 16, 17 1
#2   222   200 1, 2  18, 19, 20 1
If we need to pass a custom delimiter, use paste or str_c
library(stringr)
data %>%
  group_by(A, B) %>%
  summarise_at(vars(-group_cols()), ~ str_c(.[!is.na(.)], collapse = "_"))
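The paste variant mentioned above is not shown, so here is a minimal sketch of it, applied to the two separators asked about in the question ('-' and '\n'). The helper name collapse_non_na and the result names res_dash and res_nl are hypothetical, introduced only for this illustration.

```r
library(dplyr)

# Same example data as in the question
data <- data.frame(A = c(rep(111, 3), rep(222, 3)),
                   B = c(rep(100, 3), rep(200, 3)),
                   C = rep(c(1, 2, NA), 2),
                   D = 15:20,
                   E = rep(c(1, NA, NA), 2))

# Hypothetical helper: drop NAs, then join the remaining values with `sep`
collapse_non_na <- function(x, sep = ",") {
  paste(x[!is.na(x)], collapse = sep)
}

# Dash-separated values
res_dash <- data %>%
  group_by(A, B) %>%
  summarise_at(vars(-group_cols()), ~ collapse_non_na(., sep = "-"))

# Newline-separated values; cat() or writeLines() shows the line breaks
res_nl <- data %>%
  group_by(A, B) %>%
  summarise_at(vars(-group_cols()), ~ collapse_non_na(., sep = "\n"))

res_dash$D
cat(res_nl$D[1])
```

Printing a string that contains "\n" shows the escape sequence itself; the line break only becomes visible when the value is written out with cat() or writeLines().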
Or using base R with aggregate
aggregate(. ~ A + B, data, FUN = function(x)
toString(x[!is.na(x)]), na.action = NULL)
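The base R version can take a custom separator in the same way, by swapping toString for paste with a collapse argument. This is a sketch under that assumption; the variable names sep and res are introduced only for illustration.

```r
# Same example data as in the question
data <- data.frame(A = c(rep(111, 3), rep(222, 3)),
                   B = c(rep(100, 3), rep(200, 3)),
                   C = rep(c(1, 2, NA), 2),
                   D = 15:20,
                   E = rep(c(1, NA, NA), 2))

sep <- "-"  # separator of choice, e.g. "-", "_" or "\n"

# na.action = NULL keeps rows containing NA so that the NAs can be
# dropped per column inside FUN instead of dropping whole rows up front
res <- aggregate(. ~ A + B, data,
                 FUN = function(x) paste(x[!is.na(x)], collapse = sep),
                 na.action = NULL)

res
str(res)
```

Without na.action = NULL, aggregate would apply its default na.omit to the whole data frame first, which for this data would discard every row that has an NA in any of C, D or E.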