问题描述
限时送ChatGPT账号..我想转换一个包含联系信息的数据框,其中包含类似信息的城市列表,例如电话号码出现在多列中.
I want to transforms a data frame with contact information with of a for a list of municipalities in which similar information such as e.g. phone number appears in multiple columns.
我尝试使用 reshape2::dcast()
和 tidyr::spread()
,这两种方法都不能解决我的问题.我还检查了堆栈溢出的其他帖子,例如
I have tried using both reshape2::dcast()
as well as tidyr::spread()
, neither of which solves my problem. I have also checked other post of stack overflow e.g.
多列展开
尚未找到有效的解决方案.在我看来,这些问题应该相当简单(并且可以通过 spread 或 dcast 解决).
Have yet to find a solution which works. It seems to me that the problems should be fairly straightforward (and solvable with spread or dcast).
tmp <- tibble(municipality = c("M1", "M2"),
name1 = c("n1", "n2"), name2 = c("n3", "n4"), name3 = c(NA, "n5"), # placeholder names
phone1 = c("p1", "p2"), phone2 = c("p3", "p4"), phone3 = c(NA, "p5")) # placeholder phone numbers
#solution 1
tmp %>% gather("colname", "value", -municipality) %>%
filter(municipality == "M1") %>% #too simplify, should be replaced with group_by(municipality)
na.omit() %>% mutate(colname = str_replace(colname, "\\d", replacement = "")) %>%
spread(., key = "colname", value = "value")
#Solution 2
tmp %>% gather("colname", "value", -municipality) %>%
filter(municipality == "M1") %>% # same as above
na.omit() %>% mutate(colname = str_replace(colname, "\\d", replacement = "")) %>%
dcast(municipality + value ~colname)
解决方案 1 导致以下错误:错误:每行输出必须由唯一的键组合标识.
Solution 1 results in the following error: Error: Each row of output must be identified by a unique combination of keys.
解决方案 2 导致以下数据框(这是所需的结果,但需要折叠):
Solution 2 results in the following data frame (which is the desired result except it needs to be collapsed):
municipality value name phone
1 M1 n1 n1 <NA>
2 M1 n3 n3 <NA>
3 M1 p1 <NA> p1
4 M1 p3 <NA> p3
推荐答案
您在寻找吗?
library(dplyr)
library(tidyr)
tmp %>%
gather(key, value, -municipality, na.rm = TRUE) %>%
mutate(key = gsub("\\d+", "", key)) %>%
group_by(municipality, key) %>%
mutate(row = row_number()) %>%
spread(key, value) %>%
select(-row)
# municipality name phone
# <chr> <chr> <chr>
#1 M1 n1 p1
#2 M1 n3 p3
#3 M2 n2 p2
#4 M2 n4 p4
#5 M2 n5 p5
我们可以使用 gather
将数据以长格式删除 NA
值.从各个列名称中删除数字,以便它们共享相同的 key
,创建一个列 group_by
municipality
和 key
以spread
将数据转换为宽格式.
We can use gather
to bring the data in long format dropping NA
values. Remove numbers from individual column names so that they share the same key
, create a column group_by
municipality
and key
to spread
the data into wide format.
这篇关于铸造(传播)多列字符向量的优雅解决方案的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
更多推荐
[db:关键词]
发布评论