Split a string and return the unique values [closed]

Updated: 2024-06-14 17:03:52

I have a list of strings like this:

D<-c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0", "0,60,61,70,0,0,","0,1,1,0,0,0,0,")

I'd like to end up with a condensed version of this, with only the unique values for each string.

D2<-c("0","0","0,20,30","0,60,61,70","0,1")

I've tried looping through with a combination of strsplit and unique, but I end up with a bunch of NAs.

Best answer

This question has already attracted three answers but is about to be closed. The best solution, IMHO, was given by thelatemail in a comment and would otherwise be lost:

sapply(strsplit(D, ","), function(x) paste(unique(x), collapse = ","))
#[1] "0"          "0"          "0,20,30"    "0,60,61,70" "0,1"
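To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names parts and uniq are introduced only for illustration, and vapply with fixed = TRUE is a swapped-in variant of the sapply call above, not what the comment originally used.

```r
# Input as given in the question
D <- c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0",
       "0,60,61,70,0,0,", "0,1,1,0,0,0,0,")

# Step 1: split each string on literal commas; the result is a list of
# character vectors. strsplit drops the empty piece after a trailing comma.
parts <- strsplit(D, ",", fixed = TRUE)

# Step 2: keep the first occurrence of each value, in order of appearance
uniq <- lapply(parts, unique)

# Step 3: join each vector back into one string; vapply ensures the
# result is a character vector with one element per input string
D2 <- vapply(uniq, paste, character(1), collapse = ",")
print(D2)
```

Unlike sapply, vapply fails loudly if any element does not come back as a single string, which can help when tracking down unexpected values.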

Data

As given by the OP:

D <- c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0", "0,60,61,70,0,0,", "0,1,1,0,0,0,0,")

Benchmark

A small benchmark

library(stringr)
microbenchmark::microbenchmark(
  thelatemail = sapply(strsplit(D, ","), function(x) paste(unique(x), collapse = ",")),
  epi99 = D %>% sapply(str_split, ",") %>% sapply(unique) %>% sapply(paste, collapse = ","),
  trungnt37 = {
    out <- c()
    for (i in 1:length(D)) {
      k <- strsplit(x = D[i], split = ",")
      m <- paste(unique(unlist(k)), collapse = ",")
      out <- c(out, m)
    }
    out
  }
)

shows that thelatemail's answer is the fastest:

#Unit: microseconds
#        expr     min       lq      mean   median      uq     max neval
# thelatemail  57.770  61.9240  72.63590  67.9655  75.705 151.789   100
#       epi99 318.679 338.5020 383.76284 362.6670 410.054 781.972   100
#   trungnt37  74.384  81.3695  96.77465  87.7885 102.702 240.897   100

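Note that epi99's stringr approach doesn't return the expected result: unlike strsplit, str_split keeps the empty string after a trailing comma, so the results for those inputs end in a comma (for example "0," instead of "0").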
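Base strsplit only drops the empty piece after a trailing separator. A leading or doubled comma still produces an empty string that survives unique. As a hedged sketch, the helper below filters empty tokens with nzchar before deduplicating. The name condense_unique and the messy test vector are hypothetical, made up for this illustration and not taken from any of the answers.

```r
# Hypothetical helper: reduce each string to its unique, non-empty values
condense_unique <- function(x, sep = ",") {
  vapply(strsplit(x, sep, fixed = TRUE), function(tokens) {
    tokens <- tokens[nzchar(tokens)]  # drop "" from leading or doubled separators
    paste(unique(tokens), collapse = sep)
  }, character(1))
}

# Made-up input with a leading, a doubled and a trailing comma
messy <- c(",0,,1,", "0,60,61,70,0,0,")

# The plain one-liner keeps the empty token from the leading comma
plain <- sapply(strsplit(messy, ","), function(x) paste(unique(x), collapse = ","))

# The filtered version removes it
clean <- condense_unique(messy)

print(plain)
print(clean)
```

The same nzchar filter could be applied to the tokens returned by str_split, which would also remove the trailing empty string that causes the stray commas.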

Published: 2023-04-24 12:14:00
Link: https://www.elefans.com/category/dzcp/4089aaa7515ba89ed0405fba83311435.html