Fastest way to remove all duplicates in R

Updated: 2024-10-25 08:19:09
Question

I'd like to remove all items that appear more than once in a vector. Specifically, this includes character, numeric and integer vectors. Currently, I'm using duplicated() both forwards and backwards (using the fromLast parameter).

Is there a more computationally efficient (faster) way to do this in R? The solution below is simple to write and read, but running the duplicate search twice seems inefficient. Perhaps a counting-based method using an additional data structure would be better?

Example:

    d <- c(1,2,3,4,1,5,6,4,2,1)
    d[!(duplicated(d) | duplicated(d, fromLast=TRUE))]
    #[1] 3 5 6
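To make explicit why two passes are used, here is a minimal sketch that splits the one-liner into its two logical masks. The names fwd, bwd and keep are illustrative only and are not part of the original post.

```r
d <- c(1, 2, 3, 4, 1, 5, 6, 4, 2, 1)

# Forward pass: TRUE for the second and later occurrences of a value.
fwd <- duplicated(d)

# Backward pass: TRUE for every occurrence except the last one.
bwd <- duplicated(d, fromLast = TRUE)

# A value that occurs more than once is flagged by at least one pass
# at each of its positions, so only values occurring exactly once survive.
keep <- !(fwd | bwd)
d[keep]
```

Neither pass alone is enough: the forward pass leaves the first occurrence of each repeated value unflagged, and the backward pass leaves the last one unflagged.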

Related SO posts: here and here.

Recommended answer

Some timings:

    set.seed(1001)
    d <- sample(1:100000, 100000, replace=T)
    d <- c(d, sample(d, 20000, replace=T)) # ensure many duplicates
    mb <- microbenchmark::microbenchmark(
      d[!(duplicated(d) | duplicated(d, fromLast=TRUE))],
      setdiff(d, d[duplicated(d)]),
      {tmp <- rle(sort(d)); tmp$values[tmp$lengths == 1]},
      as.integer(names(table(d)[table(d)==1])),
      d[!(duplicated.default(d) | duplicated.default(d, fromLast=TRUE))],
      d[!(d %in% d[duplicated(d)])],
      { ud = unique(d); ud[tabulate(match(d, ud)) == 1L] },
      d[!(.Internal(duplicated(d, F, F, NA)) | .Internal(duplicated(d, F, T, NA)))]
    )
    summary(mb)[, c(1, 4)] # in milliseconds
    #  expr                                                                                mean
    #1 d[!(duplicated(d) | duplicated(d, fromLast = TRUE))]                            18.34692
    #2 setdiff(d, d[duplicated(d)])                                                    24.84984
    #3 { tmp <- rle(sort(d)); tmp$values[tmp$lengths == 1] }                            9.53831
    #4 as.integer(names(table(d)[table(d) == 1]))                                     255.76300
    #5 d[!(duplicated.default(d) | duplicated.default(d, fromLast = TRUE))]            18.35360
    #6 d[!(d %in% d[duplicated(d)])]                                                   24.01009
    #7 { ud = unique(d); ud[tabulate(match(d, ud)) == 1L] }                            32.10166
    #8 d[!(.Internal(duplicated(d, F, F, NA)) | .Internal(duplicated(d, F, T, NA)))]   18.33475
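In the timings above, the rle(sort(d)) variant has the lowest mean, but it is not a drop-in replacement when element order matters: because it sorts first, it returns the surviving values in sorted order rather than in their original order. The sketch below illustrates this on a small unsorted vector; d2, by_duplicated, runs and by_rle are names made up for this illustration.

```r
# Small unsorted input: 2 occurs twice, the other values once.
d2 <- c(9, 2, 7, 2, 5)

# Two-pass duplicated(): keeps survivors in their original order.
by_duplicated <- d2[!(duplicated(d2) | duplicated(d2, fromLast = TRUE))]

# rle(sort()): run lengths over the sorted vector, so survivors come back sorted.
runs <- rle(sort(d2))
by_rle <- runs$values[runs$lengths == 1]

by_duplicated  # original order
by_rle         # sorted order
```

Both results contain the same set of values, so the faster variant may be acceptable when only membership matters.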

Given the comments, let's check whether they are all correct:

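    # Assumption: d is reset to the small example vector from the question,
    # since the expected result c(3, 5, 6) only applies to that input.
    d <- c(1,2,3,4,1,5,6,4,2,1)
    results <- list(
      d[!(duplicated(d) | duplicated(d, fromLast=TRUE))],
      setdiff(d, d[duplicated(d)]),
      {tmp <- rle(sort(d)); tmp$values[tmp$lengths == 1]},
      as.integer(names(table(d)[table(d)==1])),
      d[!(duplicated.default(d) | duplicated.default(d, fromLast=TRUE))],
      d[!(d %in% d[duplicated(d)])],
      { ud = unique(d); ud[tabulate(match(d, ud)) == 1L] },
      d[!(.Internal(duplicated(d, F, F, NA)) | .Internal(duplicated(d, F, T, NA)))]
    )
    # Compare every result (not the base function ls) against the expected output.
    all(sapply(results, all.equal, c(3, 5, 6)))
    # TRUE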
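The check above uses a numeric vector, while the question also asks about character vectors. As a rough sketch, the variants that do not convert through as.integer() can be checked on a character input in the same way. The vector x, the expected result and the list names below are assumptions made for illustration.

```r
x <- c("a", "b", "c", "b", "a", "d")
expected <- c("c", "d")  # "a" and "b" each occur twice

candidates <- list(
  two_pass = x[!(duplicated(x) | duplicated(x, fromLast = TRUE))],
  setdiff  = setdiff(x, x[duplicated(x)]),
  in_op    = x[!(x %in% x[duplicated(x)])],
  tabulate = { ux <- unique(x); ux[tabulate(match(x, ux)) == 1L] }
)

# identical() returns a single TRUE/FALSE, so vapply can enforce logical(1).
ok <- vapply(candidates, function(res) identical(res, expected), logical(1))
ok
```

The table()-based variant is left out here because as.integer(names(...)) is specific to integer-like input.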

Published: 2023-11-29 23:32:38
Article link: https://www.elefans.com/category/jswz/34/1647964.html