Efficiently working with sets in R

Updated: 2024-10-28 00:26:28
Background:

I am dealing with a combinatorial problem in R. For a given list of sets I need to generate all pairs per set without producing duplicates.

Example:

    initial_list_of_sets <- list()
    initial_list_of_sets[[1]] <- c(1,2,3)
    initial_list_of_sets[[2]] <- c(2,3,4)
    initial_list_of_sets[[3]] <- c(3,2)
    initial_list_of_sets[[4]] <- c(5,6,7)

    get_pairs(initial_list_of_sets)
    # should return (1 2),(1 3),(2 3),(2 4),(3 4),(5 6),(5 7),(6 7)

Please note that (3 2) is not included in the results, as it is mathematically equal to (2 3).

My (working but inefficient) approach so far:

    # checks if sets contain a_set
    contains <- function(sets, a_set){
      for (existing in sets) {
        if (setequal(existing, a_set)) {
          return(TRUE)
        }
      }
      return(FALSE)
    }

    get_pairs <- function(from_sets){
      all_pairs <- list()
      for (a_set in from_sets) {
        # generate all pairs for current set
        pairs <- combn(x = a_set, m = 2, simplify = FALSE)
        for (pair in pairs) {
          # only add new pairs if they are not yet included in all_pairs
          if (!contains(all_pairs, pair)) {
            all_pairs <- c(all_pairs, list(pair))
          }
        }
      }
      return(all_pairs)
    }

My question:

As I am dealing with mathematical sets I can't use the %in% operator instead of my contains function, because then (2 3) and (3 2) would be different pairs. However it seems very inefficient to iterate over all existing sets in contains. Is there a better way to implement this function?
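
To illustrate the ordering problem described above, here is a minimal sketch comparing an order-sensitive check with `setequal`. The variable names are made up for this example, and the sorting step is an assumption about one possible way to get a canonical form, not part of the question.

```r
# Two pairs holding the same elements in a different order.
a <- c(2, 3)
b <- c(3, 2)

# An exact comparison is order-sensitive, so the pairs look different.
exact_match <- identical(a, b)

# setequal() ignores order, which is why contains() relies on it.
set_match <- setequal(a, b)

# Assumption (not from the question): sorting each pair first gives a
# canonical form, after which an exact comparison is sufficient.
canonical_match <- identical(sort(a), sort(b))

print(c(exact = exact_match, set = set_match, canonical = canonical_match))
```

If pairs are always stored in sorted order, the per-pair scan with `setequal` is no longer the only option for detecting duplicates.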

Best answer

Perhaps you can rewrite your get_pairs function as something like the following:

    myFun <- function(inlist) {
      unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2)))))
    }
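
To see what the one-liner does, the sketch below breaks it into its three steps on the example input from the question. `myFun` is copied from the answer; the intermediate names (`sets`, `per_set`, `stacked`, `result`) are only illustrative.

```r
# myFun is copied from the answer; everything else is illustrative.
myFun <- function(inlist) {
  unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2)))))
}

sets <- list(c(1, 2, 3), c(2, 3, 4), c(3, 2), c(5, 6, 7))

# Step 1: sort each set and enumerate its pairs, one pair per row.
# Sorting first means (3 2) and (2 3) both end up as the row (2 3).
per_set <- lapply(sets, function(x) t(combn(sort(x), 2)))

# Step 2: stack all pair matrices into a single two-column matrix.
stacked <- do.call(rbind, per_set)

# Step 3: unique() on a matrix drops duplicated rows.
result <- unique(stacked)
print(result)
```

Note that `combn(x, 2)` treats a single positive integer `x` as `seq_len(x)`, so sets with fewer than two elements would probably need to be filtered out before calling a function like this.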

Here's a quick time comparison.

    n <- 100
    set.seed(1)
    x <- sample(2:8, n, TRUE)
    initial_list_of_sets <- lapply(x, function(y) sample(100, y))

    system.time(get_pairs(initial_list_of_sets))
    #    user  system elapsed
    #   1.964   0.000   1.959

    system.time(myFun(initial_list_of_sets))
    #    user  system elapsed
    #   0.012   0.000   0.014
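
A timing comparison is only meaningful if both functions find the same pairs. The following sketch checks that on a smaller random input. `contains`, `get_pairs` and `myFun` are copied from the post (slightly condensed); `pair_key` is a hypothetical helper introduced here, and the input size of 30 is an arbitrary choice to keep the slow version quick.

```r
# contains(), get_pairs() and myFun() are taken from the post.
contains <- function(sets, a_set) {
  for (existing in sets) {
    if (setequal(existing, a_set)) return(TRUE)
  }
  FALSE
}

get_pairs <- function(from_sets) {
  all_pairs <- list()
  for (a_set in from_sets) {
    for (pair in combn(x = a_set, m = 2, simplify = FALSE)) {
      if (!contains(all_pairs, pair)) all_pairs <- c(all_pairs, list(pair))
    }
  }
  all_pairs
}

myFun <- function(inlist) {
  unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2)))))
}

# Hypothetical helper: one order-independent string key per pair.
pair_key <- function(p) paste(sort(p), collapse = "-")

set.seed(1)
sizes <- sample(2:8, 30, TRUE)
sets <- lapply(sizes, function(y) sample(100, y))

keys_loop <- vapply(get_pairs(sets), pair_key, character(1))
keys_fast <- apply(myFun(sets), 1, pair_key)

# Both approaches should yield the same collection of unordered pairs.
same_pairs <- length(keys_loop) == length(keys_fast) &&
  setequal(keys_loop, keys_fast)
print(same_pairs)
```

The two results differ in shape (a list of vectors versus a two-column matrix) and possibly in element order within a pair, which is why the comparison goes through sorted string keys.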

If needed, you can split the matrix by rows to get your list.

For example:

    myFun <- function(inlist) {
      temp <- unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2)))))
      # return one list element per row of the matrix
      lapply(seq_len(nrow(temp)), function(x) temp[x, ])
    }
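
As a possible alternative for the row-splitting step, base R's `split` combined with `row` can turn the matrix into a list without an explicit loop. This is a sketch, not the answer's method; `myFun_matrix` is just the matrix-returning version from the answer under a different, made-up name.

```r
# Matrix-returning version from the answer, renamed for this example.
myFun_matrix <- function(inlist) {
  unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2)))))
}

sets <- list(c(1, 2, 3), c(2, 3, 4), c(3, 2), c(5, 6, 7))
temp <- myFun_matrix(sets)

# row(temp) labels every cell with its row index, so split() groups the
# cells of each row together; unname() drops the "1", "2", ... names.
pairs_list <- unname(split(temp, row(temp)))
print(pairs_list[[1]])
```

Whether this is faster than the `lapply` version depends on the input size and is not something the original answer measured.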

Published: 2023-07-24 00:08:00
Source: https://www.elefans.com/category/jswz/34/1239053.html