Python中的成对设置相交

编程入门行业动态更新时间:2024-10-23 11:22:10

本文介绍了Python中的成对设置相交的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

如果我有可变数量的集合(我们叫数字 n )，每个集合最多具有 m 个元素，那么计算成对的最有效方法是什么对所有集合的交集?请注意，这不同于所有 n 集的交集.

If I have a variable number of sets (let's call the number n), which have at most m elements each, what's the most efficient way to calculate the pairwise intersections for all pairs of sets? Note that this is different from the intersection of all n sets.

例如，如果我有以下几组:

For example, if I have the following sets:

A={"a","b","c"} B={"c","d","e"} C={"a","c","e"}

我希望能够找到:

intersect_AB={"c"} intersect_BC={"c", "e"} intersect_AC={"a", "c"}

另一种可接受的格式(如果使事情变得更容易)将是给定集合中的项目到包含相同项目的集合的映射.例如:

Another acceptable format (if it makes things easier) would be a map of items in a given set to the sets that contain that same item. For example:

intersections_C={"a": {"A", "C"}, "c": {"A", "B", "C"} "e": {"B", "C"}}

我知道这样做的一种方法是创建一个字典，将所有 n 个集合的并集中的每个值映射到出现它的集合的列表，然后遍历所有这些值创建上面的intersections_C之类的列表，但是我不确定随着 n 的增加和集合的大小变得太大而如何缩放.

I know that one way to do so would be to create a dictionary mapping each value in the union of all n sets to a list of the sets in which it occurs and then iterate through all of those values to create lists such as intersections_C above, but I'm not sure how that scales as n increases and the sizes of the set become too large.

一些其他背景信息:

每个集合的长度大致相同，但也非常大(足够大以至于将它们全部存储在内存中是现实的考虑，尽管没有必要使用避免这种情况的算法)

与集合本身的大小相比，任何两个集合之间的交点的大小都非常小

如果有帮助，我们可以假设需要进行有关输入集排序的任何事情.

推荐答案

这应该做您想要的

import random as RND import string import itertools as IT

模拟一些数据

fnx = lambda: set(RND.sample(string.ascii_uppercase, 7)) S = [fnx() for c in range(5)]

生成S中集合的索引列表，以便可以在下面更简洁地引用这些集合

idx = range(len(S))

获取S中所有可能的项目对.但是，由于集合交集是可交换的，我们希望使用组合而不是置换

get all possible unique pairs of the items in S; however, since set intersection is commutative, we want the combinations rather than permutations

pairs = ITbinations(idx, 2)

编写函数以执行设置交点

nt = lambda a, b: S[a].intersection(S[b])

将此功能折叠到对上&将每个函数调用的结果键入其参数

res = dict([ (t, nt(*t)) for t in pairs ])

下面的结果是按照OP中引用的第一个选项格式化的，是一个字典，其中 values 是两个序列的设定交集；每个值键到一个由这些序列的两个索引组成的元组

the result below, formatted per the first option recited in the OP, is a dictionary in which the values are the set intersections of two sequences; each values keyed to a tuple comprised of the two indices of those sequences

这个解决方案实际上只是两行代码:(i)计算排列； (ii)然后对每个排列应用一些函数，将返回值存储在结构化容器(键值)容器中

this solution, is really just two lines of code: (i) calculate the permutations; (ii) then apply some function over each permutation, storing the returned value in a structured container (key-value) container

此解决方案的内存占用量很小，但是您可以通过在最后一步返回生成器表达式来实现更好的效果，即

the memory footprint of this solution is minimal, but you can do even better by returning a generator expression in the last step, ie

res = ( (t, nt(*t)) for t in pairs )

请注意，使用这种方法，对的序列和相应的交集都没有写到内存中，即，对和 res 都是迭代器.

notice that with this approach, neither the sequence of pairs nor the corresponding intersections have been written out in memory--ie, both pairs and res are iterators.

更多推荐

Python中的成对设置相交

本文发布于:2023-11-29 08:38:22，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1645893.html