Python中的成对设置相交

编程入门 行业动态 更新时间:2024-10-23 11:22:10
本文介绍了Python中的成对设置相交的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

如果我有可变数量的集合(我们叫数字 n ),每个集合最多具有 m 个元素,那么计算成对的最有效方法是什么对所有集合的交集?请注意,这不同于所有 n 集的交集.

If I have a variable number of sets (let's call the number n), which have at most m elements each, what's the most efficient way to calculate the pairwise intersections for all pairs of sets? Note that this is different from the intersection of all n sets.

例如,如果我有以下几组:

For example, if I have the following sets:

A={"a","b","c"} B={"c","d","e"} C={"a","c","e"}

我希望能够找到:

intersect_AB={"c"} intersect_BC={"c", "e"} intersect_AC={"a", "c"}

另一种可接受的格式(如果使事情变得更容易)将是给定集合中的项目到包含相同项目的集合的映射.例如:

Another acceptable format (if it makes things easier) would be a map of items in a given set to the sets that contain that same item. For example:

intersections_C={"a": {"A", "C"}, "c": {"A", "B", "C"} "e": {"B", "C"}}

我知道这样做的一种方法是创建一个字典,将所有 n 个集合的并集中的每个值映射到出现它的集合的列表,然后遍历所有这些值创建上面的intersections_C之类的列表,但是我不确定随着 n 的增加和集合的大小变得太大而如何缩放.

I know that one way to do so would be to create a dictionary mapping each value in the union of all n sets to a list of the sets in which it occurs and then iterate through all of those values to create lists such as intersections_C above, but I'm not sure how that scales as n increases and the sizes of the set become too large.

一些其他背景信息:

  • 每个集合的长度大致相同,但也非常大(足够大以至于将它们全部存储在内存中是现实的考虑,尽管没有必要使用避免这种情况的算法)
  • 与集合本身的大小相比,任何两个集合之间的交点的大小都非常小
  • 如果有帮助,我们可以假设需要进行有关输入集排序的任何事情.
  • 推荐答案

    这应该做您想要的

    import random as RND import string import itertools as IT

    模拟一些数据

    fnx = lambda: set(RND.sample(string.ascii_uppercase, 7)) S = [fnx() for c in range(5)]

    生成S中集合的索引列表,以便可以在下面更简洁地引用这些集合

    idx = range(len(S))

    获取S中所有可能的项目对.但是,由于集合交集是可交换的,我们希望使用组合而不是置换

    get all possible unique pairs of the items in S; however, since set intersection is commutative, we want the combinations rather than permutations

    pairs = ITbinations(idx, 2)

    编写函数以执行设置交点

    nt = lambda a, b: S[a].intersection(S[b])

    将此功能折叠到对上&将每个函数调用的结果键入其参数

    res = dict([ (t, nt(*t)) for t in pairs ])

    下面的结果是按照OP中引用的第一个选项格式化的,是一个字典,其中 values 是两个序列的设定交集;每个值键到一个由这些序列的两个索引组成的元组

    the result below, formatted per the first option recited in the OP, is a dictionary in which the values are the set intersections of two sequences; each values keyed to a tuple comprised of the two indices of those sequences

    这个解决方案实际上只是两行代码:(i)计算排列; (ii)然后对每个排列应用一些函数,将返回值存储在结构化容器(键值)容器中

    this solution, is really just two lines of code: (i) calculate the permutations; (ii) then apply some function over each permutation, storing the returned value in a structured container (key-value) container

    此解决方案的内存占用量很小,但是您可以通过在最后一步返回生成器表达式来实现更好的效果,即

    the memory footprint of this solution is minimal, but you can do even better by returning a generator expression in the last step, ie

    res = ( (t, nt(*t)) for t in pairs )

    请注意,使用这种方法,对的序列和相应的交集都没有写到内存中,即,对和 res 都是迭代器.

    notice that with this approach, neither the sequence of pairs nor the corresponding intersections have been written out in memory--ie, both pairs and res are iterators.

    更多推荐

    Python中的成对设置相交

    本文发布于:2023-11-29 08:38:22,感谢您对本站的认可!
    本文链接:https://www.elefans.com/category/jswz/34/1645893.html
    版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
    本文标签:Python

    发布评论

    评论列表 (有 0 条评论)
    草根站长

    >www.elefans.com

    编程频道|电子爱好者 - 技术资讯及电子产品介绍!