找到子列表之间的交集

编程入门 行业动态 更新时间:2024-10-15 16:18:28
本文介绍了找到子列表之间的交集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

最近,我遇到了一个问题 查找子列表之间的交集。告诉具有任何(1个或更多个)交集在一起的子列表成为一个。例如以下列表:

Recently i encounter with a question about Find the intersection between sublists . that tells the sublist which have any (1 or more ) intersection together become one. for example the following list :

l=[[1,2,3],[0,13,6],[9,10],[3,4,5],[10,11],[6,7,50]]

必须转换为:

[[1, 2, 3, 4, 5],[0, 50, 6, 7, 13],[9, 10, 11]]

所以我写以下功能做得很好,性能良好,我使用设置为了检查成员资格的快速复杂性,并且在内循环中,我使用比较第一个索引的切片的主列表和每个循环中的其他元素,并且还注意到,每个循环之后,列表将在循环内递归。 :

So i wrote the following function to do it that works well with a good performance, i use set for its fast complexity for checking membership and also in inner loop i use slicing that compare the first index of main list with other elements in every loop and also note that the list will been decrease after each loop ,as its a recursion inside the loop . :

s=[set(i) for i in g if i] def find_intersection(m_list): for i,v in enumerate(m_list) : for j,k in enumerate(m_list[i+1:],i+1): if v & k: s[i]=v.union(m_list.pop(j)) return find_intersection(m_list) return m_list s=[set(i) for i in l if i] print find_intersection(s) [set([1, 2, 3, 4, 5]), set([0, 50, 6, 7, 13]), set([9, 10, 11])]

但我认为可以用另一个解决方案可能有更好的性能,我想到了 collections.deque 或者可能与 numpy 或只是修改我的功能,使它更好吗。如果您有任何建议,我将不胜感激!

But i think it could be done with another solution maybe with better performance , i thought about collections.deque or maybe with numpy or just modifying my function and make it better ? . if you have any suggestion i would be grateful to hear about !

推荐答案

这是一种更有效的算法:

Here is a more efficient algorithm:

  • 对于至少一个子列表中存在的每个唯一编号,让我们维护包含此数字的所有子列表的索引列表。如果我们使用排序找到唯一的数字,或者 O(n),则这部分是 O(n * log n)如果我们使用哈希表,其中 n 是所有子列表中元素的总数。

  • For each unique number that is present in at least one of the sublists, let's maintain a list of indices of all sublists that contain this number. This part is O(n * log n) time if we use sorting to find unique numbers or O(n) if we use a hash table, where n is the total number of elements in all sublists.

    我们创建一个图,其中顶点是子列表索引,如果两个索引在所有数字中的至少一个索引列表中一起出现,则存在边。我们最多需要创建 O(n)边缘(这部分是微不足道的:没有必要显式创建所有边,我们可以从一个元素到每个子列表中的下一个元素,由于传递性而导致所有唯一元素)。以下是一些伪代码:

    Let's create a graph where vertices are sublist indices and an edge is present if two indices appear together in at least one list of indices among all numbers. We need create at most O(n) edges(this part is slightly non-trivial: there is no need to create all edges explicitly, we can just add an edge from an element to the next one in each sublist for all unique elements due to transitivity). Here is some pseudo code:

    g = empty graph for elem in unique_elements: sublist_indices = list of indices of all sublists that contain this element for i = 1 ... size(sublist_indices - 1): g.add_edge(sublist_indices[i], sublist_indices[i + 1])

  • 现在我们可以使用线性时间的深度优先搜索在此图中找到连接的组件(此图是

  • Now we can find connected components in this graph using depth-first search in linear time(this graph is undirected).

    我们知道哪些子列表应该被合并(当且仅当它们在相同的组件中时才应该被合并),所以我们可以很容易地构建答案。

    We know which sublists should be merged(they should be merged if and only if they are in the same connected component), so we can easily construct the answer.

    总时间复杂度为 O(n)。这是最理想的,因为阅读输入已经需要 O(n)操作。

    The total time complexity is O(n). It is optimal because reading the input already requires O(n) operations.

  • 更多推荐

    找到子列表之间的交集

    本文发布于:2023-11-29 12:28:46,感谢您对本站的认可!
    本文链接:https://www.elefans.com/category/jswz/34/1646395.html
    版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
    本文标签:列表

    发布评论

    评论列表 (有 0 条评论)
    草根站长

    >www.elefans.com

    编程频道|电子爱好者 - 技术资讯及电子产品介绍!