Find the index of a list which is a subset in a list of lists

Updated: 2024-10-27 00:33:47

I have two very large lists of lists (on the order of 5 million entries each).

For instance:

1) The first list, a, always contains lists of 8 elements.

2) The second list, b, always contains lists of 4 elements.

For each list in b there may be more than one list in a that contains it, but this is not a problem.

    a=[[0 1 10 9 369 370 379 378],[1 2 11 10 370 371 380 379]..[0 1 10 9 365 370 379 400]]
    b=[[0 1 370 369],[1 2 371 370], ......]

For each list in b, I'd like to know the index of every list in a that contains all of its elements.

For instance: I know that "b[0]=[ 0 1 370 369]" is a subset of "a[0]=[0 1 10 9 369 370 379 378]" because all the elements of b[0] are contained in a[0]. The same holds for b[1], which is a subset of a[1].

So I'd like to have an output of this kind: c=[[0],[1].......].

If a list in b is a subset of more than one list in a, I should get something like: c=[[0],[1]....[20,19].....]

My problem is that my code is far too slow:

    index = []
    for i in range(len(b)):
        for j in range(len(a)):
            # rebuilds both sets on every comparison
            if set(b[i]) < set(a[j]):
                print(b[i])
                print(a[j])
                print(j)
                index.append([j])  # index in a

Here is the output of my code:

    [ 0 1 370 369]
    [ 0 1 10 9 369 370 379 378]
    0
    [ 1 2 371 370]
    [ 1 2 11 10 370 371 380 379]
    1
    .
    .
    [369 370 739 738]
    [369 370 379 378 738 739 748 747]
    320
    .
    .

At the end of the loop len(index)=len(b), because I know for sure that each list in b is always a subset of a list in a.

Each iteration of the outer loop takes up to 30 to 40 seconds.

I am sure there is a more Pythonic way to perform the same loop. How can I speed it up?

Thank you

Best answer

Build a dict showing which lists in a contain each number:

    import collections

    number_locations = collections.defaultdict(set)
    for i, l in enumerate(a):
        for num in l:
            number_locations[num].add(i)
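To make the structure of this dict concrete, here is a small sketch run on sample rows taken from the question. The function name `build_number_locations` and the three-row `a` are illustrative assumptions, not part of the original answer.

```python
import collections

def build_number_locations(a):
    """Map each number to the set of row indices of `a` that contain it."""
    number_locations = collections.defaultdict(set)
    for i, row in enumerate(a):
        for num in row:
            number_locations[num].add(i)
    return number_locations

# Illustrative data: three sample rows from the question, written as plain lists
a = [
    [0, 1, 10, 9, 369, 370, 379, 378],
    [1, 2, 11, 10, 370, 371, 380, 379],
    [0, 1, 10, 9, 365, 370, 379, 400],
]

number_locations = build_number_locations(a)
print(sorted(number_locations[1]))    # indices of rows that contain 1
print(sorted(number_locations[369]))  # indices of rows that contain 369
```

The dict is built in a single pass over a, so its cost grows with the total number of elements in a rather than with len(a) * len(b).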

Then for each list in b, look up where in a its elements can be found and take the set intersection to find which elements of a contain all 4 numbers:

index = [set.intersection(*[number_locations[num] for num in l]) for l in b]

This produces a list of sets; if you really need lists, you can call list on the items, or sorted to get sorted lists of indices.
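Putting the two steps together, the following is a minimal end-to-end sketch that returns sorted lists of indices, as requested in the question. The function name `find_superset_indices` and the sample data are assumptions made for illustration. It uses `dict.get` instead of indexing so that a number absent from a does not add a new key to the dict; this is a small variation on the answer, not its exact code.

```python
import collections

def find_superset_indices(a, b):
    """For each list in b, return the sorted indices of the lists in a
    that contain all of its elements. Assumes every list in b is non-empty."""
    # Step 1: inverted index from number to the rows of a that contain it
    number_locations = collections.defaultdict(set)
    for i, row in enumerate(a):
        for num in row:
            number_locations[num].add(i)

    # Step 2: intersect the row sets of the numbers in each list of b
    result = []
    for small in b:
        candidate_sets = [number_locations.get(num, set()) for num in small]
        result.append(sorted(set.intersection(*candidate_sets)))
    return result

# Illustrative data based on the rows shown in the question
a = [
    [0, 1, 10, 9, 369, 370, 379, 378],
    [1, 2, 11, 10, 370, 371, 380, 379],
    [0, 1, 10, 9, 365, 370, 379, 400],
]
b = [[0, 1, 370, 369], [1, 2, 371, 370], [0, 1, 10, 9]]

c = find_superset_indices(a, b)
print(c)  # [[0], [1], [0, 2]]

# Cross-check against the brute-force definition on this small input
brute = [[j for j, row in enumerate(a) if set(small) <= set(row)] for small in b]
print(c == brute)
```

The last list in b is contained in both a[0] and a[2], so it yields two indices, matching the multi-index case described in the question.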

Published: 2023-07-04 16:50:00
Article link: https://www.elefans.com/category/jswz/34/1026849.html