按元组元素过滤元组列表(Filtering Lists of Tuples by Elements of Tuples)

编程入门 行业动态 更新时间:2024-10-25 16:18:02
按元组元素过滤元组列表(Filtering Lists of Tuples by Elements of Tuples)

我正在使用Python(2.7.9),并尝试通过这些元组的元素列表来过滤元组列表。 特别是,我的对象具有以下形式:

tuples = [('a', ['a1', 'a2']), ('b',['b1', 'b2']), ('c',['c1', 'c2'])] filter = ['a', 'c']

我是Python的新手,过滤我可以发现的元组的最简单方法是使用以下列表理解:

tuples_filtered = [(x,y) for (x,y) in tuples if x in filter]

生成的筛选列表如下所示:

tuples_filtered = [('a', ['a1', 'a2']), ('c',['c1', 'c2'])]

不幸的是,这种列表理解似乎非常低效。 我怀疑这是因为我的元组列表比我的过滤器(字符串列表)大得多。 特别是,筛选器列表包含30,000个单词,元组列表包含大约134,000个2元组。

2元组的第一个元素在很大程度上是不同的,但是有一些重复的第一个元素的实例(实际上不确定有多少,但与列表的基数相比,它并不多)。

我的问题: 是否有更有效的方法来过滤元组列表中的元组列表?

(如果这是偏离主题或欺骗,请道歉。)

相关问题(未提及效率):

过滤元组列表的列表

I am working in Python (2.7.9) and am trying to filter a list of tuples by a list of elements of those tuples. In particular, my objects have the following form:

tuples = [('a', ['a1', 'a2']), ('b',['b1', 'b2']), ('c',['c1', 'c2'])] filter = ['a', 'c']

I am new to Python and the easiest way to filter the tuples that I could discover was with the following list comprehension:

tuples_filtered = [(x,y) for (x,y) in tuples if x in filter]

The resulting filtered list looks like:

tuples_filtered = [('a', ['a1', 'a2']), ('c',['c1', 'c2'])]

Unfortunately, this list comprehension seems to be very inefficient. I suspect this is because my list of tuples is much larger than my filter, the list of strings. In particular, the filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.

The first elements of the 2-tuples are largely distinct, but there are a few instances of duplicate first elements (not sure how many, actually, but by comparison to the cardinality of the list it's not many).

My question: Is there a more efficient way to filter a list of tuples by a list of elements of those tuples?

(Apologies if this is off-topic or a dupe.)

Related question (which does not mention efficiency):

Filter a list of lists of tuples

最满意答案

在你写的评论中:

筛选器列表包含30,000个单词,元组列表包含大约134,000个2元组。

in对列表的包含测试中,需要O(N)线性时间,当你这样做134k次时这很 。 每次必须迭代所有这些元素以找到匹配项。 鉴于您正在过滤,并非所有这些第一个元素都将出现在30k列表中,因此您执行最多30k * 134k == 40亿次比较。

改为使用集合

filter_set = set(filter)

设置包含测试是O(1)常数时间; 现在你把你的问题减少到134k测试。

你可以避免支出的一小部分时间是元组分配; 使用索引来提取您正在测试的一个元素:

tuples_filtered = [tup for tup in tuples if tup[0] in filter_set]

In a comment you write:

The filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.

in containment tests against a list takes O(N) linear time, which is slow when you do this 134k times. Each time you have to iterate over all those elements to find a match. Given that you are filtering, not all those first elements are going to be present in the 30k list, so you are executing up to 30k * 134k == 4 billion comparisons.

Use a set instead:

filter_set = set(filter)

Set containment tests are O(1) constant time; now you reduced your problem to 134k tests.

A much smaller component of time you can avoid spending is the tuple assignment; use indexing to extract just the one element you are testing with:

tuples_filtered = [tup for tup in tuples if tup[0] in filter_set]

更多推荐

本文发布于:2023-08-07 13:17:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1464279.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:元素   列表   按元组   Filtering   Elements

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!