按元组元素过滤元组列表(Filtering Lists of Tuples by Elements of Tuples)

编程入门行业动态更新时间:2024-10-25 16:18:02

我正在使用Python（2.7.9），并尝试通过这些元组的元素列表来过滤元组列表。特别是，我的对象具有以下形式：

tuples = [('a', ['a1', 'a2']), ('b',['b1', 'b2']), ('c',['c1', 'c2'])] filter = ['a', 'c']

我是Python的新手，过滤我可以发现的元组的最简单方法是使用以下列表理解：

tuples_filtered = [(x,y) for (x,y) in tuples if x in filter]

生成的筛选列表如下所示：

tuples_filtered = [('a', ['a1', 'a2']), ('c',['c1', 'c2'])]

不幸的是，这种列表理解似乎非常低效。我怀疑这是因为我的元组列表比我的过滤器（字符串列表）大得多。特别是，筛选器列表包含30,000个单词，元组列表包含大约134,000个2元组。

2元组的第一个元素在很大程度上是不同的，但是有一些重复的第一个元素的实例（实际上不确定有多少，但与列表的基数相比，它并不多）。

我的问题： 是否有更有效的方法来过滤元组列表中的元组列表？

（如果这是偏离主题或欺骗，请道歉。）

最满意答案

在你写的评论中：

筛选器列表包含30,000个单词，元组列表包含大约134,000个2元组。

in对列表的包含测试中，需要O（N）线性时间，当你这样做134k次时这很慢。每次必须迭代所有这些元素以找到匹配项。鉴于您正在过滤，并非所有这些第一个元素都将出现在30k列表中，因此您执行最多30k * 134k == 40亿次比较。

改为使用集合：

filter_set = set(filter)

设置包含测试是O（1）常数时间; 现在你把你的问题减少到134k测试。

你可以避免支出的一小部分时间是元组分配; 使用索引来提取您正在测试的一个元素：

tuples_filtered = [tup for tup in tuples if tup[0] in filter_set]

In a comment you write:

The filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.

in containment tests against a list takes O(N) linear time, which is slow when you do this 134k times. Each time you have to iterate over all those elements to find a match. Given that you are filtering, not all those first elements are going to be present in the 30k list, so you are executing up to 30k * 134k == 4 billion comparisons.

Use a set instead:

filter_set = set(filter)

Set containment tests are O(1) constant time; now you reduced your problem to 134k tests.

A much smaller component of time you can avoid spending is the tuple assignment; use indexing to extract just the one element you are testing with:

tuples_filtered = [tup for tup in tuples if tup[0] in filter_set]

更多推荐

本文发布于:2023-08-07 13:17:00，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1464279.html