Speed up finding matches between two dictionaries (Python)

I am working on a spatial analysis problem using Python 2.7. I have a dictionary edges representing edges in a graph, where the key is the edgeID and the value is a list of the start and end points:

{e1: [(12.8254, 55.3880), (12.8343, 55.3920)], e2: [(12.8254, 55.3880), (12.8235, 55.3857)], e3: [(12.2432, 57.1120), (12.2426, 57.1122)]}

And I have another dictionary nodes, where the key is the nodeID and the value is the node's coordinates:

{n14: (12.8254, 55.3880), n15: (12.8340, 55.3883), n16: (12.8235, 55.3857), n17: (12.8343, 55.3920)}

I need to get a list that looks like this (the 'n' and 'e' in the keys are just for illustration in this question; in my data the IDs are integers):

[(e1, n14, n17), (e2, n14, n16) ..]

That is, I iterate over the edges dict, take each key, find the matching values in the nodes dict, and add them to a tuple. This is how I do it now:

edgesList = []
for featureId in edges:
    edgeFeatureId = [k for k, v in edges.iteritems() if k == featureId][0]
    edgeStartPoint = [k for k, v in nodes.iteritems() if v == edges[featureId][0]][0]  # start point
    edgeEndPoint = [k for k, v in nodes.iteritems() if v == edges[featureId][1]][0]  # end point
    edgesList.append((edgeFeatureId, edgeStartPoint, edgeEndPoint))
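
A side observation (mine, not part of the original post): the first comprehension re-scans edges just to find the key the loop already has, so it reduces to featureId itself, and the two scans over nodes are what make each iteration linear in the number of nodes, so the whole loop is roughly O(len(edges) * len(nodes)). A trimmed but behaviourally equivalent version of the same slow approach (including its assumption that every endpoint exists in nodes):

edgesList = []
for featureId in edges:
    startCoord, endCoord = edges[featureId]
    # Each of these comprehensions still walks the whole nodes dict - this is the bottleneck.
    edgeStartPoint = [k for k, v in nodes.iteritems() if v == startCoord][0]
    edgeEndPoint = [k for k, v in nodes.iteritems() if v == endCoord][0]
    edgesList.append((featureId, edgeStartPoint, edgeEndPoint))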

This works, but it is very slow on large datasets (with 100K edges and 90K nodes it takes about 10 minutes).

I've figured out how to use a list comprehension to get each of the tuple's items, but is it possible to combine my 3 list comprehensions into one and avoid iterating over the edges with the for loop (if that would speed things up)?

Is there another way I can build such a list faster?

UPDATE

As Martin suggested, I've inverted my nodes dict:

nodesDict = dict((v,k) for k,v in oldnodesDict.iteritems())
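
For reference (my addition, not part of the thread), the same inversion can be written as a dict comprehension, which Python 2.7 supports. One caveat: if two nodeIDs ever share exactly the same coordinate tuple, the inversion silently keeps only one of them.

# Equivalent inversion: coordinate tuple -> nodeID.
# If coordinates are not unique, later entries overwrite earlier ones.
nodesDict = {v: k for k, v in oldnodesDict.iteritems()}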

so the node coordinate tuple is now the key and the nodeID is the value. Unfortunately, it did not speed up the lookup (here is the updated code; I flipped k and v for edgeStartPoint and edgeEndPoint):

edgesList = []
for featureId in edges:
    edgeFeatureId = [k for k, v in edges.iteritems() if k == featureId][0]
    edgeStartPoint = [v for k, v in nodes.iteritems() if k == edges[featureId][0]][0]  # start point
    edgeEndPoint = [v for k, v in nodes.iteritems() if k == edges[featureId][1]][0]  # end point
    edgesList.append((edgeFeatureId, edgeStartPoint, edgeEndPoint))
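
A likely explanation for why the inversion alone changed nothing (my reading of the code above, not from the thread): the two inner list comprehensions still iterate over every item of nodes for every edge, so each lookup remains linear. The inversion only pays off once the comprehension is replaced by a direct key access, roughly like this (assuming nodes now maps coordinate tuple to nodeID), which is exactly what the accepted answer below does:

edgeStartPoint = nodes[edges[featureId][0]]  # average O(1) dict lookup
edgeEndPoint = nodes[edges[featureId][1]]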

Accepted answer

Since you are matching based on the coordinates, your nodes dictionary should be inverted.

That is, it should look like so:

{(12.8254, 55.3880): n14, (12.8340, 55.3883): n15, (12.8235, 55.3857): n16, (12.8343, 55.3920): n17}

This way, when you are iterating over your edges, you have a very quick look-up to find the corresponding nodes:

edgesList = []
for featureId in edges:
    coordinates = edges[featureId]
    c0, c1 = coordinates
    n0 = nodes[c0]
    n1 = nodes[c1]
    edgesList.append((featureId, n0, n1))

Remember that dictionaries are extremely quick at finding the corresponding value for any given key. So quick that, in the average case, the speed of a lookup barely changes whether the dictionary holds one entry or one million.
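
To tie the pieces together, here is a minimal, self-contained sketch of the accepted approach using the sample data from the question (IDs written as strings purely for illustration; in the real data they are integers, and nodesByCoord is a name chosen here). The question's expected output omits e3, whose endpoints do not appear in nodes, so this sketch skips edges with unknown endpoints; drop that check if a missing endpoint should instead raise an error.

# Sample data from the question (IDs as strings for illustration only).
edges = {
    'e1': [(12.8254, 55.3880), (12.8343, 55.3920)],
    'e2': [(12.8254, 55.3880), (12.8235, 55.3857)],
    'e3': [(12.2432, 57.1120), (12.2426, 57.1122)],
}
nodes = {
    'n14': (12.8254, 55.3880),
    'n15': (12.8340, 55.3883),
    'n16': (12.8235, 55.3857),
    'n17': (12.8343, 55.3920),
}

# Invert once: coordinate tuple -> nodeID, O(len(nodes)).
nodesByCoord = dict((v, k) for k, v in nodes.iteritems())

# Resolve both endpoints of every edge with average O(1) dict lookups.
edgesList = []
for featureId, (start, end) in edges.iteritems():
    if start in nodesByCoord and end in nodesByCoord:
        edgesList.append((featureId, nodesByCoord[start], nodesByCoord[end]))

print edgesList  # e.g. [('e2', 'n14', 'n16'), ('e1', 'n14', 'n17')], order is arbitrary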
