如何更有效地递归搜索最长节点？(How can I make a recursive search for longest node more efficient?)

我试图在有向非循环图中找到最长的路径。目前，我的代码似乎运行时间复杂度为O（n 3 ） 。

该图是输入{0: [1,2], 1: [2,3], 3: [4,5] }

#Input: dictionary: graph, int: start, list: path #Output: List: the longest path in the graph (Recurrance) # This is a modification of a depth first search def find_longest_path(graph, start, path=[]): path = path + [start] paths = path for node in graph[start]: if node not in path: newpaths = find_longest_path(graph, node, path) #Only take the new path if its length is greater than the current path if(len(newpaths) > len(paths)): paths = newpaths return paths

它返回表单中的节点列表，例如[0,1,3,5]

如何使这比O（n 3 ）更有效？递归是解决这个问题的正确方法，还是应该使用不同的循环？

I'm trying to find the longest path in a Directed Acyclic graph. At the moment, my code seems to be running time complexity of O(n3).

The graph is of input {0: [1,2], 1: [2,3], 3: [4,5] }

#Input: dictionary: graph, int: start, list: path #Output: List: the longest path in the graph (Recurrance) # This is a modification of a depth first search def find_longest_path(graph, start, path=[]): path = path + [start] paths = path for node in graph[start]: if node not in path: newpaths = find_longest_path(graph, node, path) #Only take the new path if its length is greater than the current path if(len(newpaths) > len(paths)): paths = newpaths return paths

It returns a list of nodes in the form e.g. [0,1,3,5]

How can I make this more efficient than O(n3)? Is recursion the right way to solve this or should I be using a different loop?

最满意答案

您可以在O（n + e）中解决此问题（即节点数+线的线性）。

这个想法是你首先创建一个拓扑排序（我是Tarjan算法的粉丝）和反向边集。如果您可以分解问题以利用现有解决方案，这总是有帮助的。

然后，您向后走拓扑排序，向子节点推送其子节距+ 1（如果有多条路径，则保持最大值）。跟踪到目前为止看到的最大距离的节点。

当您完成带有距离的所有节点的注释后，您可以从具有最长距离的节点开始，该节点将是您最长的路径根，然后选择与当前节点相比正好一个计数的子节点向下走图表（因为他们躺在关键路径上）。

通常，当试图找到最优复杂度算法时，不要害怕一个接一个地运行多个阶段。从复杂度的角度来看，顺序运行的五个O（n）算法仍然是O（n）并且仍然优于O（n ² ） （尽管实际运行时间可能更差，这取决于恒定成本/因子和n的大小）。

ETA：我刚注意到你有一个起始节点。这使得它只是进行深度优先搜索并保持迄今为止看到的最长解决方案的情况，无论如何都只是O（n + e） 。递归很好，或者你可以保留一个列表/堆栈的访问节点（每次回溯时你都要小心找到下一个孩子）。

当您从深度优先搜索回溯时，您需要存储从该节点到叶子的最长路径，这样您就不会重新处理任何子树。这也将作为visited标志（即除了执行node not in path测试中的node not in subpath_cache还有一个node not in subpath_cache之前node not in subpath_cache测试中）。您可以存储长度，然后根据上面讨论的顺序值（关键路径）重建路径，而不是存储子路径。

ETA2：这是一个解决方案。

def find_longest_path_rec(graph, parent, cache): maxlen = 0 for node in graph[parent]: if node in cache: pass elif node not in graph: cache[node] = 1 else: cache[node] = find_longest_path_rec(graph, node, cache) maxlen = max(maxlen, cache[node]) return maxlen + 1 def find_longest_path(graph, start): cache = {} maxlen = find_longest_path_rec(graph, start, cache) path = [start] for i in range(maxlen-1, 0, -1): for node in graph[path[-1]]: if cache[node] == i: path.append(node) break else: assert(0) return path

请注意，我已删除了node not in path测试中的node not in path因为我假设您实际上正在提供所声称的DAG。如果你想要那个检查，你应该提出错误而不是忽略它。另请注意，我已将断言添加到for的else子句中for以记录我们必须始终在路径中找到有效的下一个（顺序）节点。

ETA3：最终for循环有点令人困惑。我们正在做的是考虑在关键路径中所有节点距离必须是连续的。考虑节点0是距离4，节点1是距离3，节点2是距离1.如果我们的路径开始[0, 2, ...]我们就会产生矛盾，因为节点0离叶子的距离不是2。

You can solve this problem in O(n+e) (i.e. linear in the number of nodes + edges).

The idea is that you first create a topological sort (I'm a fan of Tarjan's algorithm) and the set of reverse edges. It always helps if you can decompose your problem to leverage existing solutions.

You then walk the topological sort backwards pushing to each parent node its child's distance + 1 (keeping maximums in case there are multiple paths). Keep track of the node with the largest distance seen so far.

When you have finished annotating all of the nodes with distances you can just start at the node with the largest distance which will be your longest path root, and then walk down your graph choosing the children that are exactly one count less than the current node (since they lie on the critical path).

In general, when trying to find an optimal complexity algorithm don't be afraid to run multiple stages one after the other. Five O(n) algorithms run sequentially is still O(n) and is still better than O(n²) from a complexity perspective (although it may be worse real running time depending on the constant costs/factors and the size of n).

ETA: I just noticed you have a start node. This makes it simply a case of doing a depth first search and keeping the longest solution seen so far which is just O(n+e) anyway. Recursion is fine or you can keep a list/stack of visited nodes (you have to be careful when finding the next child each time you backtrack).

As you backtrack from your depth first search you need to store the longest path from that node to a leaf so that you don't re-process any sub-trees. This will also serve as a visited flag (i.e. in addition to doing the node not in path test also have a node not in subpath_cache test before recursing). Instead of storing the subpath you could store the length and then rebuild the path once you're finished based on sequential values as discussed above (critical path).

ETA2: Here's a solution.

def find_longest_path_rec(graph, parent, cache): maxlen = 0 for node in graph[parent]: if node in cache: pass elif node not in graph: cache[node] = 1 else: cache[node] = find_longest_path_rec(graph, node, cache) maxlen = max(maxlen, cache[node]) return maxlen + 1 def find_longest_path(graph, start): cache = {} maxlen = find_longest_path_rec(graph, start, cache) path = [start] for i in range(maxlen-1, 0, -1): for node in graph[path[-1]]: if cache[node] == i: path.append(node) break else: assert(0) return path

Note that I've removed the node not in path test because I'm assuming that you're actually supplying a DAG as claimed. If you want that check you should really be raising an error rather than ignoring it. Also note that I've added the assertion to the else clause of the for to document that we must always find a valid next (sequential) node in the path.

ETA3: The final for loop is a little confusing. What we're doing is considering that in the critical path all of the node distances must be sequential. Consider node 0 is distance 4, node 1 is distance 3 and node 2 is distance 1. If our path started [0, 2, ...] we have a contradiction because node 0 is not 1 further from a leaf than 2.

更多推荐