从传递闭包表中找到最少的共同祖先(Finding Least Common Ancestor from a Transitive Closure Table)

我有一个表，表示组织层次结构的传递闭包（即，它是一个单根的树）：

create table ancestry ( ancestor integer, descendant integer, distance integer );

我有另一个表，其中包含允许每个用户访问的组织：

create table accessible ( user integer, organization integer );

系统向用户显示与用户可以访问的每个组织相关联的支出汇总。我总是可以向用户展示公司的视图（即根），向用户显示直接子组织的列表以及他的组织对总计的贡献。在大多数情况下，会有一个孩子，并且在看到多个孩子之前，用户需要向下钻取几个级别。我更愿意与第一个展示多个孩子的组织（即LCA）一起开始演示。

对于给定的用户，我可以很容易地找到到root的路径集，但是找不到最常见的祖先。我正在使用postgresql 9.1但更喜欢与数据库无关的解决方案。在最坏的情况下，我可以将路径拉回到应用程序的代码中并在那里计算LCA。

I have a table representing the transitive closure of an organizational hierarchy (i.e., its a tree with a single root):

create table ancestry ( ancestor integer, descendant integer, distance integer );

I have another table that contains the organizations that each user is allowed to access:

create table accessible ( user integer, organization integer );

The system shows the user a roll-up of expenditures associated with each organization the user can access. I could always start by showing the user a view of the company (i.e., the root) showing the user a list of immediate child organizations and how much his organizations contribute to the total. In most cases, there would be a single child and the user would be required to drill-down several levels before seeing multiple children. I would prefer to start the presentation with the first organization that shows multiple children (i.e., the LCA).

For a given user, I can find the set of paths to the root easy enough but am having trouble finding the least common ancestor. I am using postgresql 9.1 but would prefer a solution that is database agnostic. In the worst case, I can pull the paths to root back into the application's code and calculate the LCA there.

最满意答案

我重新审视了这个并开发了以下解决方案。我使用common-table-expression来更容易理解它的运行方式，但可以使用子查询轻松编写。

with hit (id, count) as ( select ancestry.ancestor ,count(ancestry.descendant) from accessible inner join ancestry on accessible.organization = ancestry.descendant where accessible.user = @user_id group by ancestry.ancestor ) select ancestry.descendant as lca from hit inner join ancestry on ancestry.descendant = hit.id and ancestry.ancestor = @company_id order by hit.count desc ,ancestry.distance desc limit 1 ;

对于层次结构中的每个组织，命中CTE计算从子项到遍历组织的根的路径数。然后，LCA是遍历最多的组织。如果出现平局，离根最远的组织（即最大（距离））是实际的LCA。用一个例子可以很好地说明这一点。

A | B / \ C D

假设我们希望从上面的树中找到节点C和D的LCA。命中CTE产生以下计数：

Node Count A 2 B 2 C 1 D 1

主查询添加距离：

Node Count Distance A 2 0 B 2 1 C 1 2 D 1 2

然后，主查询按降序计数和距离对结果进行排序

Node Count Distance B 2 1 A 2 0 C 1 2 D 1 2

LCA是列表中的第一项。

I took a fresh look at this and developed the following solution. I used a common-table-expression to make it easier to understand how it operates but it could easily be written using a sub-query.

with hit (id, count) as ( select ancestry.ancestor ,count(ancestry.descendant) from accessible inner join ancestry on accessible.organization = ancestry.descendant where accessible.user = @user_id group by ancestry.ancestor ) select ancestry.descendant as lca from hit inner join ancestry on ancestry.descendant = hit.id and ancestry.ancestor = @company_id order by hit.count desc ,ancestry.distance desc limit 1 ;

The hit CTE counts, for each organization in the hierarchy, the number of paths from a child to the root that traverse the organization. The LCA is then the organization with the most traversals. In the event of a tie, the organization farthest from the root (i.e., max(distance)) is the actual LCA. This is best illustrated with an example.

A | B / \ C D

Assuming we wish to find the LCA of nodes C and D from the tree above. The hit CTE produces the following counts:

Node Count A 2 B 2 C 1 D 1

The main query adds the distance:

Node Count Distance A 2 0 B 2 1 C 1 2 D 1 2

The main query then orders the results by descending count and distance

Node Count Distance B 2 1 A 2 0 C 1 2 D 1 2

The LCA is the first item in the list.

更多推荐