MySQL：将表连接到自身，消除重复的行(MySQL: Joining a table to itself, eliminating duplicate rows)

我有一个将表连接到自身的查询。结果包含重复的行（排序）。此查询的目的是生成最常购买的产品列表。考虑这个查询：

SELECT o1.ITEM ,o2.ITEM as ITEM2 ,o3.ITEM AS ITEM3 ,count(DISTINCT o1.ORDERNUM) as oCount FROM orders o1 INNER JOIN orders o2 ON o2.ORDERNUM = o1.ORDERNUM AND o2.ITEM != o1.ITEM LEFT OUTER JOIN orders o3 ON o3.ORDERNUM = o1.ORDERNUM AND o3.ITEM != o2.ITEM AND o3.ITEM != o1.ITEM GROUP BY o1.ITEM, o2.ITEM, o3.ITEM ORDER BY oCount DESC

前12个结果：

+-------------+-------------+-------------+--------+ | ITEM | ITEM2 | ITEM3 | oCount | +-------------+-------------+-------------+--------+ | 02B13.04.GP | 77A04.10 | 45A04.04.GP | 54 | | 02B13.04.GP | 45A04.04.GP | 77A04.10 | 54 | | 77A04.10 | 45A04.04.GP | 02B13.04.GP | 54 | | 45A04.04.GP | 02B13.04.GP | 77A04.10 | 54 | | 77A04.10 | 02B13.04.GP | 45A04.04.GP | 54 | | 45A04.04.GP | 77A04.10 | 02B13.04.GP | 54 | | 57B01.01.GP | 57B01.11.GP | 57B01.10.GP | 12 | | 57B01.10.GP | 57B01.11.GP | 57B01.01.GP | 12 | | 57B01.01.GP | 57B01.10.GP | 57B01.11.GP | 12 | | 57B01.10.GP | 57B01.01.GP | 57B01.11.GP | 12 | | 57B01.11.GP | 57B01.10.GP | 57B01.01.GP | 12 | | 57B01.11.GP | 57B01.01.GP | 57B01.10.GP | 12 |

请注意，前6个结果是相同的连接，顺序不同。第二个6个结果具有相同的问题（整个结果都是如此）。我的目标是为每个项目组创建一条记录，而不是每个项目组的每个组合都有一行。

我怎样才能避免这些重复的结果呢？

此外，对于这种查询的更有效方法的任何建议都是受欢迎的（我想添加一个额外的连接，但是有1,000,000个订单，资源需求已经失控）。

================================================

编辑：回答Darshan的问题

你可以分享表结构：

该表包含所有订单的行。如果订单包含多个产品，则每个产品将有一行（给定订单的多行）。此查询中唯一关注的列是：

ORDERNUM CHAR : Order Number ITEM CHAR : SKU for the item QTY INT : Quantity purchased ORDDATE DATETIME : Order Date

结果返回：我需要的就是我在上面的结果示例中列出的内容。目标是获得最常购买的产品列表。

I have a query that connects a table to itself. The results contain duplicate rows (sort of). The objective of this query is to produce a list of products most frequently purchased together. Consider this query:

SELECT o1.ITEM ,o2.ITEM as ITEM2 ,o3.ITEM AS ITEM3 ,count(DISTINCT o1.ORDERNUM) as oCount FROM orders o1 INNER JOIN orders o2 ON o2.ORDERNUM = o1.ORDERNUM AND o2.ITEM != o1.ITEM LEFT OUTER JOIN orders o3 ON o3.ORDERNUM = o1.ORDERNUM AND o3.ITEM != o2.ITEM AND o3.ITEM != o1.ITEM GROUP BY o1.ITEM, o2.ITEM, o3.ITEM ORDER BY oCount DESC

And the first 12 results:

+-------------+-------------+-------------+--------+ | ITEM | ITEM2 | ITEM3 | oCount | +-------------+-------------+-------------+--------+ | 02B13.04.GP | 77A04.10 | 45A04.04.GP | 54 | | 02B13.04.GP | 45A04.04.GP | 77A04.10 | 54 | | 77A04.10 | 45A04.04.GP | 02B13.04.GP | 54 | | 45A04.04.GP | 02B13.04.GP | 77A04.10 | 54 | | 77A04.10 | 02B13.04.GP | 45A04.04.GP | 54 | | 45A04.04.GP | 77A04.10 | 02B13.04.GP | 54 | | 57B01.01.GP | 57B01.11.GP | 57B01.10.GP | 12 | | 57B01.10.GP | 57B01.11.GP | 57B01.01.GP | 12 | | 57B01.01.GP | 57B01.10.GP | 57B01.11.GP | 12 | | 57B01.10.GP | 57B01.01.GP | 57B01.11.GP | 12 | | 57B01.11.GP | 57B01.10.GP | 57B01.01.GP | 12 | | 57B01.11.GP | 57B01.01.GP | 57B01.10.GP | 12 |

Note that the first 6 results are the same connections, in a different order. The second 6 results have the same issue (and this continues throughout the results). My goal is to have a single record for each item group, not a single row for each combination of each item group.

How can I avoid these repeated results?

Also any advice on a more efficient approach to this query would be welcome (I'd like to add an additional join, but with 1,000,000 orders the resource requirements are getting out of hand).

================================================

EDIT: To answer Darshan's questions

Can you share the table structure:

The table contains the lines for all the orders. If an order contains multiple products, there will be a line for each product (multiple lines for a given order). The only columns of concern in this query are:

ORDERNUM CHAR : Order Number ITEM CHAR : SKU for the item QTY INT : Quantity purchased ORDDATE DATETIME : Order Date

Results returned: All I need is what I listed in the result sample above. The objective is to get a list of the products that are purchased together the most often.

最满意答案

你想要做的是消除重复的行，无论位置如何; 一招，因为你总是拥有所有项目组合，就是根据一个谓词item1 <item2 <item3来过滤结果

这是一个可能的解决方案：

SELECT a.item, b.item, c.item, count(*) from `orders` a left join orders b on a.ordernum = b.ordernum and a.item <> b.item left join orders c on a.ordernum = c.ordernum and a.item <> c.item and b.item <> c.item where a.item < b.item and b.item < c.item group by a.item, b.item, c.item order by count(*) desc

What you want to do is to eliminate duplicated rows regardless of the position; one trick, since you always have all the combinations of items is to filter the results according to a predicate that says item1 < item2 < item3

Here is a possible solution:

SELECT a.item, b.item, c.item, count(*) from `orders` a left join orders b on a.ordernum = b.ordernum and a.item <> b.item left join orders c on a.ordernum = c.ordernum and a.item <> c.item and b.item <> c.item where a.item < b.item and b.item < c.item group by a.item, b.item, c.item order by count(*) desc

更多推荐

MySQL：将表连接到自身，消除重复的行(MySQL: Joining a table to itself, eliminating duplicate rows)

最满意答案

发布评论取消回复

最近发表

热门文章

标签列表