我有一个将表连接到自身的查询。 结果包含重复的行(排序)。 此查询的目的是生成最常购买的产品列表。 考虑这个查询:
SELECT o1.ITEM ,o2.ITEM as ITEM2 ,o3.ITEM AS ITEM3 ,count(DISTINCT o1.ORDERNUM) as oCount FROM orders o1 INNER JOIN orders o2 ON o2.ORDERNUM = o1.ORDERNUM AND o2.ITEM != o1.ITEM LEFT OUTER JOIN orders o3 ON o3.ORDERNUM = o1.ORDERNUM AND o3.ITEM != o2.ITEM AND o3.ITEM != o1.ITEM GROUP BY o1.ITEM, o2.ITEM, o3.ITEM ORDER BY oCount DESC前12个结果:
+-------------+-------------+-------------+--------+ | ITEM | ITEM2 | ITEM3 | oCount | +-------------+-------------+-------------+--------+ | 02B13.04.GP | 77A04.10 | 45A04.04.GP | 54 | | 02B13.04.GP | 45A04.04.GP | 77A04.10 | 54 | | 77A04.10 | 45A04.04.GP | 02B13.04.GP | 54 | | 45A04.04.GP | 02B13.04.GP | 77A04.10 | 54 | | 77A04.10 | 02B13.04.GP | 45A04.04.GP | 54 | | 45A04.04.GP | 77A04.10 | 02B13.04.GP | 54 | | 57B01.01.GP | 57B01.11.GP | 57B01.10.GP | 12 | | 57B01.10.GP | 57B01.11.GP | 57B01.01.GP | 12 | | 57B01.01.GP | 57B01.10.GP | 57B01.11.GP | 12 | | 57B01.10.GP | 57B01.01.GP | 57B01.11.GP | 12 | | 57B01.11.GP | 57B01.10.GP | 57B01.01.GP | 12 | | 57B01.11.GP | 57B01.01.GP | 57B01.10.GP | 12 |请注意,前6个结果是相同的连接,顺序不同。 第二个6个结果具有相同的问题(整个结果都是如此)。 我的目标是为每个项目组创建一条记录,而不是每个项目组的每个组合都有一行。
我怎样才能避免这些重复的结果呢?
此外,对于这种查询的更有效方法的任何建议都是受欢迎的(我想添加一个额外的连接,但是有1,000,000个订单,资源需求已经失控)。
================================================
编辑:回答Darshan的问题
你可以分享表结构:
该表包含所有订单的行。 如果订单包含多个产品,则每个产品将有一行(给定订单的多行)。 此查询中唯一关注的列是:
ORDERNUM CHAR : Order Number ITEM CHAR : SKU for the item QTY INT : Quantity purchased ORDDATE DATETIME : Order Date结果返回:我需要的就是我在上面的结果示例中列出的内容。 目标是获得最常购买的产品列表。
I have a query that connects a table to itself. The results contain duplicate rows (sort of). The objective of this query is to produce a list of products most frequently purchased together. Consider this query:
SELECT o1.ITEM ,o2.ITEM as ITEM2 ,o3.ITEM AS ITEM3 ,count(DISTINCT o1.ORDERNUM) as oCount FROM orders o1 INNER JOIN orders o2 ON o2.ORDERNUM = o1.ORDERNUM AND o2.ITEM != o1.ITEM LEFT OUTER JOIN orders o3 ON o3.ORDERNUM = o1.ORDERNUM AND o3.ITEM != o2.ITEM AND o3.ITEM != o1.ITEM GROUP BY o1.ITEM, o2.ITEM, o3.ITEM ORDER BY oCount DESCAnd the first 12 results:
+-------------+-------------+-------------+--------+ | ITEM | ITEM2 | ITEM3 | oCount | +-------------+-------------+-------------+--------+ | 02B13.04.GP | 77A04.10 | 45A04.04.GP | 54 | | 02B13.04.GP | 45A04.04.GP | 77A04.10 | 54 | | 77A04.10 | 45A04.04.GP | 02B13.04.GP | 54 | | 45A04.04.GP | 02B13.04.GP | 77A04.10 | 54 | | 77A04.10 | 02B13.04.GP | 45A04.04.GP | 54 | | 45A04.04.GP | 77A04.10 | 02B13.04.GP | 54 | | 57B01.01.GP | 57B01.11.GP | 57B01.10.GP | 12 | | 57B01.10.GP | 57B01.11.GP | 57B01.01.GP | 12 | | 57B01.01.GP | 57B01.10.GP | 57B01.11.GP | 12 | | 57B01.10.GP | 57B01.01.GP | 57B01.11.GP | 12 | | 57B01.11.GP | 57B01.10.GP | 57B01.01.GP | 12 | | 57B01.11.GP | 57B01.01.GP | 57B01.10.GP | 12 |Note that the first 6 results are the same connections, in a different order. The second 6 results have the same issue (and this continues throughout the results). My goal is to have a single record for each item group, not a single row for each combination of each item group.
How can I avoid these repeated results?
Also any advice on a more efficient approach to this query would be welcome (I'd like to add an additional join, but with 1,000,000 orders the resource requirements are getting out of hand).
================================================
EDIT: To answer Darshan's questions
Can you share the table structure:
The table contains the lines for all the orders. If an order contains multiple products, there will be a line for each product (multiple lines for a given order). The only columns of concern in this query are:
ORDERNUM CHAR : Order Number ITEM CHAR : SKU for the item QTY INT : Quantity purchased ORDDATE DATETIME : Order DateResults returned: All I need is what I listed in the result sample above. The objective is to get a list of the products that are purchased together the most often.
最满意答案
你想要做的是消除重复的行,无论位置如何; 一招,因为你总是拥有所有项目组合,就是根据一个谓词item1 <item2 <item3来过滤结果
这是一个可能的解决方案:
SELECT a.item, b.item, c.item, count(*) from `orders` a left join orders b on a.ordernum = b.ordernum and a.item <> b.item left join orders c on a.ordernum = c.ordernum and a.item <> c.item and b.item <> c.item where a.item < b.item and b.item < c.item group by a.item, b.item, c.item order by count(*) descWhat you want to do is to eliminate duplicated rows regardless of the position; one trick, since you always have all the combinations of items is to filter the results according to a predicate that says item1 < item2 < item3
Here is a possible solution:
SELECT a.item, b.item, c.item, count(*) from `orders` a left join orders b on a.ordernum = b.ordernum and a.item <> b.item left join orders c on a.ordernum = c.ordernum and a.item <> c.item and b.item <> c.item where a.item < b.item and b.item < c.item group by a.item, b.item, c.item order by count(*) desc更多推荐
发布评论