DISTINCT与PARTITION BY与GROUPBY

编程入门 行业动态 更新时间:2024-10-16 02:28:20
本文介绍了DISTINCT与PARTITION BY与GROUPBY的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

SELECT DISTINCT 公司,我已经在应用程序中发现了一些SQL查询,仓库,物料, SUM(数量)OVER(分配到公司,仓库,物料)AS库存

SELECT 公司,仓库,项目, SUM(数量)AS库存 GROUP BY公司,仓库,物料

解决方案使用第一种方法是否有任何好处(性能,可读性,编写查询的额外灵活性,可维护性等) div> 表现:

优胜者: GROUP BY

在一个包含无索引列的大表上进行一些非常基本的测试表明,至少在我的情况下,这两个查询生成了完全不同的查询计划。用于 PARTITION BY 的那个显着较慢。 GROUP BY 查询计划仅包含表扫描和聚合操作,而 PARTITION BY plan有两个嵌套循环自连接。第二次运行时, PARTITION BY 大约耗时2800毫秒, GROUP BY 只花费了500毫秒。

可读性/可维护性:

优胜者: GROUP BY 灵活性

优胜者: PARTITION BY

PARTITION BY 选择分组列。使用 GROUP BY ,所有聚合列只能有一组分组列。使用 DISTINCT + PARTITION BY ,您可以在每个分区中有不同的列。同样在一些DBMS中,您可以从 OVER 子句中的更多聚合/分析函数中进行选择。

I have found some SQL queries in an application I am examining like this:

SELECT DISTINCT Company, Warehouse, Item, SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS stock

I'm quite sure this gives the same result as:

SELECT Company, Warehouse, Item, SUM(quantity) AS stock GROUP BY Company, Warehouse, Item

Is there any benefit (performance, readability, additional flexibility in writing the query, maintainability, etc.) of using the first approach over the later?

解决方案

Performance:

Winner: GROUP BY

Some very rudimentary testing on a large table with unindexed columns showed that at least in my case the two queries generated a completely different query plan. The one for PARTITION BY was significantly slower.

The GROUP BY query plan included only a table scan and aggregation operation while the PARTITION BY plan had two nested loop self-joins. The PARTITION BY took about 2800ms on the second run, the GROUP BY took only 500ms.

Readability / Maintainability:

Winner: GROUP BY

Based on the opinions of the commenters here the PARTITION BY is less readable for most developers so it will be probably also harder to maintain in the future.

Flexibility

Winner: PARTITION BY

PARTITION BY gives you more flexibility in choosing the grouping columns. With GROUP BY you can have only one set of grouping columns for all aggregated columns. With DISTINCT + PARTITION BY you can have different column in each partition. Also on some DBMSs you can chose from more aggregation/analytic functions in the OVER clause.

更多推荐

DISTINCT与PARTITION BY与GROUPBY

本文发布于:2023-10-24 21:18:11,感谢您对本站的认可!
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:DISTINCT   PARTITION   GROUPBY

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!