GROUP BY由间隔分隔的连续日期(GROUP BY consecutive dates delimited by gaps)

假设你（在Postgres 9.1中）有这样一个表格：

date | value

其中有一些差距（我的意思是：不是min（date）和max（date）之间的每个可能的日期都是行）。

我的问题是如何汇总这些数据，以便每个一致的组（没有间隙）被分开处理，如下所示：

min_date | max_date | [some aggregate of "value" column]

任何想法如何做到这一点？我相信这是可能的窗口功能，但经过一段时间尝试lag()和lead()我有点卡住了。

例如，如果数据是这样的：

date | value ---------------+------- 2011-10-31 | 2 2011-11-01 | 8 2011-11-02 | 10 2012-09-13 | 1 2012-09-14 | 4 2012-09-15 | 5 2012-09-16 | 20 2012-10-30 | 10

产出（ sum为总和）将是：

min | max | sum -----------+------------+------- 2011-10-31 | 2011-11-02 | 20 2012-09-13 | 2012-09-16 | 30 2012-10-30 | 2012-10-30 | 10

Assume you have (in Postgres 9.1 ) a table like this:

date | value

which have some gaps in it (I mean: not every possible date between min(date) and max(date) has it's row).

My problem is how to aggregate this data so that each consistent group (without gaps) is treated separately, like this:

min_date | max_date | [some aggregate of "value" column]

Any ideas how to do it? I believe it is possible with window functions but after a while trying with lag() and lead() I'm a little stuck.

For instance if the data are like this:

date | value ---------------+------- 2011-10-31 | 2 2011-11-01 | 8 2011-11-02 | 10 2012-09-13 | 1 2012-09-14 | 4 2012-09-15 | 5 2012-09-16 | 20 2012-10-30 | 10

the output (for sum as the aggregate) would be:

min | max | sum -----------+------------+------- 2011-10-31 | 2011-11-02 | 20 2012-09-13 | 2012-09-16 | 30 2012-10-30 | 2012-10-30 | 10

最满意答案

create table t ("date" date, "value" int); insert into t ("date", "value") values ('2011-10-31', 2), ('2011-11-01', 8), ('2011-11-02', 10), ('2012-09-13', 1), ('2012-09-14', 4), ('2012-09-15', 5), ('2012-09-16', 20), ('2012-10-30', 10);

更简单和更便宜的版本：

select min("date"), max("date"), sum(value) from ( select "date", value, "date" - (dense_rank() over(order by "date"))::int g from t ) s group by s.g order by 1

我的第一次尝试更复杂，更昂贵：

create temporary sequence s; select min("date"), max("date"), sum(value) from ( select "date", value, d, case when lag("date", 1, null) over(order by s.d) is null and "date" is not null then nextval('s') when lag("date", 1, null) over(order by s.d) is not null and "date" is not null then lastval() else 0 end g from t right join generate_series( (select min("date") from t)::date, (select max("date") from t)::date + 1, '1 day' ) s(d) on s.d::date = t."date" ) q where g != 0 group by g order by 1 ; drop sequence s;

输出：

min | max | sum ------------+------------+----- 2011-10-31 | 2011-11-02 | 20 2012-09-13 | 2012-09-16 | 30 2012-10-30 | 2012-10-30 | 10 (3 rows) create table t ("date" date, "value" int); insert into t ("date", "value") values ('2011-10-31', 2), ('2011-11-01', 8), ('2011-11-02', 10), ('2012-09-13', 1), ('2012-09-14', 4), ('2012-09-15', 5), ('2012-09-16', 20), ('2012-10-30', 10);

Simpler and cheaper version:

select min("date"), max("date"), sum(value) from ( select "date", value, "date" - (dense_rank() over(order by "date"))::int g from t ) s group by s.g order by 1

My first try was more complex and expensive:

create temporary sequence s; select min("date"), max("date"), sum(value) from ( select "date", value, d, case when lag("date", 1, null) over(order by s.d) is null and "date" is not null then nextval('s') when lag("date", 1, null) over(order by s.d) is not null and "date" is not null then lastval() else 0 end g from t right join generate_series( (select min("date") from t)::date, (select max("date") from t)::date + 1, '1 day' ) s(d) on s.d::date = t."date" ) q where g != 0 group by g order by 1 ; drop sequence s;

The output:

min | max | sum ------------+------------+----- 2011-10-31 | 2011-11-02 | 20 2012-09-13 | 2012-09-16 | 30 2012-10-30 | 2012-10-30 | 10 (3 rows)

更多推荐