Assume you have (in Postgres 9.1) a table like this:

    date | value

which has some gaps in it (I mean: not every possible date between min(date) and max(date) has its row).

My problem is how to aggregate this data so that each consistent group (without gaps) is treated separately, like this:

    min_date | max_date | [some aggregate of "value" column]

Any ideas how to do it? I believe it is possible with window functions, but after a while trying with lag() and lead() I'm a little stuck.

For instance, if the data are like this:

        date    | value
    ------------+-------
     2011-10-31 |     2
     2011-11-01 |     8
     2011-11-02 |    10
     2012-09-13 |     1
     2012-09-14 |     4
     2012-09-15 |     5
     2012-09-16 |    20
     2012-10-30 |    10

the output (for sum as the aggregate) would be:

        min     |    max     | sum
    ------------+------------+-----
     2011-10-31 | 2011-11-02 |  20
     2012-09-13 | 2012-09-16 |  30
     2012-10-30 | 2012-10-30 |  10

Best answer
    create table t ("date" date, "value" int);

    insert into t ("date", "value") values
        ('2011-10-31', 2),
        ('2011-11-01', 8),
        ('2011-11-02', 10),
        ('2012-09-13', 1),
        ('2012-09-14', 4),
        ('2012-09-15', 5),
        ('2012-09-16', 20),
        ('2012-10-30', 10);

Simpler and cheaper version:
    select min("date"), max("date"), sum(value)
    from (
        select
            "date",
            value,
            "date" - (dense_rank() over(order by "date"))::int g
        from t
    ) s
    group by s.g
    order by 1

My first try was more complex and expensive:
    create temporary sequence s;

    select min("date"), max("date"), sum(value)
    from (
        select
            "date",
            value,
            d,
            case
                when lag("date", 1, null) over(order by s.d) is null
                     and "date" is not null
                    then nextval('s')
                when lag("date", 1, null) over(order by s.d) is not null
                     and "date" is not null
                    then lastval()
                else 0
            end g
        from t
        right join generate_series(
            (select min("date") from t)::date,
            (select max("date") from t)::date + 1,
            '1 day'
        ) s(d) on s.d::date = t."date"
    ) q
    where g != 0
    group by g
    order by 1
    ;

    drop sequence s;

The output:
        min     |    max     | sum
    ------------+------------+-----
     2011-10-31 | 2011-11-02 |  20
     2012-09-13 | 2012-09-16 |  30
     2012-10-30 | 2012-10-30 |  10
    (3 rows)
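Why the simpler version works: within a run of consecutive dates, the date and its dense_rank() both go up by one per step, so their difference stays constant. After a gap the date jumps by more than one while the rank still goes up by one, so the difference changes. That constant is what the query groups on as g. Below is a rough Python sketch of the same idea, not part of the original answer; the function name `islands` and the in-memory `rows` list are only assumptions for illustration.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (date, value)
rows = [
    (date(2011, 10, 31), 2),
    (date(2011, 11, 1), 8),
    (date(2011, 11, 2), 10),
    (date(2012, 9, 13), 1),
    (date(2012, 9, 14), 4),
    (date(2012, 9, 15), 5),
    (date(2012, 9, 16), 20),
    (date(2012, 10, 30), 10),
]


def islands(rows):
    """Hypothetical helper: return (min_date, max_date, sum) per gap-free run."""
    ordered = sorted(rows)
    # Emulate dense_rank() over (order by date): equal dates share a rank
    # and the ranks themselves have no gaps.
    distinct_dates = sorted({d for d, _ in ordered})
    rank = {d: i for i, d in enumerate(distinct_dates, start=1)}
    # date - rank (in days) is the same for every row of a consecutive run.
    keyed = [(d - timedelta(days=rank[d]), d, v) for d, v in ordered]
    result = []
    # The key never decreases as dates increase, so equal keys are adjacent
    # and groupby sees each run exactly once.
    for _, group in groupby(keyed, key=lambda r: r[0]):
        group = list(group)
        result.append((group[0][1], group[-1][1], sum(v for _, _, v in group)))
    return result


for lo, hi, total in islands(rows):
    print(lo, hi, total)
```

On the sample data this gives the same three groups as the output above. The first three rows all map to 2011-10-30 (2011-10-31 minus 1 day, 2011-11-01 minus 2 days, 2011-11-02 minus 3 days), which is why they end up in one group.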