我在表中有几个日期重复行:
ID STATE DATE ---------------------------- id01 connected 2015-04-04 id01 connected 2015-04-05 id01 connected 2015-04-08 id01 disconect 2015-04-11 id01 disconect 2015-04-12 id01 connected 2015-04-13我想要一个带有“开始日期”和“结束日期”的查询,结果如下:
ID STATE START DATE END DATE ---------------------------------------- id01 connected 2015-04-04 2015-04-10 id01 disconect 2015-04-11 2015-04-12 id01 connected 2015-04-13 XXXXXXXXXX最后一个“结束日期”并不重要(最后一个值,null,now()...)
最重要的是检测更改日期(在此示例中,2015-04-10没有行,2015-04-13中的状态相同)。
可行的解决方案? (无效)
SELECT ID, STATE, MIN(date), MAX(date) FROM TABLE GROUP BY ID, STATE;合并间隔无效:
ID STATE START DATE END DATE ---------------------------------------- id01 connected 2015-04-04 XXXXXXXXXX id01 disconect 2015-04-11 2015-04-12查询已在Impala中运行(类似SQL92)
I have repeat rows in a table with several dates:
ID STATE DATE ---------------------------- id01 connected 2015-04-04 id01 connected 2015-04-05 id01 connected 2015-04-08 id01 disconect 2015-04-11 id01 disconect 2015-04-12 id01 connected 2015-04-13I want a query with "start date" and "end date", with this result:
ID STATE START DATE END DATE ---------------------------------------- id01 connected 2015-04-04 2015-04-10 id01 disconect 2015-04-11 2015-04-12 id01 connected 2015-04-13 XXXXXXXXXXThe last "end date" it's not important (last value, null, now()...)
The most important is detect date of change (in this example no row for 2015-04-10, and the same state happends in 2015-04-13).
Posible solution? (not valid)
SELECT ID, STATE, MIN(date), MAX(date) FROM TABLE GROUP BY ID, STATE;Isn't valid because merge intervals:
ID STATE START DATE END DATE ---------------------------------------- id01 connected 2015-04-04 XXXXXXXXXX id01 disconect 2015-04-11 2015-04-12Query has run in Impala (similar SQL92)
最满意答案
Impala支持窗口功能。 这个问题是一个“缺口和孤岛”问题,因此可以使用行数的差异来解决:
select id, state, min(date) as start_date, max(date) as end_date from (select t.*, row_number() over (partition by id order by date) as seqnum_id, row_number() over (partition by id, state order by date) as seqnum_isd from table t ) t group by id, state, (seqnum_id - seqnum_isd);差异的逻辑并不困难,但是当你第一次学习它时却很棘手。 它有助于运行子查询并查看行号值是什么 - 以及差异定义每个组的原因。
Impala supports window functions. This problem is a "gap-and-islands" problem, so it can be solved using a difference of row numbers:
select id, state, min(date) as start_date, max(date) as end_date from (select t.*, row_number() over (partition by id order by date) as seqnum_id, row_number() over (partition by id, state order by date) as seqnum_isd from table t ) t group by id, state, (seqnum_id - seqnum_isd);The logic for the difference is not difficult, but tricky when you first learn it. It helps to run the subquery and see what the row number values are -- and why the difference defines each group.
更多推荐
发布评论