如何在连续几天的“连续变化”中向行添加运行计数

编程入门行业动态更新时间:2024-10-27 14:33:19

本文介绍了如何在连续几天的“连续变化”中向行添加运行计数的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述

感谢 Mike ，建议您添加create / insert语句。

Thanks to Mike for the suggestion to add the create/insert statements.

create table test ( pid integer not null, date date not null, primary key (pid, date) ); insert into test values (1,'2014-10-1') , (1,'2014-10-2') , (1,'2014-10-3') , (1,'2014-10-5') , (1,'2014-10-7') , (2,'2014-10-1') , (2,'2014-10-2') , (2,'2014-10-3') , (2,'2014-10-5') , (2,'2014-10-7');

我想添加一个新列，即当前条纹天数，因此结果类似于：

I want to add a new column that is 'days in current streak' so the result would look like:

pid | date | in_streak -------|-----------|---------- 1 | 2014-10-1 | 1 1 | 2014-10-2 | 2 1 | 2014-10-3 | 3 1 | 2014-10-5 | 1 1 | 2014-10-7 | 1 2 | 2014-10-2 | 1 2 | 2014-10-3 | 2 2 | 2014-10-4 | 3 2 | 2014-10-6 | 1

我一直在尝试使用

PostgreSQL：查找直到现在为止连续的天数

PostgreSQL: find number of consecutive days up until now
Return rows of the latest 'streak' of data

但是我不知道如何在其他窗口函数中使用 dense_rank（）技巧来获得正确的结果。

but I can't work out how to use the dense_rank() trick with other window functions to get the right result.

推荐答案

在此表上构建（不使用 SQL关键字日期作为列名。）：

Building on this table (not using the SQL keyword "date" as column name.):

CREATE TABLE tbl( pid int , the_date date , PRIMARY KEY (pid, the_date) );

查询：

SELECT pid, the_date , row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak FROM ( SELECT * , the_date - '2000-01-01'::date - row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp FROM tbl ) sub ORDER BY pid, the_date;

从另一个<$ c减去日期 $ c> date 产生整数。由于您一直在寻找连续的日子，因此每隔一行将增加一个。如果从中减去 row_number（），则每个<$ c $ g $ c $ c> pid 。这样就可以很容易地计算出每个组的数量。

Subtracting a date from another date yields an integer. Since you are looking for consecutive days, every next row would be greater by one. If we subtract row_number() from that, the whole streak ends up in the same group (grp) per pid. Then it's simple to deal out number per group.

grp 是用两次减法计算的，这应该是最快的。同样快速的替代方法可能是：

grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:

the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp

一次乘法，一次减法。字符串连接和转换更昂贵。用 EXPLAIN ANALYZE 进行测试。

One multiplication, one subtraction. String concatenation and casting is more expensive. Test with EXPLAIN ANALYZE.

别忘了按 pid 另外在两个步骤中都是这样，否则您会无意间混合应该分开的组。

Don't forget to partition by pid additionally in both steps, or you'll inadvertently mix groups that should be separated.

使用子查询，因为那样通常比 CTE 更快。

Using a subquery, since that is typically faster than a CTE. There is nothing here that a plain subquery couldn't do.

由于您提到了它，所以没有什么了： dense_rank（）显然在这里不必要。基本 row_number（）做这份工作。

And since you mentioned it: dense_rank() is obviously not necessary here. Basic row_number() does the job.

更多推荐