Vertica中每组最多N个查询(Top N per group query in Vertica)

编程入门 行业动态 更新时间:2024-10-25 10:31:42
Vertica中每组最多N个查询(Top N per group query in Vertica)

这是一个老问题 - 在Vertica中寻找最佳解决方案。 想象一下带有列的表格: -

A, B, C, D, E

列AD是整数或变量,列E是具有缺省值GETUTCDATE()的timestamptz列。

表格的样本内容: -

1, 2, "AAA", 4, 1404305559 1, 2, "BBB", 23, 1404305633 1, 2, "CCC", 62, 1404305705 <-- the max entry for (1,2,"CCC") 1, 2, "AAA", 123, 1404305740 <-- the max entry for (1,2,"AAA") 1, 2, "BBB", 91, 1404305778 <-- the max entry for (1,2,"BBB")

因此,可能会出现复合(A,B,C)值的重复行(列D是值,列E是时间戳)。

我想要一个结果集,显示每个唯一(A,B,C)组合的最新行及其值。 因此,上述结果将如下所示: -

1, 2, "CCC", 62, 1404305705 1, 2, "AAA", 123, 1404305740 1, 2, "BBB", 91, 1404305778

This is an old problem - looking for the best solution in Vertica. Imagine a table with columns:-

A, B, C, D, E

Columns A-D are ints or varchars and column E is a timestamptz column that has a default value of GETUTCDATE().

Sample content of the table:-

1, 2, "AAA", 4, 1404305559 1, 2, "BBB", 23, 1404305633 1, 2, "CCC", 62, 1404305705 <-- the max entry for (1,2,"CCC") 1, 2, "AAA", 123, 1404305740 <-- the max entry for (1,2,"AAA") 1, 2, "BBB", 91, 1404305778 <-- the max entry for (1,2,"BBB")

So potentially there are repeating rows for the composite (A,B,C) value (with column D being a value and column E the timestamp).

I'd like a resultset that showed, for each unique (A,B,C) combination, the latest row and its value. Hence the resultset for the above would look like:-

1, 2, "CCC", 62, 1404305705 1, 2, "AAA", 123, 1404305740 1, 2, "BBB", 91, 1404305778

最满意答案

我们来设置示例数据:

CREATE TABLE public.test ( A int, B int, C varchar, D int, E int ); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 4, 1404305559); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 23, 1404305633); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'CCC', 62, 1404305705); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 123, 1404305740); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 91, 1404305778); COMMIT;

我们将使用RANK函数根据A, B, C对每行进行排序A, B, C然后对E排序,并仅返回顶部的行(排名为1)。

SELECT a.a, a.b, a.c, a.d, a.e FROM (SELECT a, b, c, d, e, RANK() OVER ( PARTITION BY a, b, c ORDER BY e DESC) AS rank FROM public.test) a WHERE a.rank = 1;

这返回:

A | B | C | D | E ---+---+-----+-----+------------ 1 | 2 | CCC | 62 | 1404305705 1 | 2 | AAA | 123 | 1404305740 1 | 2 | BBB | 91 | 1404305778

Let's set up the sample data:

CREATE TABLE public.test ( A int, B int, C varchar, D int, E int ); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 4, 1404305559); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 23, 1404305633); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'CCC', 62, 1404305705); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 123, 1404305740); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 91, 1404305778); COMMIT;

We'll use the RANK function to rank each row based on A, B, C and sort on E and return only the rows that are at the top (have a rank of 1).

SELECT a.a, a.b, a.c, a.d, a.e FROM (SELECT a, b, c, d, e, RANK() OVER ( PARTITION BY a, b, c ORDER BY e DESC) AS rank FROM public.test) a WHERE a.rank = 1;

This returns:

A | B | C | D | E ---+---+-----+-----+------------ 1 | 2 | CCC | 62 | 1404305705 1 | 2 | AAA | 123 | 1404305740 1 | 2 | BBB | 91 | 1404305778

更多推荐

本文发布于:2023-08-04 14:17:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1416112.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:最多   每组   Vertica   query   group

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!