Vertica中每组最多N个查询(Top N per group query in Vertica)

这是一个老问题 - 在Vertica中寻找最佳解决方案。想象一下带有列的表格： -

A, B, C, D, E

列AD是整数或变量，列E是具有缺省值GETUTCDATE（）的timestamptz列。

表格的样本内容： -

1, 2, "AAA", 4, 1404305559 1, 2, "BBB", 23, 1404305633 1, 2, "CCC", 62, 1404305705 <-- the max entry for (1,2,"CCC") 1, 2, "AAA", 123, 1404305740 <-- the max entry for (1,2,"AAA") 1, 2, "BBB", 91, 1404305778 <-- the max entry for (1,2,"BBB")

因此，可能会出现复合（A，B，C）值的重复行（列D是值，列E是时间戳）。

我想要一个结果集，显示每个唯一（A，B，C）组合的最新行及其值。因此，上述结果将如下所示： -

1, 2, "CCC", 62, 1404305705 1, 2, "AAA", 123, 1404305740 1, 2, "BBB", 91, 1404305778

This is an old problem - looking for the best solution in Vertica. Imagine a table with columns:-

A, B, C, D, E

Columns A-D are ints or varchars and column E is a timestamptz column that has a default value of GETUTCDATE().

Sample content of the table:-

1, 2, "AAA", 4, 1404305559 1, 2, "BBB", 23, 1404305633 1, 2, "CCC", 62, 1404305705 <-- the max entry for (1,2,"CCC") 1, 2, "AAA", 123, 1404305740 <-- the max entry for (1,2,"AAA") 1, 2, "BBB", 91, 1404305778 <-- the max entry for (1,2,"BBB")

So potentially there are repeating rows for the composite (A,B,C) value (with column D being a value and column E the timestamp).

I'd like a resultset that showed, for each unique (A,B,C) combination, the latest row and its value. Hence the resultset for the above would look like:-

1, 2, "CCC", 62, 1404305705 1, 2, "AAA", 123, 1404305740 1, 2, "BBB", 91, 1404305778

最满意答案

我们来设置示例数据：

CREATE TABLE public.test ( A int, B int, C varchar, D int, E int ); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 4, 1404305559); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 23, 1404305633); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'CCC', 62, 1404305705); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 123, 1404305740); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 91, 1404305778); COMMIT;

我们将使用RANK函数根据A, B, C对每行进行排序A, B, C然后对E排序，并仅返回顶部的行（排名为1）。

SELECT a.a, a.b, a.c, a.d, a.e FROM (SELECT a, b, c, d, e, RANK() OVER ( PARTITION BY a, b, c ORDER BY e DESC) AS rank FROM public.test) a WHERE a.rank = 1;

这返回：

A | B | C | D | E ---+---+-----+-----+------------ 1 | 2 | CCC | 62 | 1404305705 1 | 2 | AAA | 123 | 1404305740 1 | 2 | BBB | 91 | 1404305778

Let's set up the sample data:

CREATE TABLE public.test ( A int, B int, C varchar, D int, E int ); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 4, 1404305559); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 23, 1404305633); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'CCC', 62, 1404305705); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'AAA', 123, 1404305740); INSERT INTO public.test (A, B, C, D, E) VALUES (1, 2, 'BBB', 91, 1404305778); COMMIT;

We'll use the RANK function to rank each row based on A, B, C and sort on E and return only the rows that are at the top (have a rank of 1).

SELECT a.a, a.b, a.c, a.d, a.e FROM (SELECT a, b, c, d, e, RANK() OVER ( PARTITION BY a, b, c ORDER BY e DESC) AS rank FROM public.test) a WHERE a.rank = 1;

This returns:

A | B | C | D | E ---+---+-----+-----+------------ 1 | 2 | CCC | 62 | 1404305705 1 | 2 | AAA | 123 | 1404305740 1 | 2 | BBB | 91 | 1404305778

更多推荐