PipelineDB, get counts for top K items

Updated: 2024-10-25 16:29:29
How to calculate frequencies of top K values in the stream?

Let's say we have a stream

CREATE STREAM stream (value integer);

And we inserted ten rows

INSERT INTO stream (value) VALUES (1);
INSERT INTO stream (value) VALUES (1);
INSERT INTO stream (value) VALUES (1);
INSERT INTO stream (value) VALUES (2);
INSERT INTO stream (value) VALUES (2);
INSERT INTO stream (value) VALUES (3);
INSERT INTO stream (value) VALUES (4);
INSERT INTO stream (value) VALUES (5);
INSERT INTO stream (value) VALUES (6);
INSERT INTO stream (value) VALUES (7);

How can I get back the top 2 items and their frequencies?

value | frequency
------+----------
    1 | 0.3
    2 | 0.2

I suppose it should somehow use both Top K and the Count-min Sketch together?

Best answer

You can use fss_agg for that:

CREATE CONTINUOUS VIEW v AS
SELECT fss_agg(x, 10) AS top_10_x FROM some_stream;

This keeps track of the 10 most frequently occurring values of x. The weight of each value can also be specified explicitly:

CREATE CONTINUOUS VIEW v AS
SELECT fss_agg_weighted(x, 10, y) AS top_10_x FROM some_stream;

The first version implicitly uses a weight of 1.
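To make the role of the weight concrete, here is a minimal Python sketch of a plain Space-Saving top-K summary, a simpler relative of the Filtered Space-Saving structure that the fss_ prefix refers to. It is not PipelineDB's implementation; the class name SpaceSavingTopK and its methods are hypothetical and exist only for this illustration. An unweighted add is a weighted add with weight 1.

```python
from typing import Dict, Hashable, List, Optional, Tuple


class SpaceSavingTopK:
    """Plain Space-Saving summary (illustrative only, not PipelineDB code).

    Keeps at most k counters. Counts are exact while the stream has no
    more than k distinct values and become overestimates after that.
    """

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.counts: Dict[Hashable, float] = {}

    def add(self, value: Hashable, weight: float = 1) -> None:
        """Record one occurrence of value; weight defaults to 1."""
        if value in self.counts:
            self.counts[value] += weight
        elif len(self.counts) < self.k:
            self.counts[value] = weight
        else:
            # Evict the smallest counter; the newcomer inherits its count.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.counts[value] = floor + weight

    def topk(self, n: Optional[int] = None) -> List[Tuple[Hashable, float]]:
        """Return (value, count) pairs, highest count first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


# The ten rows from the question, each with the implicit weight of 1.
summary = SpaceSavingTopK(10)
for v in [1, 1, 1, 2, 2, 3, 4, 5, 6, 7]:
    summary.add(v)
top2 = summary.topk(2)

# Explicit weights: "a" is counted as 5 + 1, "b" as 2.
weighted = SpaceSavingTopK(10)
weighted.add("a", 5)
weighted.add("b", 2)
weighted.add("a", 1)
```

Because the question's stream has only seven distinct values and the summary holds ten counters, nothing is evicted and the counts here are exact.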

There are various functions you can use to read the top-K values and their associated frequencies. For example, the following returns tuples of the form (value, frequency):

SELECT fss_topk(top_10_x) FROM v;
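The question asks for fractions such as 0.3 and 0.2. If the frequency in each tuple is an absolute count (an assumption here, not something the answer states), it has to be divided by the total number of rows, which would need to be tracked separately. The Python sketch below shows only that arithmetic. The input tuples and the total are entered by hand from the question's ten rows and are not actual PipelineDB output; relative_frequencies is a hypothetical helper name.

```python
from typing import Hashable, List, Tuple


def relative_frequencies(
    topk: List[Tuple[Hashable, float]], total: float
) -> List[Tuple[Hashable, float]]:
    """Turn (value, count) pairs into (value, count / total) pairs."""
    if total <= 0:
        raise ValueError("total must be positive")
    return [(value, count / total) for value, count in topk]


# Hand-entered from the question: value 1 appears 3 times, value 2 twice,
# out of 10 rows in total.
topk_counts = [(1, 3), (2, 2)]
total_rows = 10
result = relative_frequencies(topk_counts, total_rows)
```

With these inputs the result matches the table in the question: value 1 maps to 0.3 and value 2 to 0.2.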

Published: 2023-08-02 04:31:00
Link: https://www.elefans.com/category/jswz/34/1368537.html