How to calculate frequencies of top K values in the stream?

Let's say we have a stream:

    CREATE STREAM stream ( value number );

And we inserted ten rows:

    INSERT INTO stream (value) VALUES (1);
    INSERT INTO stream (value) VALUES (1);
    INSERT INTO stream (value) VALUES (1);
    INSERT INTO stream (value) VALUES (2);
    INSERT INTO stream (value) VALUES (2);
    INSERT INTO stream (value) VALUES (3);
    INSERT INTO stream (value) VALUES (4);
    INSERT INTO stream (value) VALUES (5);
    INSERT INTO stream (value) VALUES (6);
    INSERT INTO stream (value) VALUES (7);

How can I get back the top 2 items and their frequencies?

    value | frequency
    -----------------
        1 | 0.3
        2 | 0.2

I suppose it should somehow use both Top K and the Count-min Sketch together?
Best answer
You can use fss_agg for that:

    CREATE CONTINUOUS VIEW v AS
    SELECT fss_agg(x, 10) AS top_10_x FROM some_stream;

This keeps track of the 10 most frequently occurring values of x. The weight given to each value can also be specified explicitly:

    CREATE CONTINUOUS VIEW v AS
    SELECT fss_agg_weighted(x, 10, y) AS top_10_x FROM some_stream;

The first version implicitly uses a weight of 1.

There are various functions you can use to read the top-K values and their associated frequencies. For example, the following returns tuples of the form (value, frequency):

    SELECT fss_topk(top_10_x) FROM v;
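To make the idea behind a top-K aggregate more concrete, here is a minimal Python sketch of the plain Space-Saving algorithm, fed with the ten rows from the question. It is only an illustration and not PipelineDB's implementation: the class name `SpaceSaving`, its methods, and the running `total` used to turn counts into relative frequencies such as 0.3 and 0.2 are all assumptions made for this example.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Illustrative Space-Saving summary that keeps at most k counters.

    Hypothetical helper for this example only; not a PipelineDB API.
    """

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.counts: Dict[Hashable, float] = {}
        # Total weight seen so far, used to derive relative frequencies.
        self.total = 0.0

    def add(self, value: Hashable, weight: float = 1.0) -> None:
        """Record one occurrence of value; weight defaults to 1."""
        self.total += weight
        if value in self.counts:
            self.counts[value] += weight
        elif len(self.counts) < self.k:
            self.counts[value] = weight
        else:
            # All k counters are in use: evict the smallest one and let the
            # newcomer inherit its count, which may overestimate the newcomer.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.counts[value] = floor + weight

    def topk(self, n: int) -> List[Tuple[Hashable, float, float]]:
        """Return up to n tuples of (value, count, relative frequency)."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return [(value, count, count / self.total) for value, count in ranked[:n]]


# The ten rows from the question, tracked with 10 counters as in the answer.
summary = SpaceSaving(10)
for v in [1, 1, 1, 2, 2, 3, 4, 5, 6, 7]:
    summary.add(v)

print(summary.topk(2))  # [(1, 3.0, 0.3), (2, 2.0, 0.2)]
```

With as many counters as distinct values the counts are exact. With fewer counters the summary is approximate, so the number of counters is usually chosen larger than the number of items you want to read back.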