BigQuery如何在时间范围内展平表的集合后进行分组(BigQuery how to group by after flattening a collection of tables over ti

编程入门 行业动态 更新时间:2024-10-23 13:26:37
BigQuery如何在时间范围内展平表的集合后进行分组(BigQuery how to group by after flattening a collection of tables over timerange)

我正在尝试执行以下操作:

使用FROM TABLE_DATE_RANGE在一个时间范围内组合表 FLATTEN那组数据 GROUP BY ColumnX SELECT ColumnX,SUM(ColumnY),SUM(ColumnZ)仅在唯一的ColumnX值上。

这是我的查询的要点:

SELECT r.ColumnX ,SUM(r.ColumnY) ,SUM(r.ColumnZ) FROM ( SELECT * FROM FLATTEN( ( SELECT ColumnX ,ColumnY ,ColumnZ FROM TABLE_DATE_RANGE(projectx.events_, TIMESTAMP('2015-09-01'), TIMESTAMP('2015-09-08'))), my_funky_object ) WHERE ColumnY > 10 ) r GROUP BY r.ColumnX

问题是,我得到的行数大于ColumnX的唯一值计数。 所以我退后一步,只是输出GROUP BY - COUNT的ColumnX进行调试,我得到以下输出!

我得到了看起来像中间结果的东西。

发生了什么,如何确保我的外部选择仅聚合ColumnX的唯一值?

I'm trying to do the following:

combine tables over a timerange using FROM TABLE_DATE_RANGE FLATTEN that set of data GROUP BY ColumnX SELECT ColumnX, SUM(ColumnY), SUM(ColumnZ) over only unique ColumnX values.

here's the gist of my query:

SELECT r.ColumnX ,SUM(r.ColumnY) ,SUM(r.ColumnZ) FROM ( SELECT * FROM FLATTEN( ( SELECT ColumnX ,ColumnY ,ColumnZ FROM TABLE_DATE_RANGE(projectx.events_, TIMESTAMP('2015-09-01'), TIMESTAMP('2015-09-08'))), my_funky_object ) WHERE ColumnY > 10 ) r GROUP BY r.ColumnX

The problem is, I get a number of rows WAY GREATER than the count of unique values of ColumnX should. So I took a step back and simply outputted the GROUP BY - COUNT of ColumnX in order to debug, and I get the following output!

and I get what looks like an intermediate result.

What is happening and how do I ensure that my outer select only aggregates over unique values of ColumnX?

最满意答案

您获得了ColumnX的每个不同值的计数,但您只显示计数,而不是值。

如果您的目标是准确计算不同值的数量,请尝试以下方法:

SELECT COUNT(*) ct FROM ( SELECT 1 FROM ... rest of your query ... GROUP BY r.ColumnX )

对于ColumnX的每个不同值,该内部查询将为您提供恰好一行(每个值为1)。 外部select语句将计算此类行的数量。

另一种方法是使用EXACT_COUNT_DISTINCT来获取精确的行数。 这比使用GROUP BY更简单但可扩展性更低。

You're getting the count of each distinct value of ColumnX, but you're only showing the count, not the value.

If your goal is to get an accurate count for the number of distinct values, try something like this:

SELECT COUNT(*) ct FROM ( SELECT 1 FROM ... rest of your query ... GROUP BY r.ColumnX )

That inner query will give you exactly one row (each with the value 1) for each distinct value of ColumnX. The outer select statement will count the number of such rows.

Another alternative is to use EXACT_COUNT_DISTINCT to get the exact count of rows. That's simpler but less scalable than using GROUP BY.

更多推荐

本文发布于:2023-07-31 20:42:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1347365.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:范围内   时间   如何在   展平表   BigQuery

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!