I am using GROUP_CONCAT / STRING_AGG (the column may be a VARCHAR) and want to ensure that BigQuery won't drop any of the concatenated data.
Solution

BigQuery will not drop data if a particular query runs out of memory; you will get an error instead. You should try to keep your row sizes below roughly 100 MB, since beyond that you'll start getting errors. You can try creating a large string with an example like this:
#standardSQL
SELECT STRING_AGG(word) AS words
FROM `bigquery-public-data.samples.shakespeare`;

There are 164,656 rows in this table, and this query creates a string with 1,168,286 characters (around a megabyte in size). You'll start to see an error, though, if you run a query that requires more than something on the order of hundreds of megabytes on a single execution node:
#standardSQL
SELECT STRING_AGG(CONCAT(word, corpus)) AS words
FROM `bigquery-public-data.samples.shakespeare`
CROSS JOIN UNNEST(GENERATE_ARRAY(1, 1000));

This results in an error:
Resources exceeded during query execution.

If you click on the "Explanation" tab in the UI, you can see that the failure happened during stage 1 while building the results of STRING_AGG. In this case, the string would have been 3,303,599,000 characters long, or approximately 3.3 GB in size.
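As a sanity check on the sizes quoted above, the arithmetic can be reproduced with a short sketch (Python here, since the figures are just numbers from the answer; the 1-byte-per-character assumption holds for ASCII text):

```python
# Sizes quoted in the answer, assuming 1 byte per character (ASCII).
# First query: STRING_AGG(word) over the full shakespeare table.
rows = 164_656
total_chars = 1_168_286
size_mb = total_chars / 1e6          # ~1.2 MB: comfortably within limits

# Second query: CROSS JOIN UNNEST(GENERATE_ARRAY(1, 1000)) multiplies the
# row count by 1,000, so the aggregate grows by roughly that factor.
failed_chars = 3_303_599_000
size_gb = failed_chars / 1e9         # ~3.3 GB: far past the ~100 MB guideline
growth = failed_chars / total_chars  # ~2,800x, since CONCAT(word, corpus)
                                     # also lengthens each value, not just 1,000x

print(f"first query:  {size_mb:.2f} MB")
print(f"failed query: {size_gb:.2f} GB ({growth:,.0f}x larger)")
```

Estimating like this before aggregating is a cheap way to tell whether a STRING_AGG result will land in the megabyte range (fine) or the gigabyte range (will fail with the error above).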