Apache beam:以编程方式创建分区表

编程入门 行业动态 更新时间:2024-10-22 23:34:26
本文介绍了Apache beam:以编程方式创建分区表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

限时送ChatGPT账号..

我正在编写一个云数据流,它从 Pubsub 读取消息并将其存储到 BigQuery 中.我想使用分区表(按日期),并且我正在使用与消息关联的 Timestamp 来确定消息应该进入哪个分区.下面是我的代码:

I am writing a cloud dataflow that reads messages from Pubsub and stores those into BigQuery. I want to use partitioned table (by date) and I am using Timestamp associated with message to determine which partition the message should go into. Below is my code:

      BigQueryIO.writeTableRows()
        .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
            private static final long serialVersionUID = 1L;

            @Override
              public TableDestination apply(ValueInSingleWindow<TableRow> value) {
                log.info("Row value : {}", value.getValue());
                Instant timestamp = value.getTimestamp();
                String partition = DateTimeFormat.forPattern("yyyyMMdd").print(timestamp);
                TableDestination td = new TableDestination(
                    "<project>:<dataset>.<table>" + "$" + partition, null);
                log.info("Table Destination : {}", td);
                return td;
              }
          })            
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)         
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) 
    .withSchema(tableSchema);

当我部署数据流时,我可以在 Stackdriver 中看到日志语句,但是,消息没有插入到 BigQuery 表中,并且出现以下错误:

When I deploy the dataflow, I can see the log statements in Stackdriver, however, the messages do not get inserted into BigQuery tables and I get the following error:

Request failed with code 400, will NOT retry: https://www.googleapis/bigquery/v2/projects/<project_id>/datasets/<dataset_id>/tables
severity:  "WARNING"  

所以,看起来是无法创建表,导致插入失败.我是否需要更改数据流定义才能使其正常工作?如果没有,是否还有其他方式以编程方式创建分区表?

So, it looks like it is not able to create a table, resulting in insert failure. Do I need to change the dataflow definition in order to make this work? If not, is there any other way to create the partitioned tables programmatically?

我使用的是 Apache Beam 2.0.0.

I am using Apache beam 2.0.0.

推荐答案

这是 aBigQueryIO 中的错误,并已在 Beam 2.2 中修复.您可以使用 Beam 的快照版本,也可以等到 2.2 版完成(目前正在发布中).

This was a bug in BigQueryIO and it has been fixed in Beam 2.2. You can use a snapshot version of Beam, or wait until release 2.2 is finalized (the release process is currently in progress).

这篇关于Apache beam:以编程方式创建分区表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

更多推荐

[db:关键词]

本文发布于:2023-04-19 23:00:47,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/971381.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:分区表   方式   Apache   beam

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!