Do Dataflow templates support template input for BigQuery sink options?

Problem description

As I have a working static Dataflow running, I'd like to create a template from it so that I can easily reuse the Dataflow without any command-line typing.

The Creating Templates tutorial in the official documentation doesn't provide a sample for a templatable output.

My Dataflow ends with a BigQuery sink, which takes a few arguments such as the target table for storage. This exact parameter is the one I'd like to make available in my template, allowing me to choose the target storage when the flow is run.

However, I'm not able to get this working. Below I paste some code snippets that should help explain the exact issue I have.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class CustomOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Parameters exposed as ValueProviders so they can be set at template launch time
        parser.add_value_provider_argument(
            '--input', default='gs://my-source-bucket/file.json')
        parser.add_value_provider_argument(
            '--table', default='my-project-id:some-dataset.some-table')


pipeline_options = PipelineOptions()
pipe = beam.Pipeline(options=pipeline_options)
custom_options = pipeline_options.view_as(CustomOptions)

(...)

# store
processed_pipe | beam.io.Write(BigQuerySink(
    table=custom_options.table.get(),
    schema='a_column:STRING,b_column:STRING,etc_column:STRING',
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND
))
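For context, the template is staged by running the pipeline with a --template_location option. A minimal sketch of the options involved (project, bucket and template names below are placeholders, not values from the original post):

from apache_beam.options.pipeline_options import PipelineOptions

# Running the pipeline with --template_location set makes the DataflowRunner
# stage the job graph to GCS as a template instead of executing it.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project-id',
    '--staging_location=gs://my-bucket/staging',
    '--temp_location=gs://my-bucket/temp',
    '--template_location=gs://my-bucket/templates/my-template',
])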

When creating the template, I did not pass any parameters. Within a split second I get the following error message:

apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: table, type: str, default_value: 'my-project-id:some-dataset.some-table').get() not called from a runtime context
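As far as I can tell, the error means that .get() is evaluated while the pipeline graph is still being constructed (i.e. at template-creation time), before any runtime value exists. A stripped-down illustration of that behaviour, assuming standard Beam ValueProvider semantics rather than my actual pipeline:

from apache_beam.options.pipeline_options import PipelineOptions


class CustomOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--table', default='my-project-id:some-dataset.some-table')


custom_options = PipelineOptions().view_as(CustomOptions)

# Without a --table value on the command line this is a RuntimeValueProvider,
# and no runtime value is bound while the graph is being built.
table_vp = custom_options.table          # fine: just the ValueProvider object
table_str = custom_options.table.get()   # raises RuntimeValueProviderError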

When I add a --table parameter at template creation, the template is created, but the --table value is then hardcoded into the template and is not overridden by any value given for table when the template is launched later.

I get the same error when I replace table=custom_options.table.get() with table=StaticValueProvider(str, custom_options.table.get()).

Has anyone already built a templatable Dataflow with customisable BigQuerySink parameters? I'd love to get some hints on this.

Recommended answer

Python currently only supports ValueProvider options for FileBasedSource IOs. You can see this by clicking on the Python tab at the link you mentioned, cloud.google.com/dataflow/docs/templates/creating-templates, under the "Pipeline I/O and runtime parameters" section.
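For illustration, this is roughly what the supported case looks like: a file-based IO such as ReadFromText accepts the ValueProvider object itself, without calling .get() during pipeline construction (a minimal sketch with placeholder names, not code from the question):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class CustomOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Exposed as a runtime parameter when the template is launched
        parser.add_value_provider_argument(
            '--input', default='gs://my-source-bucket/file.json')


options = PipelineOptions()
custom_options = options.view_as(CustomOptions)

with beam.Pipeline(options=options) as p:
    # ReadFromText is built on FileBasedSource, so it can take the
    # ValueProvider directly; note there is no .get() call here.
    lines = p | beam.io.ReadFromText(custom_options.input)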

Unlike what happens in Java, BigQuery in Python does not use a custom source. In other words, it is not fully implemented in the SDK but also has parts in the backend (it is therefore a "native source"). Only custom sources can use templates. There are plans to add BigQuery as a custom source: issues.apache.org/jira/browse/BEAM-1440
