I have a simple Dataflow test job that ran successfully with apache-beam 2.1.0. The code looks something like this:
    public static void main(String[] args) throws Exception {
        DataflowPipelineOptions dataflowOptions = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        dataflowOptions.setProject("MY_PROJECT_ID");
        dataflowOptions.setStagingLocation("gs://MY_STAGING_LOC");
        dataflowOptions.setTempLocation("gs://MY_TEMP_LOC");
        dataflowOptions.setFilesToStage(Collections.singletonList("MY_LOCAL_JAR_FILE.jar"));
        dataflowOptions.setRunner(DataflowRunner.class);
        dataflowOptions.setNetwork("SOME_NETWORK");
        dataflowOptions.setSubnetwork("regions/SOME_REGION/subnetworks/SOME_SUBNETWORK");
        dataflowOptions.setZone("SOME_ZONE");

        Pipeline p = Pipeline.create(dataflowOptions);
        List<String> LINES = Arrays.asList("foobar");
        p.apply(Create.of(LINES)).setCoder(StringUtf8Coder.of());
        p.run().waitUntilFinish();
    }
However, after migrating to apache-beam 2.4.0, I immediately get the following error when submitting a Dataflow job via the CLI:
    Exception in thread "main" java.lang.RuntimeException: Error while staging packages
        at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:396)
        at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:273)
        at org.apache.beam.runners.dataflow.util.GcsStager.stageFiles(GcsStager.java:76)
        at org.apache.beam.runners.dataflow.util.GcsStager.stageDefaultFiles(GcsStager.java:64)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:661)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:174)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
        at company.app.App.main(App.java:48)
    Caused by: java.io.IOException: Error executing batch GCS request
        at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:607)
        at org.apache.beam.sdk.util.GcsUtil.getObjects(GcsUtil.java:339)
        at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.matchNonGlobs(GcsFileSystem.java:216)
        at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.match(GcsFileSystem.java:85)
        at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:123)
        at org.apache.beam.sdk.io.FileSystems.matchSingleFileSpec(FileSystems.java:188)
        at org.apache.beam.runners.dataflow.util.PackageUtil.alreadyStaged(PackageUtil.java:160)
        at org.apache.beam.runners.dataflow.util.PackageUtil.stagePackageSynchronously(PackageUtil.java:184)
        at org.apache.beam.runners.dataflow.util.PackageUtil.lambda$stagePackage$1(PackageUtil.java:174)
        at org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:101)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.util.concurrent.ExecutionException: com.google.api.client.http.HttpResponseException: 404 Not Found
        ...
I haven't changed any configuration settings.
Debugging the code further, the request appears to go to www.googleapis/null ...
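For what it's worth, a literal "null" at the end of a URL is what Java produces when a null reference is concatenated into a string. The sketch below only illustrates that language behavior. The class, method, and host names are hypothetical, and this is not the client library's actual code.

```java
public class NullInUrl {

    /** Joins a root URL and a path by plain concatenation (hypothetical helper). */
    public static String buildUrl(String rootUrl, String path) {
        // If path is null, Java appends the four characters "null" instead of failing.
        return rootUrl + path;
    }

    public static void main(String[] args) {
        // Placeholder host, chosen for illustration only.
        String hypotheticalRoot = "https://example.invalid/";

        System.out.println(buildUrl(hypotheticalRoot, "batch")); // https://example.invalid/batch
        System.out.println(buildUrl(hypotheticalRoot, null));    // https://example.invalid/null
    }
}
```

If something similar happens inside a dependency, the server sees a path that does not exist, which would be consistent with the 404 Not Found above.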
Recommended answer
This looks like a bug that was fixed in the dev branch on Feb 13. Hopefully the fix will be released soon:
Original issue: github/google/google-api-java-client/issues/1073
Bug fix: github/google/google-api-java-client/pull/1087
Corrected fix: github/google/google-api-java-client/pull/1096
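Since the linked issue is in google-api-java-client rather than in Beam itself, it may help to check which jar that client is actually loaded from at runtime. Below is a minimal diagnostic sketch using only standard JDK calls. The class name `ClasspathProbe` is made up for this example, and `com.google.api.client.googleapis.batch.BatchRequest` is assumed to be the relevant class to probe; `org.apache.beam.sdk.util.GcsUtil` is taken from the stack trace.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /** Returns the location a class was loaded from, or a placeholder if the JVM does not report one. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** Looks a class up by name without initializing it and describes where it came from. */
    public static String describe(String className) {
        try {
            Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return className + " -> " + locationOf(type);
        } catch (ClassNotFoundException e) {
            return className + " -> not on classpath";
        }
    }

    public static void main(String[] args) {
        // Assumption: this class is a reasonable marker for the google-api-java-client jar.
        System.out.println(describe("com.google.api.client.googleapis.batch.BatchRequest"));
        // Taken from the stack trace in the question.
        System.out.println(describe("org.apache.beam.sdk.util.GcsUtil"));
    }
}
```

Run with the same classpath as the job, the printed jar paths usually include the artifact version, which shows whether another dependency is pulling in a different client version than the one Beam expects.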