Error while staging packages when launching a Dataflow job from a fat jar

Problem Description

I created a Maven project to execute a pipeline. If I run the main class, the pipeline works perfectly. If I build a fat jar and execute it instead, I get two different errors: one when I run it under Windows and another when I run it under Linux.

Under Windows:

Exception in thread "main" java.lang.RuntimeException: Error while staging packages
    at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:364)
    at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:261)
    at org.apache.beam.runners.dataflow.util.GcsStager.stageFiles(GcsStager.java:66)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:517)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:170)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:289)
    at ....
Caused by: java.nio.file.InvalidPathException: Illegal char <:> at index 2: gs://MY_BUCKET/staging
    at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPath.parse(Unknown Source)
    at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
    at java.nio.file.Paths.get(Unknown Source)
    at org.apache.beam.sdk.io.LocalFileSystem.matchNewResource(LocalFileSystem.java:196)
    at org.apache.beam.sdk.io.LocalFileSystem.matchNewResource(LocalFileSystem.java:78)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:563)
    at org.apache.beam.runners.dataflow.util.PackageUtil$PackageAttributes.forFileToStage(PackageUtil.java:452)
    at org.apache.beam.runners.dataflow.util.PackageUtil$1.call(PackageUtil.java:147)
    at org.apache.beam.runners.dataflow.util.PackageUtil$1.call(PackageUtil.java:138)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Under Linux:

Exception in thread "main" java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:233)
    at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:162)
    at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:52)
    at org.apache.beam.sdk.Pipeline.create(Pipeline.java:142)
    at ....
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:222)
    ... 8 more
Caused by: java.lang.IllegalArgumentException: Expected a valid 'gs://' path but was given '/home/USER/gs:/MY_BUCKET/temp/staging/'
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.getGcsPath(GcsPathValidator.java:101)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.verifyPath(GcsPathValidator.java:75)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.validateOutputFilePrefixSupported(GcsPathValidator.java:60)
    at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:237)
    ... 13 more
Caused by: java.lang.IllegalArgumentException: Invalid GCS URI: /home/USER/gs:/MY_BUCKET/temp/staging/
    at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
    at org.apache.beam.sdk.util.gcsfs.GcsPath.fromUri(GcsPath.java:116)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.getGcsPath(GcsPathValidator.java:99)
    ... 16 more

This is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>xxxxxxxxxxx</groupId>
  <artifactId>xxxxxxxxx</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.dataflow/google-cloud-dataflow-java-sdk-all -->
    <dependency>
      <groupId>com.google.cloud.dataflow</groupId>
      <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
      <version>2.2.0</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>2.9.3</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.9.3</version>
    </dependency>
    <dependency>
      <groupId>com.google.appengine</groupId>
      <artifactId>appengine-api-1.0-sdk</artifactId>
      <version>1.9.60</version>
    </dependency>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-datastore</artifactId>
      <version>1.15.0</version>
    </dependency>
    <dependency>
      <groupId>javax.servlet</groupId>
      <artifactId>javax.servlet-api</artifactId>
      <version>4.0.0</version>
    </dependency>
  </dependencies>
  <build>
    <finalName>myFatJar</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
              <mainClass>com.myclass.MyClass</mainClass>
            </transformer>
          </transformers>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
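A side note on the two stack traces above: in both cases the gs:// locations end up being handled as local filesystem paths (LocalFileSystem on Windows, a path under /home/USER on Linux), which is a symptom often seen when a shaded jar loses Beam's META-INF/services registrations for its FileSystems. This is not the fix that the answer below points to, only a hedged sketch of how the existing maven-shade-plugin configuration could additionally merge service files with the standard ServicesResourceTransformer:

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <transformers>
            <!-- keep the existing main-class manifest entry -->
            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
              <mainClass>com.myclass.MyClass</mainClass>
            </transformer>
            <!-- merge META-INF/services entries so Beam's FileSystem registrars
                 (including the gs:// handler) survive shading -->
            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          </transformers>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>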

These are my pipeline options:

...
DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
    .withValidation()
    .create()
    .as(DataflowPipelineOptions.class);
//options.setGcpTempLocation("gs://MY_BUCKET/temp");
options.setTempLocation("gs://MY_BUCKET/temp");
options.setStagingLocation("gs://MY_BUCKET/staging");
options.setProject("xxxxxxxx");
options.setJobName("asd");
options.setRunner(DataflowRunner.class);
Pipeline.create(options);
...

I tried replacing tempLocation with gcpTempLocation, but when I do, I get this error:

java.lang.IllegalArgumentException: BigQueryIO.Write needs a GCS temp location to store temp files.
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.validate(BatchLoads.java:191)
    at org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:621)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:651)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:655)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:655)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
    at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
    at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:446)
    at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:563)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:302)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:289)
    at ...
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
    at java.lang.Thread.run(Thread.java:748)
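For context on that error: in the Beam 2.x SDKs, BigQueryIO's batch loads take their GCS temp directory from tempLocation (or an explicitly supplied custom GCS temp location), not from gcpTempLocation, so dropping tempLocation trips this check. Below is a minimal sketch that keeps both options set, reusing the MY_BUCKET placeholder from the snippet above:

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class OptionsSketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .create()
        .as(DataflowPipelineOptions.class);
    // gcpTempLocation is used by the Dataflow runner itself
    options.setGcpTempLocation("gs://MY_BUCKET/temp");
    // tempLocation also stays a gs:// path so BigQueryIO.Write can stage its load files
    options.setTempLocation("gs://MY_BUCKET/temp");
    options.setStagingLocation("gs://MY_BUCKET/staging");
    options.setRunner(DataflowRunner.class);
    Pipeline pipeline = Pipeline.create(options);
    // ... build and run the pipeline as before
  }
}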

What should I do?

Recommended Answer

This comment resolved my problem:

Did you try explicitly adding the Apache Beam artifact for DataflowRunner to the pom.xml? – Andrew
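For anyone applying that suggestion, here is a minimal sketch of what the extra dependency could look like in the pom.xml above. The 2.2.0 version is an assumption, chosen to line up with the google-cloud-dataflow-java-sdk-all 2.2.0 dependency already declared:

    <!-- Sketch: explicit Apache Beam Dataflow runner artifact.
         Version 2.2.0 is assumed to match the Dataflow SDK 2.2.0 dependency above. -->
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
      <version>2.2.0</version>
    </dependency>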
