Passing additional jars to Spark via spark-submit

Updated: 2024-10-22 21:18:11
Problem description

I'm using Spark with MongoDB, and consequently rely on the mongo-hadoop drivers. I got things working thanks to input on my original question here.

My Spark job runs; however, I receive a warning that I don't understand. When I run this command:

$SPARK_HOME/bin/spark-submit --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar my_application.py

It works, but gives me the following warning message:

Warning: Local jar /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar does not exist, skipping.

When I was first trying to get this working, the job wouldn't run at all if I left out those paths when submitting it. Now, however, it does run if I leave them out:

$SPARK_HOME/bin/spark-submit my_application.py

Can someone please explain what is going on here? I have looked through similar questions here referencing the same warning, and searched through the documentation.

Once the options have been set, are they stored as environment variables or something? I'm glad it works, but I'm wary because I don't fully understand why it works sometimes and not others.

Solution

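The problem is that the class path passed to --driver-class-path should be colon-separated, while the list passed to --jars should be comma-separated. With a colon in --jars, the whole string is treated as a single jar path, which does not exist, so it is skipped with the warning above. The corrected command: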

$SPARK_HOME/bin/spark-submit \
  --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar,/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar my_application.py
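One way to avoid mixing up the two separators is to keep the jar paths in a single list and derive both option values from it. The sketch below is only an illustration: split_jars and build_submit_args are hypothetical helper names, not part of Spark, and split_jars merely mimics reading a comma-separated list rather than reproducing Spark's actual parsing code. It also assumes a Unix-like system (":" as the class path separator) and that spark-submit is on the PATH.

```python
import shlex

CLASSPATH_SEP = ":"  # Java class path separator on Unix-like systems (assumption)
JARS_SEP = ","       # --jars takes a comma-separated list


def split_jars(value):
    """Illustration only: split a --jars value on commas."""
    return [path for path in value.split(JARS_SEP) if path]


def build_submit_args(jars, app, spark_submit="spark-submit"):
    """Build the spark-submit argument list from one list of jar paths."""
    return [
        spark_submit,
        "--driver-class-path", CLASSPATH_SEP.join(jars),
        "--jars", JARS_SEP.join(jars),
        app,
    ]


jars = [
    "/usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar",
    "/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar",
]

# A colon-joined value contains no comma, so it is read as ONE path,
# which is why the warning names both jars as a single "local jar".
wrong = split_jars(CLASSPATH_SEP.join(jars))
right = split_jars(JARS_SEP.join(jars))
print(len(wrong), len(right))  # prints "1 2"

# Assemble the command line; shlex.quote guards against unusual characters.
args = build_submit_args(jars, "my_application.py")
print(" ".join(shlex.quote(arg) for arg in args))
```

The argument list can also be handed directly to subprocess.run(args) instead of being printed, which avoids shell quoting altogether.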

Published: 2023-11-24 09:06:46