With Apache Spark version 2.1, I would like to use Kafka (0.10.0.2.5) as the source for Structured Streaming with pyspark:
kafka_app.py:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("TestKakfa").getOrCreate()
kafka = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:6667") \
    .option("subscribe", "mytopic") \
    .load()
I launched the app in the following way:
./bin/spark-submit kafka_app.py --master local[4] --jars spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar

After having downloaded the .jar from mvnrepository/artifact/org.apache.spark/spark-streaming-kafka-0-10-assembly_2.10/2.1.0
I got an error like this:
[...] java.lang.ClassNotFoundException: Failed to find data source: kafka. [...]

Similarly, I cannot run the Spark example of integration with Kafka: spark.apache/docs/2.1.0/structured-streaming-kafka-integration.html
So I wonder where I am wrong, or whether Kafka integration with Spark 2.1 using pyspark is actually supported at all, as this page mentions only Scala and Java as supported languages for version 0.10, which makes me doubt it: spark.apache/docs/latest/streaming-kafka-integration.html (But if it is not supported yet, why was a Python example published?)
Thanks in advance for your help!
Answer: You need to use the SQL Structured Streaming jar "spark-sql-kafka-0-10_2.11-2.1.0.jar" instead of spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar.
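For illustration, the launch command could then look like this (a sketch assuming the file names and versions from the question; note also that spark-submit options must come before the application script, otherwise they are passed as arguments to the script itself):

```shell
# Option 1: pass the Structured Streaming Kafka jar explicitly,
# with spark-submit options placed before kafka_app.py
./bin/spark-submit \
  --master local[4] \
  --jars spark-sql-kafka-0-10_2.11-2.1.0.jar \
  kafka_app.py

# Option 2: let Spark resolve the dependency from Maven Central
./bin/spark-submit \
  --master local[4] \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 \
  kafka_app.py
```

With --packages, Spark downloads the jar and its transitive dependencies automatically, which avoids fetching the assembly jar by hand.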