With Apache Spark version 2.1, I would like to use Kafka (0.10.0.2.5) as the source for Structured Streaming with pyspark:
kafka_app.py:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("TestKakfa").getOrCreate()
kafka = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:6667") \
    .option("subscribe", "mytopic") \
    .load()
I launched the app in the following way:
./bin/spark-submit kafka_app.py --master local[4] --jars spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar

After having downloaded the .jar from mvnrepository/artifact/org.apache.spark/spark-streaming-kafka-0-10-assembly_2.10/2.1.0
I got an error like this:
[...] java.lang.ClassNotFoundException: Failed to find data source: kafka. [...]

Similarly, I cannot run the Spark example of integration with Kafka: spark.apache/docs/2.1.0/structured-streaming-kafka-integration.html
So I wonder where I am wrong, or whether Kafka integration with Spark 2.1 using pyspark is actually supported at all, as this page mentions only Scala and Java as supported languages for version 0.10, which makes me doubt it: spark.apache/docs/latest/streaming-kafka-integration.html (But if it is not supported yet, why was a Python example published?)
Thanks in advance for your help!
Answer: You need to use the SQL Structured Streaming jar "spark-sql-kafka-0-10_2.11-2.1.0.jar" instead of spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar.
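For illustration, the launch command could then look like this (a sketch assuming the file names and versions from the question; note also that spark-submit options must come before the application script, otherwise they are passed as arguments to the script itself):

```shell
# Option 1: pass the Structured Streaming Kafka jar explicitly,
# with spark-submit options placed before kafka_app.py
./bin/spark-submit \
  --master local[4] \
  --jars spark-sql-kafka-0-10_2.11-2.1.0.jar \
  kafka_app.py

# Option 2: let Spark resolve the dependency from Maven Central
./bin/spark-submit \
  --master local[4] \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 \
  kafka_app.py
```

With --packages, Spark downloads the jar and its transitive dependencies automatically, which avoids fetching the assembly jar by hand.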