I have the following test code:
```python
from pyspark import SparkContext, SQLContext

sc = SparkContext('local')
sqlContext = SQLContext(sc)
print('Created spark context!')

if __name__ == '__main__':
    df = sqlContext.read.format("jdbc").options(
        url="jdbc:mysql://localhost/mysql",
        driver="com.mysql.jdbc.Driver",
        dbtable="users",
        user="user",
        password="****",
        properties={"driver": 'com.mysql.jdbc.Driver'}
    ).load()
    print(df)
```
When I run it, I get the following error:
```
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
```
In Scala, this is solved by adding the mysql-connector-java .jar to the project.
However, in Python I have no idea how to tell the pyspark module to link the mysql-connector file.
I have seen this solved with examples like
```
spark --package=mysql-connector-java testfile.py
```

But I don't want this, since it forces me to run my script in an awkward way. I would like an all-Python solution, or to copy a file somewhere, or to add something to the path.
Answer:

You can pass arguments to spark-submit when creating your SparkContext, before SparkConf is initialized:
```python
import os
from pyspark import SparkConf, SparkContext

# PYSPARK_SUBMIT_ARGS must be set before the SparkContext is created,
# because it is read when the JVM gateway is launched.
SUBMIT_ARGS = "--packages mysql:mysql-connector-java:5.1.39 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

conf = SparkConf()
sc = SparkContext(conf=conf)
```

Or you can add them to your `$SPARK_HOME/conf/spark-defaults.conf`.
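For reference, a minimal sketch of what that `spark-defaults.conf` entry could look like, assuming the standard `spark.jars.packages` property and the same connector version as above:

```
# $SPARK_HOME/conf/spark-defaults.conf
spark.jars.packages  mysql:mysql-connector-java:5.1.39
```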
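Once the connector is on the classpath by either route, the JDBC read from the question should find the driver class. A minimal sketch, reusing the placeholder connection details from the question and dropping its redundant `properties` argument:

```python
from pyspark import SQLContext

sqlContext = SQLContext(sc)  # sc created with PYSPARK_SUBMIT_ARGS set, as above

# com.mysql.jdbc.Driver now resolves because mysql-connector-java is on the classpath.
df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost/mysql",  # placeholder URL from the question
    driver="com.mysql.jdbc.Driver",
    dbtable="users",
    user="user",
    password="****",                     # placeholder credentials
).load()
df.show()
```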