I am trying to read a table from a Postgres DB with PySpark. I have set up the following code and verified that the SparkContext exists:
    import os

    os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-class-path /tmp/jars/postgresql-42.0.0.jar --jars /tmp/jars/postgresql-42.0.0.jar pyspark-shell'

    from pyspark import SparkContext, SparkConf

    conf = SparkConf()
    conf.setMaster("local[*]")
    conf.setAppName('pyspark')
    sc = SparkContext(conf=conf)

    from pyspark.sql import SQLContext

    properties = {
        "driver": "org.postgresql.Driver"
    }
    url = 'jdbc:postgresql://tom:@localhost/gqp'

    sqlContext = SQLContext(sc)
    sqlContext.read \
        .format("jdbc") \
        .option("url", url) \
        .option("driver", properties["driver"]) \
        .option("dbtable", "specimen") \
        .load()

I get the following error:
    Py4JJavaError: An error occurred while calling o812.load.
    : java.lang.NullPointerException

The name of my database is gqp, the table is specimen, and I have verified that the database is running on localhost using the Postgres.app macOS app.
Recommended answer

The URL was the problem!

Originally it was: url = 'jdbc:postgresql://tom:@localhost/gqp'
I removed the tom:@ part, and it worked. The URL must follow the pattern jdbc:postgresql://ip_address:port/db_name, whereas mine was copied directly from a Flask project.
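For anyone copying a connection string out of a Flask project, here is a minimal sketch of one way to convert it, assuming the Flask value was a SQLAlchemy-style URL such as postgresql://tom:@localhost/gqp (a guess based on the URL above). The helper names flask_url_to_jdbc and read_table are made up for illustration, and the default port 5432 is an assumption that holds only for a standard Postgres install. The user name is moved out of the URL and into the connection properties instead.

```python
from urllib.parse import urlsplit


def flask_url_to_jdbc(sqlalchemy_url, default_port=5432):
    """Hypothetical helper: split a SQLAlchemy-style URL into a JDBC URL
    (host, port and database only) plus a dict of connection properties."""
    parts = urlsplit(sqlalchemy_url)
    host = parts.hostname or "localhost"
    port = parts.port or default_port  # assumption: standard Postgres port
    database = parts.path.lstrip("/")

    jdbc_url = f"jdbc:postgresql://{host}:{port}/{database}"

    # Credentials travel as properties, not as a user:password@ prefix.
    properties = {"driver": "org.postgresql.Driver"}
    if parts.username:
        properties["user"] = parts.username
    if parts.password:
        properties["password"] = parts.password
    return jdbc_url, properties


def read_table(sql_context, table, jdbc_url, properties):
    """Hypothetical wrapper around DataFrameReader.jdbc; needs a live
    SQLContext and the Postgres JDBC jar on the classpath."""
    return sql_context.read.jdbc(url=jdbc_url, table=table, properties=properties)


jdbc_url, properties = flask_url_to_jdbc("postgresql://tom:@localhost/gqp")
print(jdbc_url)
print(properties)
```

With the sqlContext from the question, read_table(sqlContext, "specimen", jdbc_url, properties) should then load the table. I have not tested this against anything other than a local Postgres.app setup like the one described above.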
If you're reading this, I hope you didn't make the same mistake :)