The current PostgreSQL version (9.4) supports the json and jsonb data types, as described in http://www.postgresql.org/docs/9.4/static/datatype-json.html

For instance, JSON data stored as jsonb can be queried with SQL:

SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';

As a Spark user, is it possible to send this query to PostgreSQL via JDBC and receive the result as a DataFrame?
What I have tried so far:
val url = "jdbc:postgresql://localhost:5432/mydb?user=foo&password=bar"
val df = sqlContext.load("jdbc", Map(
  "url" -> url,
  "dbtable" -> "mydb",
  "driver" -> "org.postgresql.Driver"))
df.registerTempTable("table")
sqlContext.sql("SELECT data->'myid' FROM table")

But sqlContext.sql() was unable to understand the data->'myid' part of the SQL.
Accepted Answer
It is not possible to query json / jsonb fields dynamically from the Spark DataFrame API. Once the data is fetched into Spark it is converted to a string and is no longer a queryable structure (see SPARK-7869).

As you've already discovered, you can use the dbtable / table argument to pass a subquery directly to the source and use it to extract the fields of interest. Much the same approach applies to any non-standard type, to calling stored procedures, or to any other extension.
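A minimal sketch of that approach, using the api table and jdoc column from the question: the jsonb extraction runs inside PostgreSQL because the whole SELECT is passed as the dbtable option (it must be wrapped in parentheses and given an alias). Connection details are placeholders, and the ->> operator is used instead of -> so the extracted fields arrive as plain text columns.

```scala
import org.apache.spark.sql.SQLContext

object JsonbPushdown {
  // The subquery is executed by PostgreSQL itself, so jsonb operators
  // like ->> and @> are understood. It stands in for a table name,
  // hence the surrounding parentheses and the "AS tmp" alias.
  val query =
    """(SELECT jdoc ->> 'guid' AS guid,
      |        jdoc ->> 'name' AS name
      |   FROM api
      |  WHERE jdoc @> '{"company": "Magnafone"}') AS tmp""".stripMargin

  // Hypothetical usage; sqlContext is assumed to be in scope as in the question.
  def load(sqlContext: SQLContext) =
    sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/mydb?user=foo&password=bar")
      .option("driver", "org.postgresql.Driver")
      .option("dbtable", query) // subquery in place of a table name
      .load()
}
```

The resulting DataFrame then has ordinary guid and name string columns, and any further Spark SQL on it no longer needs jsonb operators.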