I have the following Java code that reads a JSON file from HDFS and exposes it as a Hive view using Spark.
package org.apache.spark.examples.sql.hive;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JavaSparkHiveExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession
      .builder()
      .appName("Java Spark Hive Example")
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate();

    Dataset<Row> jsonTest = spark.read().json("/tmp/testJSON.json");
    jsonTest.createOrReplaceTempView("jsonTest");

    Dataset<Row> showAll = spark.sql("SELECT * FROM jsonTest");
    showAll.show();

    spark.stop();
  }
}

I am having a hard time turning the view into an INSERT statement instead, in order to INSERT the JSON into table test1 (for example).
Any help is much appreciated!
Accepted answer
If you want to save to a Hive table, you can do:

showAll.write().saveAsTable("tableName")

If you want to read a local file on the executors, prefix the file path with file://. Note that Spark is a distributed engine, so all executors need to be able to read the file at the same location, which is why HDFS is commonly used with Spark.
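Building on that answer, here is a minimal sketch of the full program rewritten to insert the JSON into table test1. The class name is hypothetical; the paths, metastore URI, and table name are taken from the question, and the sketch assumes that metastore is reachable and (for the commented-out variants) that test1 already exists with a schema matching the JSON:

```java
package org.apache.spark.examples.sql.hive;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class JavaSparkHiveInsertExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession
      .builder()
      .appName("Java Spark Hive Insert Example")
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate();

    // Read the JSON from HDFS; Spark infers the schema from the records.
    Dataset<Row> jsonTest = spark.read().json("/tmp/testJSON.json");

    // Option 1: let Spark create (or replace) the Hive table from the
    // Dataset's inferred schema. No pre-existing table is required.
    jsonTest.write().mode(SaveMode.Overwrite).saveAsTable("test1");

    // Option 2: append into an existing table. insertInto matches columns
    // by position, so the Dataset's column order must match test1's schema.
    // jsonTest.write().mode(SaveMode.Append).insertInto("test1");

    // Option 3: register the temp view and use plain SQL, mirroring the
    // structure of the original program.
    // jsonTest.createOrReplaceTempView("jsonTest");
    // spark.sql("INSERT INTO test1 SELECT * FROM jsonTest");

    spark.stop();
  }
}
```

The key difference between the options: saveAsTable works from the Dataset's own schema and can create the table, while insertInto and the SQL INSERT write into a table whose schema is already fixed in the metastore.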