I have a piece of code that fetches tables from Hive into Spark, and it works fine; for that I place the hive-site.xml file in the resources folder of my Eclipse project.

Down the line I package the code as a jar file and refer to the path of the hive-site.xml file to execute the program.

Is there any way I can supply the values of hive-site.xml internally (in the program itself), so I can drop that file reference?
Code below:
```scala
val appConf = ConfigFactory.load()
val conf = new SparkConf()
  .setAppName("hivedb")
  .setMaster(appConf.getConfig(args(0)).getString("deploymentMaster"))
val sc = new SparkContext(conf)
val hc = new HiveContext(sc)
val source = hc.sql("SELECT * FROM sample.source").rdd.map(_.mkString(","))
val destination = hc.sql("SELECT * FROM sample.destination").rdd.map(_.mkString(","))
```

hive-site.xml file values:
```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveroot</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive/${user.name}</value>
  </property>
</configuration>
```

I'm looking for something like this:
```scala
val url = "jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true"
val user = "hiveroot"
val password = "hivepassword"
val src = "/tmp/hive/${user.name}"
val a = hc.read.format("jdbc").options(Map(
    "javax.jdo.option.ConnectionURL" -> url,
    "user" -> user,
    "password" -> password,
    "sql" -> "sample.source",
    "hive.exec.scratchdir" -> src
  )).load().collect().foreach(println)
```

using only the required values of the hive-site.xml file inside the program, without needing to refer to the file.
As suggested by Raktotpal Bordoloi:
```scala
val warehouseLocation = "/usr/hive/warehouse"
val spark = SparkSession.builder()
  .master("local")
  .appName("spark session example")
  .config("javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true")
  .config("javax.jdo.option.ConnectionUserName", "hiveroot")
  .config("javax.jdo.option.ConnectionPassword", "hivepassword")
  .config("hive.exec.scratchdir", "/tmp/hive/${user.name}")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  // .config("hive.metastore.uris", "thrift://localhost:9083")
  .enableHiveSupport()
  .getOrCreate()

import spark.implicits._
import spark.sql

sql("select * from sample.source").collect.foreach(println)
```
Thank you!
Accepted answer
In Spark 2.0 you can set "spark.sql.warehouse.dir" on the SparkSession builder before creating the SparkSession. It should propagate correctly when the Hive context is created.
```scala
val spark = SparkSession.builder()
  .config("spark.sql.warehouse.dir", "...")
  .config("hive.metastore.uris", "thrift://localhost:9083")
```

When the metastore is remote (as in the case above), configurations like "javax.jdo.option.ConnectionURL" will not be used, because they are consumed by the remote metastore server that talks to the database.
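To make the remote-metastore case concrete, here is a minimal sketch (assuming Spark 2.x, a metastore service already listening on thrift://localhost:9083, and the warehouse path and table name from the question):

```scala
import org.apache.spark.sql.SparkSession

// With a remote metastore, only the thrift URI is needed on the client side;
// the JDO/JDBC connection settings live on the metastore server itself.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hive-remote-metastore")
  .config("hive.metastore.uris", "thrift://localhost:9083")
  .config("spark.sql.warehouse.dir", "/usr/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SELECT * FROM sample.source").show()
```

This way the program carries no database credentials at all, which is usually preferable to embedding the MySQL URL, user, and password in code.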
For Spark 1.6, you need to place hive-site.xml on the classpath.