Question
Spark 2.0 with Hive
Let's say I am trying to write a Spark DataFrame, irisDf, to ORC and save it to the Hive metastore.
In Spark I would do that like this:
irisDf.write.format("orc")
  .mode("overwrite")
  .option("path", "s3://my_bucket/iris/")
  .saveAsTable("my_database.iris")
In sparklyr I can use the spark_write_table function:
data("iris")
iris_spark <- copy_to(sc, iris, name = "iris")
output <- spark_write_table(
  iris_spark,
  name = 'my_database.iris',
  mode = 'overwrite'
)
But this doesn't allow me to set path or format.
I can also use spark_write_orc:

spark_write_orc(
  iris_spark,
  path = "s3://my_bucket/iris/",
  mode = "overwrite"
)
But it has no saveAsTable option.

Now, I CAN use invoke statements to replicate the Spark code:
sdf <- spark_dataframe(iris_spark)
writer <- invoke(sdf, "write")
writer %>%
invoke('format', 'orc') %>%
invoke('mode', 'overwrite') %>%
invoke('option','path', "s3://my_bucket/iris/") %>%
invoke('saveAsTable',"my_database.iris")
But I am wondering if there is any way to instead pass the format and path options into spark_write_table, or the saveAsTable option into spark_write_orc?
Answer
path can be set using the options argument, which is equivalent to the options call on the native DataFrameWriter:
spark_write_table(
  iris_spark, name = 'my_database.iris', mode = 'overwrite',
  options = list(path = "s3a://my_bucket/iris/")
)
By default in Spark, this will create a table stored as Parquet at path (partition subdirectories can be specified with the partition_by argument).
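For example, partitioning the saved table by a column could look like the following. This is a sketch, not run here: it assumes a live sparklyr connection sc and the iris_spark DataFrame from the question, plus write access to the S3 path.

```r
# Sketch: write the table partitioned by Species
# (assumes sc and iris_spark exist, as in the question)
spark_write_table(
  iris_spark,
  name = 'my_database.iris',
  mode = 'overwrite',
  options = list(path = "s3a://my_bucket/iris/"),
  partition_by = "Species"
)
```

Each distinct value of Species then gets its own subdirectory under the path, e.g. Species=setosa/.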
As of today there is no such option for format, but an easy workaround is to set the spark.sessionState.conf.defaultDataSourceName property, either at runtime:
spark_session_config(
  sc, "spark.sessionState.conf.defaultDataSourceName", "orc"
)
or when you create the session.
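Setting it at session creation could look like this. A config-fragment sketch, assuming a local Spark installation; adjust master for your cluster:

```r
library(sparklyr)

# Set the default data source before the session starts,
# so subsequent saveAsTable-style writes default to ORC
config <- spark_config()
config$spark.sessionState.conf.defaultDataSourceName <- "orc"

sc <- spark_connect(master = "local", config = config)
```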