Airflow and Spark/Hadoop

This article looks at how to run Airflow with Spark/Hadoop: either a single cluster for everything, or one cluster for Airflow and a separate cluster for Spark/Hadoop.

Problem description

I'm trying to figure out the best way to work with Airflow and Spark/Hadoop. I already have a Spark/Hadoop cluster, and I'm thinking about creating another cluster for Airflow that will submit jobs remotely to the Spark/Hadoop cluster.

Any advice about it? Deploying Spark remotely from another cluster looks a little complicated, and it will create some duplicated configuration files.
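
For illustration, here is a minimal sketch of that second approach: an Airflow DAG that hands a job off to the remote YARN cluster through SparkSubmitOperator. It assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed, Spark client binaries and the cluster's Hadoop config available on the Airflow host, and a hypothetical spark_yarn connection pointing at the YARN master; the application path is a placeholder.

# A minimal sketch of an Airflow DAG that submits a PySpark job to a
# remote YARN cluster. The "spark_yarn" connection and the job path are
# hypothetical; the connection would be configured with master "yarn"
# and deploy-mode "client" in the Airflow UI or environment.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="remote_spark_submit",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # trigger manually while experimenting
    catchup=False,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_etl",
        conn_id="spark_yarn",                # hypothetical connection to the YARN master
        application="/opt/jobs/etl_job.py",  # placeholder PySpark script
        name="airflow_etl_job",
        executor_memory="2g",
        num_executors=4,
    )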

Recommended answer

You really only need to configure a yarn-site.xml file, I believe, in order for spark-submit --master yarn --deploy-mode client to work. (You could try cluster deploy mode, but I think having the driver managed by Airflow isn't a bad idea.)
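
Concretely, the only cluster detail the Airflow side then has to know is where the ResourceManager lives. Below is a sketch, with a placeholder hostname and default ports assumed, of generating such a minimal yarn-site.xml under HADOOP_CONF_DIR so that spark-submit can find it.

# Sketch: write a minimal yarn-site.xml into HADOOP_CONF_DIR so that
# `spark-submit --master yarn --deploy-mode client` can locate the
# cluster. The ResourceManager hostname is a placeholder; default ports
# are assumed.
import os
from pathlib import Path

conf_dir = Path(os.environ.get("HADOOP_CONF_DIR", "/etc/hadoop/conf"))
conf_dir.mkdir(parents=True, exist_ok=True)

yarn_site = """\
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager.example.com</value>
  </property>
</configuration>
"""
(conf_dir / "yarn-site.xml").write_text(yarn_site)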

Once an Application Master is deployed within YARN, Spark runs local to the Hadoop cluster.

If you really want, you could also add hdfs-site.xml and hive-site.xml to be submitted from Airflow (if that's possible), but otherwise the hdfs-site.xml file should at least be picked up from the YARN container classpath (not every NodeManager necessarily has a Hive client installed on it).
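
If you do need to ship those files, SparkSubmitOperator's files parameter (the equivalent of spark-submit --files) is one way to do it from Airflow. A sketch with placeholder paths, reusing the hypothetical spark_yarn connection from above:

# Sketch: ship hdfs-site.xml and hive-site.xml alongside the job via
# `files`, which maps to `spark-submit --files`. All paths and the
# connection id are placeholders; this is only needed when the YARN
# container classpath does not already provide these files.
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

submit_with_site_files = SparkSubmitOperator(
    task_id="submit_with_site_files",
    conn_id="spark_yarn",
    application="/opt/jobs/etl_job.py",  # placeholder PySpark script
    files="/etc/hadoop/conf/hdfs-site.xml,/etc/hive/conf/hive-site.xml",
)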
