Are there any best practices for deploying new DAGs to Airflow?
I saw a couple of comments on the Google forum stating that the DAGs are kept in a Git repository and synced periodically to a local location on the Airflow cluster. Regarding this approach, I have a couple of questions.
Any help here is highly appreciated. Let me know in case you need any further details.
Recommended answer:

Here is how we manage it for our team.
First, in terms of naming convention: each of our DAG file names matches the DAG ID from the content of the DAG itself (including the DAG version). This is useful because ultimately it's the DAG ID that you see in the Airflow UI, so you know exactly which file is behind each DAG.
Example for a DAG like this:
```python
from airflow import DAG
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 12, 5, 23, 59),  # note: a leading-zero day like 05 is a syntax error in Python 3
    'email': ['me@mail'],
    'email_on_failure': True
}

dag = DAG(
    'my_nice_dag-v1.0.9',  # update version whenever you change something
    default_args=default_args,
    schedule_interval="0,15,30,45 * * * *",
    dagrun_timeout=timedelta(hours=24),
    max_active_runs=1)

[...]
```

The name of the DAG file would be: my_nice_dag-v1.0.9.py
- All our DAG files are stored in a Git repository (among other things)
- Every time a merge request is completed in the master branch, the Continuous Integration pipeline starts a new build and packages the DAG files into a zip (we use Atlassian Bamboo, but there are other solutions such as Jenkins, Circle CI, Travis...)
- In Bamboo we configured a deployment script (shell) which unzips the package and places the DAG files on the Airflow server in the /dags folder.
- We usually deploy our DAGs in DEV for testing, then to UAT and finally PROD. The deployment is done with the click of a button in the Bamboo UI thanks to the shell script mentioned above.
Benefits