Airflow doesn't backfill the latest run

Updated: 2024-10-27 08:33:34

Problem description

For some reason, Airflow doesn't seem to trigger the latest run for a DAG with a weekly schedule interval.

Current date:

$ date
Tue Aug 9 17:09:55 UTC 2016

DAG:

from datetime import datetime
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='superdag',
    start_date=datetime(2016, 7, 18),
    schedule_interval=timedelta(days=7),
    default_args={
        'owner': 'Jon Doe',
        'depends_on_past': False
    }
)

BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag
)

Run the scheduler:

$ airflow scheduler -d superdag

You'd expect a total of four DAG Runs, as the scheduler should backfill for 7/18, 7/25, 8/1, and 8/8. However, the last run is not scheduled.

Edit 1:

I understand that, Vineet, although that doesn't seem to explain my issue.

In my example above, the DAG's start date is July 18.

  • First DAG Run: July 18
  • Second DAG Run: July 25
  • Third DAG Run: Aug 1
  • Fourth DAG Run: Aug 8 (not run)

Each DAG Run processes data from the previous week.

Today being Aug 9, I would expect the Fourth DAG Run to have executed with an execution date of Aug 8, processing data for the last week (Aug 1 until Aug 8), but it hasn't.

Recommended answer

Airflow always schedules for the previous period. So if you have a DAG that is scheduled to run daily, then on Aug 9th it will schedule a run with execution_date Aug 8th. Similarly, if the schedule interval is weekly, then on Aug 9th it will schedule for one week back, i.e. Aug 2nd, though the run itself happens on Aug 9th. For the DAG in the question, the run with execution_date Aug 8 covers the period from Aug 8 to Aug 15, so it is only triggered once that period ends on Aug 15. This is just Airflow bookkeeping. You can find this in the Airflow wiki (cwiki.apache/confluence/display/AIRFLOW/Common+Pitfalls):
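The bookkeeping described in this answer can be sketched in plain Python, without importing Airflow. This is only an illustration of the rule "a run is triggered once its period has ended"; the helper name due_execution_dates is hypothetical and not part of Airflow, and the dates are taken from the question.

```python
from datetime import datetime, timedelta


def due_execution_dates(start_date, interval, now):
    """Return execution dates whose schedule period has fully elapsed by `now`.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    dates = []
    execution_date = start_date
    # A run stamped `execution_date` covers the period
    # [execution_date, execution_date + interval) and only becomes due
    # once that period has ended.
    while execution_date + interval <= now:
        dates.append(execution_date)
        execution_date += interval
    return dates


# Values from the question: weekly DAG starting July 18, checked on Aug 9.
runs = due_execution_dates(
    start_date=datetime(2016, 7, 18),
    interval=timedelta(days=7),
    now=datetime(2016, 8, 9, 17, 9, 55),
)
for run in runs:
    print(run.date(), "is due from", (run + timedelta(days=7)).date())
```

Under these assumptions the list contains July 18, July 25 and Aug 1 only; the Aug 8 execution date is missing because its period does not end until Aug 15.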

Understanding the execution date

Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available. This date is available to you in both Jinja and a Python callable's context in many forms, as documented here. As a note, ds refers to date_string, not date start, as may be confusing to some.
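As a rough illustration of the quoted example, the relationship between the execution date, the earliest run time and ds can be written out with the standard library alone. The variable names below are chosen for this sketch and are not Airflow's own objects; the YYYY-MM-DD format is the form ds takes as a date string.

```python
from datetime import datetime, timedelta

# The day whose data is being summarized (the execution date in the quoted example).
execution_date = datetime(2016, 2, 19)

# With a daily schedule, the run starts only once that day is over.
earliest_run_time = execution_date + timedelta(days=1)

# `ds` is the execution date rendered as a YYYY-MM-DD string ("date string").
ds = execution_date.strftime("%Y-%m-%d")

print("ds:", ds)
print("runs no earlier than:", earliest_run_time.isoformat())
```

The point of the sketch is that ds names the day being summarized, not the day on which the job actually runs.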

Published: 2023-11-23 17:38:40
Article link: https://www.elefans.com/category/jswz/34/1622321.html