Apache Airflow scheduler does not trigger DAG at schedule time

Problem description

When I schedule DAGs to run at a specific time every day, the DAG execution does not take place at all. However, when I restart the Airflow webserver and scheduler, the DAGs execute once at the scheduled time for that particular day and do not execute from the next day onwards. I am using Airflow version v1.7.1.3 with Python 2.7.6. Here is the DAG code:

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta
    import time

    n = time.strftime("%Y,%m,%d")
    v = datetime.strptime(n, "%Y,%m,%d")
    default_args = {
        'owner': 'airflow',
        'depends_on_past': True,
        'start_date': v,
        'email': ['airflow@airflow'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(minutes=10),
    }

    dag = DAG('dag_user_answer_attempts', default_args=default_args, schedule_interval='03 02 * * *')

    # t1, t2 and t3 are examples of tasks created by instantiating operators
    t1 = BashOperator(
        task_id='user_answer_attempts',
        bash_command='python /home/ubuntu/bigcrons/appengine-flask-skeleton-master/useranswerattemptsgen.py',
        dag=dag)

Am I doing something wrong?

Recommended answer

Your issue is the start_date being set for the current time. Airflow runs jobs at the end of an interval, not the beginning. This means that the first run of your job is going to be after the first interval.

Example:

You make a dag and put it live in Airflow at midnight. Today (20XX-01-01 00:00:00) is also the start_date, but it is hard-coded ("start_date":datetime(20XX,1,1)). The schedule interval is daily, like yours (3 2 * * *).

The first time this dag will be queued for execution is 20XX-01-02 02:03:00, because that is when the interval period ends. If you look at your dag being run at that time, it should have a started datetime of roughly one day after the schedule_date.
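The timing in this example can be sketched with the standard library alone. This is a simplified model of the "run at the end of the interval" rule for a daily schedule, not Airflow's actual scheduler code. The function name `first_trigger_time` and the concrete year 2016 (standing in for 20XX) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def first_trigger_time(start_date, hour=2, minute=3):
    """Return (execution_date, trigger_time) for the first run of a daily
    schedule at hour:minute, assuming runs fire when their interval ends.

    Hypothetical helper: a simplified model, not part of Airflow.
    """
    # First schedule point at or after start_date.
    first_point = start_date.replace(hour=hour, minute=minute,
                                     second=0, microsecond=0)
    if first_point < start_date:
        first_point += timedelta(days=1)
    # The run covering [first_point, first_point + 1 day) fires at the end.
    return first_point, first_point + timedelta(days=1)


# Hard-coded start_date, as in the example (2016 is a placeholder year).
execution_date, trigger_time = first_trigger_time(datetime(2016, 1, 1))
print(execution_date)  # 2016-01-01 02:03:00
print(trigger_time)    # 2016-01-02 02:03:00

# If start_date is recomputed as "today at midnight" each time the file
# is parsed, the first trigger time under this model moves forward too.
for day in (1, 2, 3):
    moving_start = datetime(2016, 1, day)
    _, trigger = first_trigger_time(moving_start)
    print(moving_start.date(), "->", trigger)
```

Under this model, a start_date that always equals the current day keeps the first trigger time one day ahead, which is one plausible reading of why the DAG in the question rarely runs.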

You can solve this by having your start_date hard-coded to a date or by making sure that the dynamic date is further in the past than the interval between executions (In your case, 2 days would be plenty). Airflow recommends you use static start_dates in case you need to re-run jobs or backfill (or end a dag).
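Both options might look roughly like the sketch below, which only builds the `default_args` dictionary and so runs without Airflow installed. The date `datetime(2016, 1, 1)` is a placeholder, and `STATIC_START` and `dynamic_start` are hypothetical names introduced here, not part of Airflow.

```python
from datetime import datetime, timedelta

# Option 1: a static, hard-coded start_date (placeholder date; pick your own).
STATIC_START = datetime(2016, 1, 1)


def dynamic_start(days_back=2, now=None):
    """Option 2 (hypothetical helper): midnight `days_back` days before `now`,
    so the start_date lies further in the past than one daily interval."""
    now = now or datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=days_back)


default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'start_date': STATIC_START,  # or: dynamic_start()
    'email': ['airflow@airflow'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=10),
}

# default_args would then be passed to DAG(...) exactly as in the question.
print(default_args['start_date'])
```

An earlier start_date may lead the scheduler to create runs for intervals that have already elapsed, so it is worth choosing the date with that in mind.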

For more information on backfilling (the opposite side of this common stackoverflow question), check the docs or this question: Airflow not scheduling Correctly Python

Published: 2023-11-23 21:47:20
Article link: https://www.elefans.com/category/jswz/34/1622942.html