When I schedule DAGs to run at a specific time every day, the DAG execution does not take place at all. However, when I restart the Airflow webserver and scheduler, the DAGs execute once at the scheduled time for that particular day and do not execute from the next day onwards. I am using Airflow version v1.7.1.3 with Python 2.7.6. Here is the DAG code:
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta
    import time

    n=time.strftime("%Y,%m,%d")
    v=datetime.strptime(n,"%Y,%m,%d")
    default_args = {
        'owner': 'airflow',
        'depends_on_past': True,
        'start_date': v,
        'email': ['airflow@airflow'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(minutes=10),
    }

    dag = DAG('dag_user_answer_attempts', default_args=default_args, schedule_interval='03 02 * * *')

    # t1, t2 and t3 are examples of tasks created by instantiating operators
    t1 = BashOperator(
        task_id='user_answer_attempts',
        bash_command='python /home/ubuntu/bigcrons/appengine-flask-skeleton-master/useranswerattemptsgen.py',
        dag=dag)

Am I doing something wrong?
Recommended answer
Your issue is the start_date being set for the current time. Airflow runs jobs at the end of an interval, not the beginning. This means that the first run of your job is going to be after the first interval.
Example:
You make a dag and put it live in Airflow at midnight. Today (20XX-01-01 00:00:00) is also the start_date, but it is hard-coded ("start_date":datetime(20XX,1,1)). The schedule interval is daily, like yours (3 2 * * *).
The first time this dag will be queued for execution is 20XX-01-02 02:03:00, because that is when the interval period ends. If you look at your dag being run at that time, it should have a started datetime of roughly one day after the schedule_date.
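The timing in this example can be sketched with plain datetime arithmetic. This is a simplified model of the rule described above, not Airflow's scheduler code; the function name first_trigger and the year 2016 (standing in for 20XX) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def first_trigger(start_date, hour=2, minute=3):
    """Model a daily schedule ("3 2 * * *" by default).

    Returns (execution_date, trigger_time) for the first run:
    the run is stamped with the start of its interval but is only
    queued once that interval has ended.
    """
    # First schedule tick at or after start_date.
    tick = start_date.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if tick < start_date:
        tick += timedelta(days=1)
    # The interval [tick, tick + 1 day) must be over before the run is queued.
    return tick, tick + timedelta(days=1)


# 2016 is a hypothetical stand-in for the "20XX" used in the example.
execution_date, trigger_time = first_trigger(datetime(2016, 1, 1))
print(execution_date)  # 2016-01-01 02:03:00
print(trigger_time)    # 2016-01-02 02:03:00
```

The one-day gap between the two printed values is the delay the example describes.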
You can solve this by having your start_date hard-coded to a date or by making sure that the dynamic date is further in the past than the interval between executions (In your case, 2 days would be plenty). Airflow recommends you use static start_dates in case you need to re-run jobs or backfill (or end a dag).
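To see why the three choices of start_date behave differently, here is a toy check under the same simplified interval model (again not Airflow's actual scheduler logic). The helper names is_run_due and midnight, and all concrete dates, are hypothetical.

```python
from datetime import datetime, timedelta


def is_run_due(start_date, now, hour=2, minute=3):
    """Return True if a daily run (at hour:minute) is due at `now`.

    Simplified model: the first interval starts at the first schedule
    tick at or after start_date, and a run is due once it has ended.
    """
    tick = start_date.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if tick < start_date:
        tick += timedelta(days=1)
    return tick + timedelta(days=1) <= now


def midnight(moment):
    """Truncate a datetime to 00:00:00, like the strftime/strptime round trip in the question."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


# Hypothetical moment at which the DAG file is parsed and the schedule fires.
now = datetime(2016, 1, 5, 2, 3)

dynamic_start = midnight(now)                        # today's date, as in the question
lookback_start = midnight(now) - timedelta(days=2)   # dynamic, but two days back
static_start = datetime(2016, 1, 1)                  # hard-coded date

print(is_run_due(dynamic_start, now))   # False
print(is_run_due(lookback_start, now))  # True
print(is_run_due(static_start, now))    # True
```

In this model a start_date recomputed as today's date always leaves the first interval unfinished, which is why the run never comes due. In the DAG file, the hard-coded variant would correspond to something like 'start_date': datetime(2016, 1, 1) in default_args, with the date chosen to suit your deployment.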
For more information on backfilling (the opposite side of this common stackoverflow question), check the docs or this question: Airflow not scheduling Correctly Python