In Airflow, I'd like a job to run at a specific time each day in a non-UTC timezone. How can I go about scheduling this?
The problem is that once daylight saving time kicks in, my job will run either an hour too early or an hour too late. In the Airflow docs, it seems like this is a known issue:
In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will then ignore day light savings time. Thus, if you have a schedule that says run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, regardless if day light savings time is in place.
Has anyone else run into this issue? Is there a workaround? Surely the best practice cannot be to alter all the scheduled times every time daylight saving time starts or ends?
Thanks.
Solution

Starting with Airflow 1.10, you can define time-zone-aware DAGs by passing a time-zone-aware datetime object as start_date. To have Airflow schedule DAG runs at the same local time every day (regardless of a daylight-saving-time switch), specify schedule_interval as a cron expression. To have Airflow schedule DAG runs at fixed intervals (regardless of a daylight-saving-time switch), specify schedule_interval as a datetime.timedelta().
For example, consider the following code that, first, uses a cron expression to schedule two consecutive DAG runs, and then uses a fixed interval to do the same.
    import pendulum
    from airflow import DAG
    from datetime import datetime, timedelta

    START_DATE = datetime(
        year=2019, month=10, day=25, hour=8, minute=0,
        tzinfo=pendulum.timezone('Europe/Kiev'),
    )

    def gen_execution_dates(start_date, schedule_interval):
        dag = DAG(
            dag_id='id', start_date=start_date, schedule_interval=schedule_interval
        )
        execution_date = dag.start_date
        for i in range(1, 3):
            execution_date = dag.following_schedule(execution_date)
            print(
                f'[Run {i}: Execution Date for "{schedule_interval}"]:',
                dag.timezone.convert(execution_date),
            )

    gen_execution_dates(START_DATE, '0 8 * * *')
    gen_execution_dates(START_DATE, timedelta(days=1))

Running the code produces the following output:
    [Run 1: Execution Date for "0 8 * * *"]: 2019-10-26 08:00:00+03:00
    [Run 2: Execution Date for "0 8 * * *"]: 2019-10-27 08:00:00+02:00
    [Run 1: Execution Date for "1 day, 0:00:00"]: 2019-10-26 08:00:00+03:00
    [Run 2: Execution Date for "1 day, 0:00:00"]: 2019-10-27 07:00:00+02:00

For the zone Europe/Kiev, daylight saving time in 2019 ends on 2019-10-27 at 04:00:00+03:00 (01:00 UTC), when clocks go back to 03:00:00+02:00. That is, between Run 1 and Run 2 in our example.
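The transition instant can be checked without Airflow. The following is a minimal sketch using only the Python standard library (zoneinfo, Python 3.9+, which assumes time-zone data is available on the system or via the tzdata package). The helper name find_offset_change is hypothetical and is not part of Airflow or pendulum; it simply steps through whole UTC hours and reports the first hour at which the local UTC offset differs from the previous one.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; relies on system tz data or the tzdata package

KIEV = ZoneInfo("Europe/Kiev")


def find_offset_change(start_utc, hours=48):
    """Hypothetical helper: scan whole UTC hours after start_utc and return
    (instant, old_offset, new_offset) for the first offset change, or None."""
    previous = start_utc.astimezone(KIEV).utcoffset()
    for h in range(1, hours + 1):
        instant = start_utc + timedelta(hours=h)
        current = instant.astimezone(KIEV).utcoffset()
        if current != previous:
            return instant, previous, current
        previous = current
    return None


change = find_offset_change(datetime(2019, 10, 26, 0, 0, tzinfo=timezone.utc))
if change is not None:
    instant, old_offset, new_offset = change
    print("Offset changes at (UTC):", instant)
    print("Local time at that instant:", instant.astimezone(KIEV))
    print("Offset before / after:", old_offset, "/", new_offset)
```

The scan starts a day before the switch, so the reported instant falls after Run 1 (2019-10-26 08:00 local) and before Run 2.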
The first two output lines show that for the DAG runs scheduled with a cron expression, the first and second runs are both scheduled for 08:00 local time (although at different UTC offsets: Eastern European Summer Time (EEST, +03:00) and Eastern European Time (EET, +02:00), respectively).
The last two output lines show that for the DAG runs scheduled with a fixed interval the first run is scheduled for 08:00 (EEST), and the second run is scheduled exactly 1 day (24 hours) later, which is at 07:00 (EET) due to the daylight-saving-time switch.
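The difference between the two behaviours comes down to wall-clock arithmetic versus elapsed-time arithmetic. The sketch below reproduces both results in plain Python; it is an illustration of the idea, not Airflow's actual scheduling code, and the function names next_same_wall_clock and next_fixed_interval are hypothetical. It assumes Python 3.9+ with time-zone data available for zoneinfo.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; relies on system tz data or the tzdata package

KIEV = ZoneInfo("Europe/Kiev")


def next_same_wall_clock(run):
    """Cron-like behaviour: same local time on the next calendar day.

    Adding a timedelta to an aware datetime keeps the tzinfo and moves the
    wall-clock fields; the UTC offset is then looked up for the new date.
    """
    return run + timedelta(days=1)


def next_fixed_interval(run, interval=timedelta(days=1)):
    """timedelta-like behaviour: exactly `interval` of elapsed time later.

    The addition is done in UTC, then converted back to local time.
    """
    return (run.astimezone(timezone.utc) + interval).astimezone(KIEV)


run1 = datetime(2019, 10, 26, 8, 0, tzinfo=KIEV)
print(next_same_wall_clock(run1))  # 2019-10-27 08:00:00+02:00
print(next_fixed_interval(run1))   # 2019-10-27 07:00:00+02:00
```

The two results match Run 2 in the cron and timedelta outputs above: the wall-clock version stays at 08:00 local time, while the fixed-interval version lands 24 elapsed hours later, at 07:00 EET.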
The following figure illustrates the example: