I'm new to Airflow.
My goal is to run a DAG daily, starting one hour from now.
I think I'm misunderstanding Airflow's "end-of-interval invoke" scheduling rule.
From the docs [(Airflow Docs)][1]:
Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
I set schedule_interval as follows:
schedule_interval = "00 15 * * *"
and start_date as follows:
start_date = datetime(year=2019, month=8, day=7)
My assumption was that if it is now 14:00:00 UTC and today's date is 07-08-2019 (7 August 2019), my DAG would be executed in exactly one hour. However, my DAG is not starting at all.
Recommended answer
There is a whole page in the docs about Airflow jobs not being scheduled: airflow.apache/faq.html
The key thing to note here is:
The Airflow scheduler triggers the task soon after the start_date + schedule_interval is passed.
As I understand it, you want the DAG to run daily at 15:00 UTC, with start_date=datetime(year=2019, month=8, day=7). schedule_interval="00 15 * * *" means the task runs every day at 15:00 UTC. According to the docs, the scheduler triggers a run only after start_date + schedule_interval has passed, so Airflow won't trigger it until the next day, August 8th 2019 15:00:00 UTC. Alternatively, you can change the start day to the 6th, so that the first run is triggered on August 7th at 15:00 UTC. It may be easier to think of this in ETL terms: you can only process the data for a given period after that period has ended. August 7th 2019 15:00:00 UTC is the start of your first interval, and you need to wait until August 8th 2019 15:00:00 UTC, when that interval ends, for its run to be triggered.
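To make the arithmetic concrete, here is a minimal sketch in plain Python (no Airflow import) of the rule described above for a once-a-day cron schedule. The function name first_daily_run and its parameters are hypothetical, chosen only for illustration. This is a simplified model of the scheduler's behaviour, not Airflow's actual code.

```python
from datetime import datetime, timedelta


def first_daily_run(start_date, hour, minute=0):
    """Return (execution_date, trigger_time) for the first run of a
    once-a-day schedule at hour:minute.

    Hypothetical helper: a simplified model, not part of Airflow.
    """
    # The first schedule point at or after start_date opens the first interval.
    candidate = start_date.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < start_date:
        candidate += timedelta(days=1)
    execution_date = candidate
    # The run is triggered only once the one-day interval it covers has ended.
    trigger_time = execution_date + timedelta(days=1)
    return execution_date, trigger_time


for day in (7, 6):
    execution_date, trigger_time = first_daily_run(datetime(2019, 8, day), hour=15)
    print(f"start_date=2019-08-{day:02d}: run stamped {execution_date} "
          f"is triggered at {trigger_time}")
```

Under this model, start_date 2019-08-07 gives a first run stamped 2019-08-07 15:00:00 that is triggered at 2019-08-08 15:00:00. Moving start_date to 2019-08-06 brings the first trigger forward to 2019-08-07 15:00:00.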
Also, note that Airflow distinguishes between execution_date and start_date; you can find more here