Airflow scheduler misunderstanding

Updated: 2024-10-11 15:19:09
Problem description

I'm new to Airflow.

My goal is to run a DAG on a daily basis, starting 1 hour from now.

I'm truly misunderstanding the Airflow scheduler's "end-of-interval invoke" rule.

From the Airflow docs:

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
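
To make the quoted rule concrete, here is a minimal pure-Python sketch (no Airflow import) of the period a daily run covers. The helper name data_interval is hypothetical. It only models the documented rule and is not part of Airflow's API.

```python
from datetime import datetime, timedelta
from typing import Tuple


def data_interval(run_stamp: datetime, interval: timedelta) -> Tuple[datetime, datetime]:
    """Hypothetical helper modelling the documented rule.

    The run stamped `run_stamp` covers [run_stamp, run_stamp + interval)
    and is only triggered once that period has ended.
    """
    return run_stamp, run_stamp + interval


start, end = data_interval(datetime(2016, 1, 1), timedelta(days=1))
print(f"run stamped {start:%Y-%m-%d} covers {start} .. {end}")
print(f"it is triggered soon after {end}")
```

The end of the interval, 2016-01-02 00:00, is the "soon after 2016-01-01T23:59" moment the docs refer to.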

I set schedule_interval as follows:

schedule_interval="00 15 * * *"

and start_date as follows: start_date=datetime(year=2019, month=8, day=7)
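
The cron string "00 15 * * *" reads as: minute 0, hour 15, every day of the month, every month, every weekday. That is once a day at 15:00, which is UTC under Airflow's default timezone setting. The sketch below uses a hypothetical helper, parse_daily_cron. It is not an Airflow or croniter function and only understands this fixed-time daily form. Its purpose is simply to spell out what the expression means.

```python
from typing import Tuple


def parse_daily_cron(expr: str) -> Tuple[int, int]:
    """Hypothetical parser for the 'MINUTE HOUR * * *' cron form only.

    Returns (hour, minute). Raises ValueError for anything that is not a
    once-a-day schedule at a fixed time.
    """
    fields = expr.split()
    # Only accept five fields where day-of-month, month and weekday are '*'.
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError(f"not a simple daily cron expression: {expr!r}")
    minute, hour = int(fields[0]), int(fields[1])
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError(f"minute/hour out of range: {expr!r}")
    return hour, minute


hour, minute = parse_daily_cron("00 15 * * *")
print(f"runs once a day at {hour:02d}:{minute:02d} UTC")
```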

My assumption was that if it is now 14:00:00 UTC and today's date is 07-08-2019 (7 August 2019), my DAG would be executed in exactly one hour. However, my DAG is not starting at all.

Recommended answer

There is a whole FAQ page about Airflow jobs not being scheduled: airflow.apache/faq.html

The key thing to note here is:

The Airflow scheduler triggers the task soon after start_date + schedule_interval has passed.
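
When start_date already sits on a schedule point, the FAQ rule is plain date arithmetic. This sketch makes two assumptions for illustration: timedelta(days=1) stands in for the daily cron, and start_date is 2019-08-07 15:00 UTC. The name first_trigger_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def first_trigger_time(start_date: datetime, schedule_interval: timedelta) -> datetime:
    """Hypothetical helper: earliest moment the first run can be triggered.

    Models the FAQ rule "soon after start_date + schedule_interval has
    passed", assuming start_date is already aligned to the schedule.
    """
    return start_date + schedule_interval


start = datetime(2019, 8, 7, 15, 0, tzinfo=timezone.utc)
print(first_trigger_time(start, timedelta(days=1)))
```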

To my understanding, you want to trigger a task with start_date=datetime(year=2019, month=8, day=7) at 15:00 UTC daily. schedule_interval="00 15 * * *" means the task runs every day at 15:00 UTC. According to the docs, the scheduler triggers your task after start_date + schedule_interval, so Airflow won't trigger it until the next day, August 8th 2019 15:00:00 UTC. Alternatively, you can change the start day to the 6th. It might be easier to understand this from an ETL perspective: you can only process the data for a given period after that period has passed. So August 7th 2019 15:00:00 UTC is your starting point, and you need to wait until August 8th 2019 15:00:00 UTC to run the task for that period.
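
The arithmetic in this answer can be checked with a small sketch. It assumes a daily 15:00 UTC schedule. It first moves start_date forward to the first 15:00 schedule point, which is how the answer arrives at August 7th 2019 15:00:00 UTC as the starting point. It then adds one day to get the trigger time. The helper names are hypothetical, and this is a simplified model rather than Airflow's scheduler code.

```python
from datetime import datetime, time, timedelta, timezone

RUN_AT = time(15, 0)          # "00 15 * * *" -> 15:00 every day (assumed UTC)
INTERVAL = timedelta(days=1)  # one daily period


def first_schedule_point(start_date: datetime) -> datetime:
    """First 15:00 UTC at or after start_date (hypothetical helper)."""
    candidate = datetime.combine(start_date.date(), RUN_AT, tzinfo=timezone.utc)
    if candidate < start_date:
        candidate += INTERVAL
    return candidate


def first_trigger(start_date: datetime) -> datetime:
    """The first period must end before its run fires."""
    return first_schedule_point(start_date) + INTERVAL


for day in (7, 6):
    start = datetime(2019, 8, day, tzinfo=timezone.utc)
    print(f"start_date day={day}: period starts {first_schedule_point(start)}, "
          f"first run fires ~{first_trigger(start)}")
```

With day=6, the first run fires at 2019-08-07 15:00 UTC. That is one hour after the asker's 14:00 UTC, which is what the question wanted.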

Also note that Airflow distinguishes between execution_date and start_date; you can find more here.
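
One way to keep the two dates apart is this. A run's execution_date labels the start of the period it covers. The run itself only starts after that period ends, and that actual start time roughly corresponds to the run's start_date. The sketch below lists the first few runs for the question's schedule. RunSketch and runs are hypothetical names. "started" is the earliest possible time and ignores scheduler latency.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterator


@dataclass
class RunSketch:
    execution_date: datetime  # logical date: start of the covered period
    started: datetime         # earliest wall-clock start: end of that period


def runs(first_period_start: datetime, interval: timedelta, count: int) -> Iterator[RunSketch]:
    """Yield `count` consecutive runs under the end-of-interval rule."""
    for i in range(count):
        execution_date = first_period_start + i * interval
        yield RunSketch(execution_date, execution_date + interval)


for run in runs(datetime(2019, 8, 7, 15, tzinfo=timezone.utc), timedelta(days=1), 3):
    print(f"execution_date={run.execution_date:%Y-%m-%d %H:%M}  starts ~{run.started:%Y-%m-%d %H:%M}")
```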

Published: 2023-11-23 19:19:30