Airflow trigger_dag execution_date is the next day, why?

Updated: 2024-10-28 04:28:59
Problem description

I have been testing Airflow a lot recently and have run into a problem with execution_date when running airflow trigger_dag <my-dag>.

I learned from the Common Pitfalls page (cwiki.apache/confluence/display/AIRFLOW/Common+Pitfalls) that execution_date is not what one might first assume:

Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available.

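from datetime import datetime, timedelta
from airflow import DAG
from airflow import operators as ops  # assumed import; the snippet only shows the "ops" alias (Airflow 1.x)

# Dynamic start_date: midnight of the day the file happens to be parsed
start_date = datetime.combine(datetime.today(), datetime.min.time())
args = {
    "owner": "xigua",
    "start_date": start_date,
}
dag = DAG(dag_id="hadoopprojects", default_args=args, schedule_interval=timedelta(days=1))
# First task: wait another 5 minutes before the actual work starts
wait_5m = ops.TimeDeltaSensor(task_id="wait_5m", dag=dag, delta=timedelta(minutes=5))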

The code above is the start of my daily workflow. Its first task is a TimeDeltaSensor that waits another 5 minutes before the actual work starts, so my DAG will be triggered at 2016-09-09T00:05:00, 2016-09-10T00:05:00, and so on.

In the web UI I can see something like scheduled__2016-09-20T00:00:00, and the task runs at 2016-09-21T00:00:00, which seems reasonable according to the ETL model.

However, one day my DAG was not triggered for an unknown reason, so I triggered it manually. If I trigger it at 2016-09-20T00:10:00, the TimeDeltaSensor waits until 2016-09-21T00:15:00 before running.

This is not what I want: I want it to run at 2016-09-20T00:15:00, not the next day. I have tried passing execution_date through --conf '{"execution_date": "2016-09-20"}', but it doesn't work.

How should I handle this?

$ airflow version
[2016-09-21 17:26:33,654] {__init__.py:36} INFO - Using executor LocalExecutor
v1.7.1.3

Recommended answer

First, I recommend using a constant start_date, because a dynamic one behaves unpredictably depending on when your Airflow pipeline is evaluated by the scheduler.

There is more information about start_date in an FAQ entry I wrote that sorts all of this out: airflow.apache/faq.html#what-s-the-deal-with-start-date
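To make the start_date point concrete without needing Airflow installed, here is a stdlib-only sketch. dynamic_start is a hypothetical helper that reproduces the question's start_date expression but takes the parse time as a parameter, so you can see that the same DAG file yields a different start_date depending on when it happens to be parsed, while a constant stays put. The dates are illustrative assumptions.

```python
from datetime import datetime

# A constant start_date: identical every time the scheduler parses the DAG file.
# (The date itself is a hypothetical example.)
STATIC_START = datetime(2016, 9, 1)


def dynamic_start(parse_time: datetime) -> datetime:
    """Mimic the question's datetime.combine(datetime.today(), datetime.min.time()),
    with the parse time passed in explicitly so the drift is visible."""
    return datetime.combine(parse_time.date(), datetime.min.time())


# The scheduler re-parses DAG files repeatedly; suppose two parses happen
# a couple of minutes apart, on either side of midnight.
before_midnight = datetime(2016, 9, 20, 23, 59)
after_midnight = datetime(2016, 9, 21, 0, 1)

print(dynamic_start(before_midnight))  # 2016-09-20 00:00:00
print(dynamic_start(after_midnight))   # 2016-09-21 00:00:00
print(STATIC_START)                    # 2016-09-01 00:00:00
```

The "moving" start_date is what makes the scheduler's behaviour hard to predict; pinning it to a fixed datetime removes that dependency on parse time.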

Now, about execution_date and when a run is triggered: this is a common gotcha for people onboarding onto Airflow. Airflow sets execution_date based on the left bound of the schedule period the run covers, not on when the run fires (which is the right bound of the period). For instance, a task with schedule_interval='@hourly' fires every hour. The run that fires at 2pm has an execution_date of 1pm, because Airflow assumes that at 2pm you are processing the 1pm to 2pm time window. Similarly, if you run a daily job, the run with an execution_date of 2016-01-01 triggers soon after midnight on 2016-01-02.
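A simplified model of this left-bound labelling, using only the standard library. It is not Airflow's scheduler code, just the arithmetic described above; schedule_periods is a hypothetical helper that pairs each period's label (left bound) with the moment the run fires (right bound).

```python
from datetime import datetime, timedelta


def schedule_periods(start_date, interval, count):
    """Return (execution_date, fires_at) pairs for the first `count` periods.

    Simplified model: each run covers [left, left + interval). It is labelled
    with the left bound but only fires once the window has closed, i.e. at
    the right bound.
    """
    periods = []
    left = start_date
    for _ in range(count):
        right = left + interval
        periods.append((left, right))
        left = right
    return periods


# Hourly job: the run that fires at 2pm is labelled 1pm.
hourly = schedule_periods(datetime(2016, 1, 1, 13), timedelta(hours=1), 2)
for execution_date, fires_at in hourly:
    print(f"execution_date={execution_date:%H:%M}  fires_at={fires_at:%H:%M}")
# execution_date=13:00  fires_at=14:00
# execution_date=14:00  fires_at=15:00

# Daily job: execution_date 2016-01-01 fires at midnight on 2016-01-02.
daily = schedule_periods(datetime(2016, 1, 1), timedelta(days=1), 1)
label, fires = daily[0]
print(f"execution_date={label:%Y-%m-%d}  fires_at={fires:%Y-%m-%d %H:%M}")
# execution_date=2016-01-01  fires_at=2016-01-02 00:00
```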

This left-bound labelling makes a lot of sense when thinking in terms of ETL and differential loads, but gets confusing when thinking in terms of a simple, cron-like scheduler.
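Applied to the question's DAG, this accounts for the next-day wait. The sketch below assumes, consistently with the timings reported in the question, that the TimeDeltaSensor waits until execution_date + schedule_interval + delta, and that a manual trigger_dag run gets the trigger time as its execution_date. sensor_target is a hypothetical helper, not Airflow code.

```python
from datetime import datetime, timedelta

SCHEDULE_INTERVAL = timedelta(days=1)  # the question's schedule_interval
DELTA = timedelta(minutes=5)           # the question's TimeDeltaSensor delta


def sensor_target(execution_date: datetime) -> datetime:
    """Assumed model of the sensor: wait until the end of the period that
    starts at execution_date, plus delta."""
    return execution_date + SCHEDULE_INTERVAL + DELTA


# Scheduled run: labelled with the left bound of its period.
scheduled = datetime(2016, 9, 20)
print(sensor_target(scheduled))  # 2016-09-21 00:05:00

# Manual trigger: execution_date is (assumed to be) the trigger time,
# so the sensor still adds a full schedule_interval on top of it.
manual = datetime(2016, 9, 20, 0, 10)
print(sensor_target(manual))     # 2016-09-21 00:15:00
```

Under these assumptions, a manual run triggered at 2016-09-20T00:10:00 is treated as covering the period that ends a day later, which is why the sensor holds it until the next day.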

Published: 2023-11-24 03:09:30
Source: https://www.elefans.com/category/jswz/34/1623755.html