AirflowException: Celery command failed - The recorded hostname does not match this instance's hostname

Problem description

I'm running Airflow in a clustered environment on two AWS EC2 instances: one for the master and one for the worker. The worker node periodically throws this error when running "$ airflow worker":

[2018-08-09 16:15:43,553] {jobs.py:2574} WARNING - The recorded hostname ip-1.2.3.4 does not match this instance's hostname ip-1.2.3.4.eco.tanonprodanyname.io
Traceback (most recent call last):
  File "/usr/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 387, in run
    run_job.run()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 198, in run
    self._execute()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2527, in _execute
    self.heartbeat()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 182, in heartbeat
    self.heartbeat_callback(session=session)
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2575, in heartbeat_callback
    raise AirflowException("Hostname of job runner does not match")
airflow.exceptions.AirflowException: Hostname of job runner does not match
[2018-08-09 16:15:43,671] {celery_executor.py:54} ERROR - Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
[2018-08-09 16:15:43,681: ERROR/ForkPoolWorker-30] Task airflow.executors.celery_executor.execute_command[875a4da9-582e-4c10-92aa-5407f3b46d5f] raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
    subprocess.check_call(command, shell=True)
  File "/usr/lib64/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 55, in execute_command
    raise AirflowException('Celery command failed')
airflow.exceptions.AirflowException: Celery command failed

When this error occurs, the task is marked as failed in Airflow, which fails my DAG even though nothing actually went wrong in the task.

I'm using Redis as my queue and PostgreSQL as my metadata database. Both are external AWS services. I'm running all of this in my company's environment, which is why the full name of the server is ip-1.2.3.4.eco.tanonprodanyname.io. It looks like it wants this full name somewhere, but I have no idea where I need to fix this value so that it gets ip-1.2.3.4.eco.tanonprodanyname.io instead of just ip-1.2.3.4.

The really weird thing about this issue is that it doesn't always happen. It seems to happen randomly every once in a while when I run the DAG. It also occurs sporadically on all of my DAGs, so it's not just one DAG. I find it strange that it's sporadic, because that means other task runs are handling this value just fine.

Note: I've changed the real IP address to 1.2.3.4 for privacy reasons.

Answer:

https://github.com/apache/incubator-airflow/pull/2484

This is exactly the problem I am having, and other Airflow users on AWS EC2 instances are experiencing it as well.

Recommended answer

The hostname is recorded when the task instance runs, and is set with self.hostname = socket.getfqdn(), where socket is the Python standard-library module (import socket).
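The two names in the warning look like a short host name and a fully qualified one, and the standard library exposes both. A minimal sketch for checking what a worker reports at a given moment; the helper name describe_host is made up for illustration and is not part of Airflow:

```python
import socket


def describe_host():
    """Return the short and fully qualified names this machine reports."""
    return {
        # Name configured on the host itself.
        "hostname": socket.gethostname(),
        # Name obtained through the system resolver; falls back to the
        # host name when no fully qualified name is found.
        "fqdn": socket.getfqdn(),
    }


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```

Running this on the worker at different times may show whether the two values ever differ from what Airflow recorded.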

The comparison that triggers this error is:

fqdn = socket.getfqdn()
if fqdn != ti.hostname:
    logging.warning("The recorded hostname {ti.hostname} "
                    "does not match this instance's hostname "
                    "{fqdn}".format(**locals()))
    raise AirflowException("Hostname of job runner does not match")
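To try that comparison outside Airflow, here is a self-contained sketch of the same check. The names check_hostname and HostnameMismatch are stand-ins invented for this example (the real code raises AirflowException), and the sample values are the ones from the log in the question:

```python
import logging


class HostnameMismatch(Exception):
    """Stand-in for AirflowException so the sketch has no Airflow dependency."""


def check_hostname(recorded, current):
    """Raise if the hostname stored for the task differs from the current one."""
    if current != recorded:
        logging.warning(
            "The recorded hostname %s does not match this instance's hostname %s",
            recorded,
            current,
        )
        raise HostnameMismatch("Hostname of job runner does not match")


if __name__ == "__main__":
    # Values taken from the log in the question.
    try:
        check_hostname("ip-1.2.3.4", "ip-1.2.3.4.eco.tanonprodanyname.io")
    except HostnameMismatch as exc:
        print(exc)
```

The check is a plain string comparison, so a short name and its fully qualified form count as a mismatch even though they refer to the same machine.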

It seems like the hostname on the EC2 instance is changing on you while the worker is running. Perhaps try manually setting the hostname as described at https://forums.aws.amazon.com/thread.jspa?threadID=246906 and see if that sticks.
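One way to test the guess that the hostname changes while the worker runs is to sample socket.getfqdn() repeatedly on the worker and look for more than one distinct value. This is only a diagnostic sketch: the function name sample_fqdn, the sample count, and the interval are arbitrary choices for illustration.

```python
import socket
import time


def sample_fqdn(samples=5, interval=1.0, resolver=socket.getfqdn):
    """Call resolver repeatedly and return the distinct names in first-seen order."""
    seen = []
    for i in range(samples):
        name = resolver()
        if name not in seen:
            seen.append(name)
        # Sleep between samples, but not after the last one.
        if i < samples - 1:
            time.sleep(interval)
    return seen


if __name__ == "__main__":
    names = sample_fqdn()
    if len(names) > 1:
        print("hostname changed between samples:", names)
    else:
        print("hostname was stable:", names)
```

Because the error is sporadic, a short run that reports a stable name does not rule the guess out; a longer run with a larger interval may be needed.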
