My Cloud Composer managed Airflow has been stuck for hours since I canceled a task instance that was taking too long (let's call it Task A).
I've cleared all the DAG runs and task instances, but a few jobs are still running and one job is in the Shutdown state (I suppose it is the job of Task A) (snapshot of my Jobs).
Besides, the scheduler does not seem to be running, since recently deleted DAGs keep appearing in the dashboard.
Is there a way to kill the jobs or reset the scheduler? Any ideas for getting the Composer environment unstuck are welcome.
Recommended answer
You can restart the scheduler as follows:
From your Cloud Shell:
1. Determine your environment's Kubernetes cluster:
gcloud composer environments describe ENVIRONMENT_NAME --location LOCATION
2. Get credentials and connect to the Kubernetes cluster:
gcloud container clusters get-credentials ${GKE_CLUSTER} --zone ${GKE_LOCATION}
3. Run the following command to restart the scheduler:
kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
Steps 1 and 2 are detailed here. Step 3 basically replaces the "airflow-scheduler" deployment with itself, thus restarting the service.
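The three steps can be chained in a small script. What follows is only a sketch under a few assumptions: the helper names parse_gke_cluster and restart_airflow_scheduler are made up for illustration, the cluster is assumed to be zonal (so --zone applies, as in step 2), and the airflow-scheduler deployment is assumed to be reachable in the current kubectl namespace, which may not hold for every Composer version (kubectl get deployments --all-namespaces shows where it actually lives). The script reads the cluster path from the config.gkeCluster field of the environment description and splits it into the two values that step 2 expects.

```shell
#!/bin/sh
# Sketch only. Helper names are hypothetical; a zonal GKE cluster is assumed.

# Split a GKE resource path such as
#   projects/PROJECT/zones/ZONE/clusters/CLUSTER
# into GKE_CLUSTER (last segment) and GKE_LOCATION (segment before "/clusters/").
parse_gke_cluster() {
  gke_path="$1"
  GKE_CLUSTER="${gke_path##*/}"
  gke_prefix="${gke_path%/clusters/*}"
  GKE_LOCATION="${gke_prefix##*/}"
}

# Chain steps 1 to 3. Wrapped in a function so nothing runs until it is called.
restart_airflow_scheduler() {
  env_name="$1"       # Composer environment name (ENVIRONMENT_NAME above)
  env_location="$2"   # Composer region (LOCATION above)

  # Step 1: read the cluster path from the environment description.
  gke_path="$(gcloud composer environments describe "$env_name" \
    --location "$env_location" \
    --format='value(config.gkeCluster)')" || return 1
  parse_gke_cluster "$gke_path"

  # Step 2: fetch credentials so kubectl talks to that cluster.
  gcloud container clusters get-credentials "$GKE_CLUSTER" \
    --zone "$GKE_LOCATION" || return 1

  # Step 3: replace the scheduler deployment with itself to restart it.
  # Assumes the deployment is in the current namespace (add -n NAMESPACE if not).
  kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
}
```

After sourcing the script in Cloud Shell, it would be called as: restart_airflow_scheduler ENVIRONMENT_NAME LOCATION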
If restarting the scheduler doesn't help, you may need to recreate your Composer environment and, if this happens every time, troubleshoot your DAGs.