I'm running Apache Airflow 1.8.1. I would like to run more than 32 concurrent tasks on my instance, but cannot get any of the configurations to work.
I am using the CeleryExecutor, the Airflow config in the UI shows 64 for both parallelism and dag_concurrency, and I've restarted the Airflow scheduler, web server, and workers numerous times (I'm actually testing this locally in a Vagrant machine, but have also tested it on an EC2 instance).
airflow.cfg
```
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 64

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 64
```
Example DAG. I've tried both without and with the concurrency argument directly in the DAG.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'concurrency_dev',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2018, 1, 1),
    },
    schedule_interval=None,
    catchup=False
)

for i in range(0, 40):
    BashOperator(
        task_id='concurrency_dev_{i}'.format(i=i),
        bash_command='sleep 60',
        dag=dag
    )
```
Regardless, only 32 tasks are ever executed simultaneously.
Answer
If you have 2 workers and celeryd_concurrency = 16, then you're limited to 32 concurrent tasks. If non_pooled_task_slot_count = 32, you'd also be limited. And of course, parallelism and dag_concurrency need to be set above 32 not only on the webserver and scheduler, but on the workers too.
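The ceiling is whichever of these limits is tightest. A minimal sketch of that arithmetic, using the values from the question plus the assumed Airflow 1.8 defaults for the two settings the question didn't change (celeryd_concurrency = 16 per worker, non_pooled_task_slot_count = 32):

```python
# Limits raised in the question's airflow.cfg:
parallelism = 64                  # [core] parallelism
dag_concurrency = 64              # [core] dag_concurrency

# Limits left at their assumed 1.8 defaults:
non_pooled_task_slot_count = 32   # slots for tasks not assigned to a pool
workers = 2                       # Celery worker processes in this setup
celeryd_concurrency = 16          # task slots per Celery worker

# The scheduler can never run more tasks at once than the tightest limit.
effective_limit = min(
    parallelism,
    dag_concurrency,
    non_pooled_task_slot_count,
    workers * celeryd_concurrency,
)
print(effective_limit)  # 32
```

Here both the pool-slot default and the combined worker capacity cap out at 32, which matches the observed ceiling; raising only parallelism and dag_concurrency cannot help until those two are raised as well.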