Apache Airflow cannot find AWS credentials when using boto3 in a DAG

Problem description

We are migrating to Apache Airflow, running it on ECS Fargate.

The problem we are facing is simple. We have a simple DAG in which one of the tasks communicates with an external AWS service (say, downloading a file from S3). This is the DAG script:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta

# default arguments for each task
default_args = {
    'owner': 'thomas',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG('test_s3_download', default_args=default_args, schedule_interval=None)

TEST_BUCKET = 'bucket-dev'
TEST_KEY = 'BlueMetric/dms.json'

# simple download task
def download_file(bucket, key):
    import boto3
    s3 = boto3.resource('s3')
    print(s3.Object(bucket, key).get()['Body'].read())

download_from_s3 = PythonOperator(
    task_id='download_from_s3',
    python_callable=download_file,
    op_kwargs={'bucket': TEST_BUCKET, 'key': TEST_KEY},
    dag=dag)

sleep_task = BashOperator(
    task_id='sleep_for_1',
    bash_command='sleep 1',
    dag=dag)

download_from_s3.set_downstream(sleep_task)

As we have done at other times when using Docker, we create a config file in ~/.aws inside the container, which reads:

[default]
region = eu-west-1

and as long as the container runs inside the AWS boundary, every request resolves without any need to specify credentials.
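
As a side note, a config file that only sets a region provides no credentials by itself; boto3 still walks its provider chain (environment variables, shared credentials file, then the ECS task role or instance metadata). A minimal sketch, assuming boto3 is installed in the container, to check which identity that chain actually resolves:

import boto3
from botocore.exceptions import NoCredentialsError

# Ask STS which identity boto3's credential provider chain resolved.
# Inside a Fargate task this should be the task role; if nothing in the
# chain yields credentials, this raises the same NoCredentialsError the
# DAG task is hitting.
try:
    identity = boto3.client("sts", region_name="eu-west-1").get_caller_identity()
    print("Account:", identity["Account"])
    print("ARN:    ", identity["Arn"])
except NoCredentialsError:
    print("No credentials found in the provider chain")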

This is the Dockerfile we are using:

FROM puckel/docker-airflow:1.10.7
USER root
COPY entrypoint.sh /entrypoint.sh
COPY requirements.txt /requirements.txt
RUN apt-get update
RUN ["chmod", "+x", "/entrypoint.sh"]
RUN mkdir -p /home/airflow/.aws \
    && touch /home/airflow/.aws/config \
    && echo '[default]' > /home/airflow/.aws/config \
    && echo 'region = eu-west-1' >> /home/airflow/.aws/config
RUN ["chown", "-R", "airflow", "/home/airflow"]
USER airflow
ENTRYPOINT ["/entrypoint.sh"]
# Expose webUI and flower respectively
EXPOSE 8080
EXPOSE 5555

and everything works like a charm. The directory is created and the ownership change succeeds, but when the DAG runs it fails with:

...
...
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
[2020-08-24 11:15:02,125] {{taskinstance.py:1117}} INFO - All retries failed; marking task as FAILED

So we are thinking that the Airflow worker node is using a different user.

Does any of you know what's going on? Thanks for any advice/light you can provide.

Answer

Create a proper task_role_arn for the task definition. This role is the one assumed by the processes triggered inside the container. Another note is that the error should then not read:

Unable to locate credentials

but rather:

Access denied: you don't have permission to s3:GetObject.
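
For reference, a minimal sketch of how that wiring could look with boto3, run from outside the cluster. The role name, inline policy name, execution role ARN, and task definition values below are hypothetical placeholders, not taken from the question; only the bucket and key prefix come from the DAG above:

import json
import boto3

iam = boto3.client("iam")
ecs = boto3.client("ecs", region_name="eu-west-1")

# Hypothetical role the Airflow containers will assume (the task role).
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
role = iam.create_role(
    RoleName="airflow-task-role",  # placeholder name
    AssumeRolePolicyDocument=json.dumps(assume_role_policy),
)

# Allow the DAG to read the object it needs from S3.
iam.put_role_policy(
    RoleName="airflow-task-role",
    PolicyName="allow-s3-getobject",  # placeholder name
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::bucket-dev/BlueMetric/*",
        }],
    }),
)

# Register the Fargate task definition with that role as its task role.
ecs.register_task_definition(
    family="airflow-worker",              # placeholder family
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="512",
    memory="1024",
    taskRoleArn=role["Role"]["Arn"],      # assumed by boto3 inside the container
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
    containerDefinitions=[{
        "name": "airflow-worker",
        "image": "puckel/docker-airflow:1.10.7",
        "essential": True,
    }],
)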
