Google Cloud DataFlow job throws alert after a few hours

Updated: 2024-10-28 12:16:58

Problem description

I am running a Dataflow streaming job using the 2.11.0 release. After a few hours I get the following authentication error:

  File "streaming_twitter.py", line 188, in <lambda>
  File "streaming_twitter.py", line 102, in estimate
  File "streaming_twitter.py", line 84, in estimate_aiplatform
  File "streaming_twitter.py", line 42, in get_service
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 227, in build
    credentials=credentials)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 363, in build_from_document
    credentials = _auth.default_credentials()
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_auth.py", line 42, in default_credentials
    credentials, _ = google.auth.default()
  File "/usr/local/lib/python2.7/dist-packages/google/auth/_default.py", line 306, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application.

This Dataflow job makes API requests to AI Platform Prediction, and it seems that the authentication token is expiring.

Code snippet:

from googleapiclient import discovery

def get_service():
    # If it hasn't been instantiated yet: do it now
    return discovery.build('ml', 'v1', discoveryServiceUrl=DISCOVERY_SERVICE, cache_discovery=True)
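The comment in get_service suggests the client is meant to be created once and then reused. Below is a minimal sketch of that lazy caching pattern. The names get_cached_service, _service, _lock and the builder parameter are illustrative assumptions, not part of the original script, and caching on its own is not claimed to resolve the credentials error.

```python
import threading

_service = None            # hypothetical module-level cache, one per worker process
_lock = threading.Lock()   # guards creation if several threads call in at once


def get_cached_service(builder):
    """Return the cached client, creating it with `builder` on first use.

    `builder` is an assumed zero-argument callable, for example a lambda
    that wraps the discovery.build call from the snippet above.
    """
    global _service
    with _lock:
        if _service is None:
            _service = builder()
        return _service
```

With this shape, the expensive client construction happens at most once per process, and every later call returns the same object.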

I tried adding the following line to the service function:

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/tmp/key.json"

But I got:

DefaultCredentialsError: File "/tmp/key.json" was not found. [while running 'generatedPtransform-930']

I assume this is because the file is not on the Dataflow worker machines. Another option is to use the developerKey parameter in the build method, but it doesn't seem to be supported by AI Platform Prediction. I get this error:

Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project. [while running 'generatedPtransform-22624']

I'd like to understand how to fix this and what the best practice is.

Any suggestions?

  • Full logs here
  • Full code here

Recommended answer

Setting os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/tmp/key.json' only works locally with the DirectRunner. Once you deploy to a distributed runner like Dataflow, the workers won't be able to find the local file /tmp/key.json.
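One way to confirm this on a worker is to log what the process actually sees. The helper below is a small sketch using only the standard library. The function name credentials_file_status and its three return labels are made up for illustration.

```python
import os


def credentials_file_status(environ=None):
    """Classify GOOGLE_APPLICATION_CREDENTIALS as seen by the current process.

    Returns one of three illustrative labels:
      "unset"        - the variable is not set (or is empty)
      "missing_file" - it is set, but no file exists at that path
      "file_present" - it is set and the file exists
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "unset"
    if not os.path.isfile(path):
        return "missing_file"
    return "file_present"
```

Called from inside a pipeline step in the scenario described here, this would be expected to report "missing_file" on Dataflow workers, since /tmp/key.json only exists on the machine that launched the job.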

If you want each worker to use a specific service account, you can tell Beam which service account to use to identify the workers.

First, grant the roles/dataflow.worker role to the service account you want your workers to use. There is no need to download the service account key file :)
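The role can be granted in the Cloud Console or with the gcloud CLI. As a sketch, the helper below only assembles the gcloud command as an argument list. The function name, the project ID and the service account email are placeholders, not values from the original post.

```python
def grant_worker_role_command(project_id, service_account_email):
    """Build the gcloud command that binds roles/dataflow.worker to an account.

    Both arguments are placeholders supplied by the caller.
    """
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member=serviceAccount:{}".format(service_account_email),
        "--role=roles/dataflow.worker",
    ]
```

The returned list can be passed to subprocess.check_call, or joined with spaces and run by hand in a shell where gcloud is already authenticated.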

Then, if you're letting PipelineOptions parse your command-line arguments, you can simply use the service_account_email option and specify it like --service_account_email your-email@your-project.iam.gserviceaccount.com when running your pipeline.
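If the arguments are assembled in code rather than typed on the command line, the same option can be passed programmatically. This is a sketch under assumptions: the helper names are made up, and the project, region, bucket and service account values are placeholders chosen by the caller.

```python
def dataflow_args(project, region, temp_location, service_account_email):
    """Assemble command-line style flags for a streaming Dataflow job."""
    return [
        "--runner=DataflowRunner",
        "--project={}".format(project),
        "--region={}".format(region),
        "--temp_location={}".format(temp_location),
        "--streaming",
        # Identity the workers run as; no key file is shipped to them.
        "--service_account_email={}".format(service_account_email),
    ]


def make_pipeline_options(args):
    """Turn the flags into PipelineOptions (requires apache_beam)."""
    # Imported lazily so dataflow_args can be used without Beam installed.
    from apache_beam.options.pipeline_options import PipelineOptions
    return PipelineOptions(args)
```

The resulting options object would then be handed to beam.Pipeline(options=...) in the usual way.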

The service account pointed to by your GOOGLE_APPLICATION_CREDENTIALS is only used to start the job, but each worker uses the service account specified by service_account_email. If service_account_email is not passed, it defaults to the email from your GOOGLE_APPLICATION_CREDENTIALS file.

Published: 2023-11-27 02:04:59
Article link: https://www.elefans.com/category/jswz/34/1636121.html