Monitoring and alerting on pod status or restarts with Google Container Engine (GKE) and Stackdriver

Updated: 2024-10-25 04:19:05
Question

Is there a way to monitor the pod status and restart count of pods running in a GKE cluster with Stackdriver?

While I can see CPU, memory and disk usage metrics for all pods in Stackdriver, there seems to be no way of getting metrics about crashing pods, or about pods in a replica set being restarted due to crashes.

I'm using a Kubernetes replica set to manage the pods, so they are respawned with a new name when they crash. As far as I can tell, metrics in Stackdriver are keyed by pod name (which is unique only for the lifetime of the pod), which doesn't seem very sensible.

Alerting on pod failures seems like such a natural thing that it is hard to believe it is not supported at the moment. As they stand, the monitoring and alerting capabilities I get from Stackdriver for Google Container Engine seem rather useless, because they are all bound to pods whose lifetime can be very short.

So if this doesn't work out of the box, are there known workarounds or best practices for monitoring continuously crashing pods?

Recommended answer

There is now a built-in metric, so you can easily chart it on a dashboard and/or alert on it without setting up custom metrics:

Metric: kubernetes.io/container/restart_count
Resource type: k8s_container
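As a minimal sketch of how this metric could back an alert, the Python snippet below builds an alert policy body in the JSON shape used by the Cloud Monitoring (formerly Stackdriver) API v3 `AlertPolicy` resource. It makes no API calls. The function name `restart_alert_policy`, the cluster name, the threshold and the window are hypothetical; the `resource.label.<key>` selector syntax and the choice of aggregation are assumptions worth checking against the Cloud Monitoring filter and aggregation documentation. The key design choice addresses the concern in the question: instead of keeping one series per pod name, the restart deltas are summed and grouped by `namespace_name` and `container_name`, so a replacement pod with a new name still counts toward the same alerting series.

```python
import json


def restart_alert_policy(cluster_name: str, threshold: int = 3,
                         window_seconds: int = 600) -> dict:
    """Hypothetical helper: build an AlertPolicy body (Cloud Monitoring API v3
    JSON shape, assumed) that fires when containers in `cluster_name` restart
    more than `threshold` times within `window_seconds`."""
    metric_filter = (
        'metric.type = "kubernetes.io/container/restart_count" '
        'AND resource.type = "k8s_container" '
        # Assumed label selector syntax; verify against the filter docs.
        f'AND resource.label.cluster_name = "{cluster_name}"'
    )
    return {
        "displayName": f"Container restarts in {cluster_name}",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "restart_count increase above threshold",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "aggregations": [
                        {
                            "alignmentPeriod": f"{window_seconds}s",
                            # restart_count is a cumulative counter; ALIGN_DELTA
                            # converts it to "restarts during each period".
                            "perSeriesAligner": "ALIGN_DELTA",
                            # Sum across pods so that pods recreated under a new
                            # name still land in the same group.
                            "crossSeriesReducer": "REDUCE_SUM",
                            "groupByFields": [
                                "resource.label.namespace_name",
                                "resource.label.container_name",
                            ],
                        }
                    ],
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": "0s",
                },
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, e.g. to review it before submitting it.
    print(json.dumps(restart_alert_policy("my-cluster"), indent=2))
```

The printed JSON could then be submitted through the Cloud Monitoring API's `projects.alertPolicies.create` method (for example from a file); whether to group additionally by `cluster_name` or `location` depends on how many clusters share the project.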

Published: 2023-11-27 18:35:41