k8s node NotReady

Contents

- NotReady
- Case @1 may retry after sleeping
- Case @2 etcdserver: request timed out
- Case @3 container runtime status check may not have completed yet.
- Case @4 invalid capacity 0 on image filesystem
- Case @5 Clock skew between nodes
- Case @6 Error updating node status, will retry
- Case @7 dockerd reports too many open files
- Case @8 failed to update lease
- Case @9 System OOM encountered, victim process: prometheus
- Case @10 PLEG is not healthy: pleg was last seen active 3m7.377520804s ago; threshold is 3m0s
- Case @11 NodeStatusUnknown

NotReady

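A Kubernetes node in the NotReady state cannot perform its intended function and is not available for scheduling new workloads. A node can become NotReady for several reasons, including:

Network problems: a node that loses network connectivity may become NotReady. Check the interfaces with ip -4 a.
Resource exhaustion: a node that runs out of resources such as CPU or memory may become NotReady. Check with free -h, top, and df -h.
Node failure: a node that crashes or suffers a hardware fault may become NotReady.
Kubernetes component failure: if a critical Kubernetes component on the node, such as kubelet or kube-proxy, fails, the node may become NotReady.
Reboot or clock drift: check whether the node has rebooted with uptime, and compare its time against the other cluster nodes with date.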
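The disk part of the resource check above can be scripted. The following is a minimal sketch, not a definitive tool: the function name full_filesystems and the default threshold of 90 percent are assumptions made for illustration. It reads the POSIX-format output of df -P and prints every filesystem at or above the threshold.

```shell
# Print the mount point and usage of every filesystem whose usage is at or
# above a threshold (percent, default 90). Reads `df -P` output on stdin.
# Assumption: mount points do not contain spaces.
full_filesystems() {
  awk -v t="${1:-90}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                 # "95%" -> "95"
    if (use + 0 >= t + 0) print $6, $5
  }'
}
```

On a node this could be run as: df -P | full_filesystems 85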

To troubleshoot a NotReady node, you can:

Check node status: kubectl get nodes
Inspect the node's conditions and events: kubectl describe node <node-name>; to save the output to a file: kubectl describe node <node-name> > node.log
Check the kubelet logs on the node: journalctl -xeu kubelet; to save them to a file: journalctl -xu kubelet > kubelet.log. Check the kubelet service with systemctl status kubelet
Check the system log on the node: /var/log/messages
Check the Kubernetes event log: kubectl get events
Check the container runtime: systemctl status docker and systemctl status containerd; append | more to page through long output
Check the docker logs on the node: journalctl -xeu docker; to save them to a file: journalctl -xu docker > docker.log
Check disk space: df -h | head
Check that the control-plane containers are running: docker ps | grep -E 'api|etcd'
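The first two steps above can be chained so that only unhealthy nodes are inspected. A small sketch, assuming the default column layout of kubectl get nodes (NAME first, STATUS second); the function name not_ready_nodes is made up for this example.

```shell
# Read `kubectl get nodes --no-headers` output on stdin and print the name
# of every node whose STATUS column does not start with "Ready".
# "Ready,SchedulingDisabled" counts as ready; "NotReady" does not.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Possible usage on a machine with cluster access:
#   kubectl get nodes --no-headers | not_ready_nodes |
#     while read -r n; do kubectl describe node "$n" > "$n.log"; done
```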

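Once you have identified the root cause, take the appropriate action. For example, if the node became NotReady because of resource exhaustion, add resources to the node or adjust the resource limits of the workloads running on it. If the cause is a network problem, troubleshoot connectivity and fix whatever is broken.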

Other places to look for logs:

Pod logs: /var/log/pods
Container logs: /var/log/containers
crictl ps, then crictl logs <container-id>
docker ps, then docker logs <container-id> (when Docker is the runtime)
kubelet logs: /var/log/syslog or journalctl

Case @1 may retry after sleeping

Issues

Fix:

systemctl restart kubelet

Case @2 etcdserver: request timed out

controller.go:178] failed to update node lease, error: etcdserver: request timed out

request timed out

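Case @3 container runtime status check may not have completed yet.

https://github.com/kubernetes/kubernetes/issues/101056

Cause:
A container hung the Docker daemon, and the node became unhealthy as a result.

Fix:
Run docker inspect against every container. The loop stalls on the broken one, so the last ID printed is the culprit. Then find the process belonging to that ID and kill it.
for p in $(docker ps -q); do echo "inspecting $p"; docker inspect "$p"; echo complete; done
ps -ef | grep <last-container-id>
kill <pid>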
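The loop above has to be interrupted by hand once it stalls. A hedged variant is sketched below: it wraps each probe in GNU timeout so that every container whose inspection does not return is reported automatically. The function name find_hanging and the TIMEOUT_SECS variable are assumptions of this sketch, and it presumes that docker ps itself still responds.

```shell
# Read IDs on stdin and run the probe command (given as arguments) once per
# ID. Print each ID whose probe does not finish within TIMEOUT_SECS seconds
# (default 10).
find_hanging() {
  while read -r id; do
    timeout "${TIMEOUT_SECS:-10}" "$@" "$id" >/dev/null 2>&1 </dev/null
    # GNU timeout exits with status 124 when the command timed out.
    [ $? -eq 124 ] && echo "$id"
  done
  return 0
}

# Possible usage on the affected node:
#   docker ps -q | find_hanging docker inspect
```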

Case @4 invalid capacity 0 on image filesystem


Warning  InvalidDiskCapacity      101s                 kubelet          invalid capacity 0 on image filesystem

Fix: systemctl restart containerd

Case @5 Clock skew between nodes

/var/log/messages contains: systemd: Time has been changed

Cause: the clock on this node is out of sync with the other nodes.
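To put a number on the skew, the epoch seconds of two nodes can be compared. A minimal sketch; the function name clock_skew and the host name node2 in the usage comment are hypothetical.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
clock_skew() {
  diff=$(( $1 - $2 ))
  [ "$diff" -lt 0 ] && diff=$(( -diff ))
  echo "$diff"
}

# Possible usage (node2 is a placeholder for another cluster node; the SSH
# round trip adds a little to the measured value):
#   clock_skew "$(date +%s)" "$(ssh node2 date +%s)"
```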

Case @6 Error updating node status, will retry

Fix: restart kubelet.

Source-code walkthrough of how kubelet reports node status: https://zhuanlan.zhihu.com/p/623840699

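Case @7 dockerd reports too many open files

https://blog.csdn.net/whatday/article/details/125481727

Count the file descriptors held by the process (<pid> is the dockerd process ID):
ls /proc/<pid>/fd > /tmp/proc.log
wc -l < /tmp/proc.log

ls -l /proc/<pid>/fd/ shows some of what is open; use lsof for more detail.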
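The count is only meaningful next to the limit it is approaching. The sketch below, for Linux hosts with /proc mounted, prints both; the function names fd_count and fd_soft_limit are assumptions of this example.

```shell
# Number of file descriptors currently open in process $1.
fd_count() {
  ls "/proc/$1/fd" | wc -l
}

# Soft "Max open files" limit of process $1, read from /proc/<pid>/limits.
# The line looks like: "Max open files  1024  524288  files".
fd_soft_limit() {
  awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Possible usage (reading another user's process may require root):
#   pid=$(pidof dockerd); echo "$(fd_count "$pid") of $(fd_soft_limit "$pid")"
```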

Case @8 failed to update lease

E0522 00:38:34.444337 1421 controller.go:187] failed to update lease, error: Put "https://k8smaster.qfusion.irds:60443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Check component status (ks appears to be the author's shell alias, presumably for kubectl -n kube-system):
ks get pod | grep etcd
ks get pod | grep apiserver

Collect component logs:
ks logs --since 72h etcd-qfusion01
ks logs --since 72h kube-apiserver-qfusion01

Related Kubernetes documentation: Leases, Node heartbeats
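When these errors span many hours, it helps to see when they started and how often they recur. The following sketch groups lease-update failures by hour using the klog line header (for example E0522 00:38:34.444337) seen in the messages above. The function name lease_errors_by_hour is made up for this example.

```shell
# Read kubelet log lines on stdin and print "MMDD HH:00 count" for each hour
# that contains "failed to update lease" or "failed to update node lease".
lease_errors_by_hour() {
  grep -E 'failed to update (node )?lease' |
  awk '{
    # Find the klog header token (severity letter + MMDD); the next field
    # is the timestamp, whose first two characters are the hour.
    for (i = 1; i < NF; i++)
      if ($i ~ /^[IWEF][0-9][0-9][0-9][0-9]$/) {
        print substr($i, 2) " " substr($(i + 1), 1, 2) ":00"
        break
      }
  }' | sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Possible usage on the node:
#   journalctl -u kubelet --since "72 hours ago" | lease_errors_by_hour
```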

Case @9 System OOM encountered, victim process: prometheus

Cause: memory exhausted.

Check: free -h

Fix 1: free memory (or reboot the host)

# Free the page cache
echo 1 > /proc/sys/vm/drop_caches

# Free dentries and inodes
echo 2 > /proc/sys/vm/drop_caches

# Free the page cache, dentries, and inodes
echo 3 > /proc/sys/vm/drop_caches

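Further reading: eviction

Eviction signal: memory.available
Description: memory.available := node.status.capacity[memory] - node.stats.memory.workingSet

The value of memory.available is derived from cgroupfs rather than from tools such as free -m. This matters because free -m does not work inside a container, and if the node allocatable feature is in use, out-of-resource decisions are made locally for the end-user Pod part of the cgroup hierarchy as well as for the root node. The Kubernetes documentation provides a script (with a cgroup v2 variant) that reproduces the steps the kubelet performs to calculate memory.available. The kubelet excludes inactive_file (the number of bytes of file-backed memory on the inactive LRU list) from its calculation, because it assumes that this memory is reclaimable under pressure.

For example, if a node has 10GiB of total memory and you want eviction to trigger when available memory falls below 1GiB, you can define the eviction threshold as either memory.available<10% or memory.available<1Gi (you cannot use both).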
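The arithmetic behind the signal and the percentage threshold can be sketched as follows. This is an illustration of the calculation described above, not the kubelet's own code; the function names are assumptions, and the inputs are taken as plain byte counts. On a cgroup v1 host the usage would come from memory.usage_in_bytes and total_inactive_file in memory.stat under /sys/fs/cgroup/memory, and the capacity from MemTotal in /proc/meminfo.

```shell
# memory.available = capacity - working set, where the working set is the
# cgroup memory usage minus inactive_file (clamped at zero).
# Arguments: capacity_bytes usage_bytes inactive_file_bytes
memory_available() {
  capacity=$1 usage=$2 inactive_file=$3
  working_set=$(( usage - inactive_file ))
  [ "$working_set" -lt 0 ] && working_set=0
  echo $(( capacity - working_set ))
}

# Convert a percentage threshold such as memory.available<10% into bytes.
# Arguments: capacity_bytes percent
percent_threshold() {
  echo $(( $1 * $2 / 100 ))
}
```

With a capacity of 10737418240 bytes (10GiB), percent_threshold 10737418240 10 yields 1073741824 bytes, which is the 1GiB of the example above and shows why the two forms of the threshold are equivalent on that node.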

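eviction-soft: a set of eviction thresholds, such as memory.available<1.5Gi, that trigger Pod eviction if they hold for longer than the corresponding grace period.
eviction-soft-grace-period: a set of eviction grace periods, such as memory.available=1m30s, that define how long a soft eviction threshold must hold before it triggers Pod eviction.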

Case @10 PLEG is not healthy: pleg was last seen active 3m7.377520804s ago; threshold is 3m0s

https://github.com/kubernetes/kubernetes/issues/45419

Case @11 NodeStatusUnknown

The node returned to normal after restarting docker and kubelet.

