Pod hangs in Pending state


I have a Kubernetes deployment in which I am trying to run 5 Docker containers inside a single pod on a single node. The pod hangs in "Pending" state and is never scheduled. I don't mind running more than one pod, but I'd like to keep the number of nodes down. I assumed one node with 1 CPU and 1.7G RAM would be enough for the 5 containers, and I have attempted to split the workload across them.

Initially I came to the conclusion that I had insufficient resources, so I enabled node autoscaling, which produced the following (see the kubectl describe pod output below):

pod didn't trigger scale-up (it wouldn't fit if a new node is added)

Anyway, each Docker container runs a fairly simple app via a simple command. Ideally I wouldn't want to have to deal with setting CPU and RAM resource allocation at all, but even after setting the CPU/memory requests and limits within bounds so they don't add up to more than 1 CPU, I still get this (see kubectl describe po/test-529945953-gh6cl below):

No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).
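
For reference, the per-container resources stanza that produces those values would look roughly like this in the Deployment manifest (a sketch reconstructed from the describe output below; the container name, image, and values are copied from there, the surrounding structure is standard):

    containers:
    - name: container-test2-tickers              # name and image as shown in the describe output
      image: gcr.io/testing-11111/testology:latest
      resources:
        requests:                # the scheduler places pods based on requests
          cpu: 100m              # 5 containers x 100m  = 500m CPU requested in total
          memory: 375Mi          # 5 containers x 375Mi = 1875Mi memory requested in total
        limits:
          cpu: 150m
          memory: 375Mi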

Below is the output of various commands showing the state. Any help with what I'm doing wrong would be appreciated.

kubectl get all

user_s@testing-11111:~/gce$ kubectl get all
NAME                      READY     STATUS    RESTARTS   AGE
po/test-529945953-gh6cl   0/5       Pending   0          34m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   10.7.240.1   <none>        443/TCP   19d

NAME          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/test   1         1         1            0           34m

NAME                DESIRED   CURRENT   READY     AGE
rs/test-529945953   1         1         0         34m
user_s@testing-11111:~/gce$

kubectl describe po/test-529945953-gh6cl

user_s@testing-11111:~/gce$ kubectl describe po/test-529945953-gh6cl
Name:           test-529945953-gh6cl
Namespace:      default
Node:           <none>
Labels:         app=test
                pod-template-hash=529945953
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"test-529945953","uid":"c6e889cb-a2a0-11e7-ac18-42010a9a001a"...
Status:         Pending
IP:
Created By:     ReplicaSet/test-529945953
Controlled By:  ReplicaSet/test-529945953
Containers:
  container-test2-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      test2
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:
      DB_HOST:      127.0.0.1:5432
      DB_PASSWORD:  <set to the key 'password' in secret 'cloudsql-db-credentials'>  Optional: false
      DB_USER:      <set to the key 'username' in secret 'cloudsql-db-credentials'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-kraken-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:
      DB_HOST:      127.0.0.1:5432
      DB_PASSWORD:  <set to the key 'password' in secret 'cloudsql-db-credentials'>  Optional: false
      DB_USER:      <set to the key 'username' in secret 'cloudsql-db-credentials'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-gdax-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:
      DB_HOST:      127.0.0.1:5432
      DB_PASSWORD:  <set to the key 'password' in secret 'cloudsql-db-credentials'>  Optional: false
      DB_USER:      <set to the key 'username' in secret 'cloudsql-db-credentials'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-bittrex-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:
      DB_HOST:      127.0.0.1:5432
      DB_PASSWORD:  <set to the key 'password' in secret 'cloudsql-db-credentials'>  Optional: false
      DB_USER:      <set to the key 'username' in secret 'cloudsql-db-credentials'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  cloudsql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.09
    Port:       <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=testing-11111:europe-west2:testology=tcp:5432
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:  <none>
    Mounts:
      /cloudsql from cloudsql (rw)
      /etc/ssl/certs from ssl-certs (rw)
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  cloudsql-instance-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-instance-credentials
    Optional:    false
  ssl-certs:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/ssl/certs
  cloudsql:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-b2mxc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-b2mxc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  FirstSeen  LastSeen  Count  From                SubObjectPath  Type     Reason             Message
  ---------  --------  -----  ----                -------------  ----     ------             -------
  27m        17m       44     default-scheduler                  Warning  FailedScheduling   No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (2).
  26m        8s        150    cluster-autoscaler                 Normal   NotTriggerScaleUp  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  16m        2s        63     default-scheduler                  Warning  FailedScheduling   No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).
user_s@testing-11111:~/gce$

kubectl get nodes

user_s@testing-11111:~/gce$ kubectl get nodes
NAME                                  STATUS    AGE       VERSION
gke-test-default-pool-abdf83f7-p4zw   Ready     9h        v1.6.7

kubectl get pods

user_s@testing-11111:~/gce$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
test-529945953-gh6cl   0/5       Pending   0          38m

kubectl describe nodes

user_s@testing-11111:~/gce$ kubectl describe nodes
Name:               gke-test-default-pool-abdf83f7-p4zw
Role:
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/fluentd-ds-ready=true
                    beta.kubernetes.io/instance-type=g1-small
                    beta.kubernetes.io/os=linux
                    cloud.google.com/gke-nodepool=default-pool
                    failure-domain.beta.kubernetes.io/region=europe-west2
                    failure-domain.beta.kubernetes.io/zone=europe-west2-c
                    kubernetes.io/hostname=gke-test-default-pool-abdf83f7-p4zw
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Tue, 26 Sep 2017 02:05:45 +0100
Conditions:
  Type                Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----                ------  -----------------                 ------------------                ------                      -------
  NetworkUnavailable  False   Tue, 26 Sep 2017 02:06:05 +0100   Tue, 26 Sep 2017 02:06:05 +0100   RouteCreated                RouteController created a route
  OutOfDisk           False   Tue, 26 Sep 2017 11:33:57 +0100   Tue, 26 Sep 2017 02:05:45 +0100   KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure      False   Tue, 26 Sep 2017 11:33:57 +0100   Tue, 26 Sep 2017 02:05:45 +0100   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure        False   Tue, 26 Sep 2017 11:33:57 +0100   Tue, 26 Sep 2017 02:05:45 +0100   KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready               True    Tue, 26 Sep 2017 11:33:57 +0100   Tue, 26 Sep 2017 02:06:05 +0100   KubeletReady                kubelet is posting ready status. AppArmor enabled
  KernelDeadlock      False   Tue, 26 Sep 2017 11:33:12 +0100   Tue, 26 Sep 2017 02:05:45 +0100   KernelHasNoDeadlock         kernel has no deadlock
Addresses:
  InternalIP:  10.154.0.2
  ExternalIP:  35.197.217.1
  Hostname:    gke-test-default-pool-abdf83f7-p4zw
Capacity:
  cpu:     1
  memory:  1742968Ki
  pods:    110
Allocatable:
  cpu:     1
  memory:  1742968Ki
  pods:    110
System Info:
  Machine ID:                 e6119abf844c564193495c64fd9bd341
  System UUID:                E6119ABF-844C-5641-9349-5C64FD9BD341
  Boot ID:                    1c2f2ea0-1f5b-4c90-9e14-d1d9d7b75221
  Kernel Version:             4.4.52+
  OS Image:                   Container-Optimized OS from Google
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://1.11.2
  Kubelet Version:            v1.6.7
  Kube-Proxy Version:         v1.6.7
PodCIDR:     10.4.1.0/24
ExternalID:  6073438913956157854
Non-terminated Pods:  (7 in total)
  Namespace    Name                                             CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                                             ------------  ----------  ---------------  -------------
  kube-system  fluentd-gcp-v2.0-k565g                           100m (10%)    0 (0%)      200Mi (11%)      300Mi (17%)
  kube-system  heapster-v1.3.0-3440173064-1ztvw                 138m (13%)    138m (13%)  301456Ki (17%)   301456Ki (17%)
  kube-system  kube-dns-1829567597-gdz52                        260m (26%)    0 (0%)      110Mi (6%)       170Mi (9%)
  kube-system  kube-dns-autoscaler-2501648610-7q9dd             20m (2%)      0 (0%)      10Mi (0%)        0 (0%)
  kube-system  kube-proxy-gke-test-default-pool-abdf83f7-p4zw   100m (10%)    0 (0%)      0 (0%)           0 (0%)
  kube-system  kubernetes-dashboard-490794276-25hmn             100m (10%)    100m (10%)  50Mi (2%)        50Mi (2%)
  kube-system  l7-default-backend-3574702981-flqck              10m (1%)      10m (1%)    20Mi (1%)        20Mi (1%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  728m (72%)    248m (24%)  700816Ki (40%)   854416Ki (49%)
Events:  <none>

Accepted answer

As you can see under Allocated resources: in the output of your kubectl describe nodes command, the pods running in the kube-system namespace on the node have already requested 728m (72%) of the CPU and 700816Ki (40%) of the memory. The resource requests of your test pod exceed both the remaining CPU and the remaining memory available on the node, as you can see under Events in the output of your kubectl describe po/[…] command.
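
To make the shortfall concrete, here are the numbers from the two outputs above side by side (the Ki-to-Mi conversions are approximate):

    Node allocatable:                cpu: 1000m   memory: 1742968Ki (~1702Mi)
    Requested by kube-system pods:   cpu:  728m   memory:  700816Ki  (~684Mi)
    Left for your test pod:          cpu:  272m   memory:           (~1018Mi)

    Your test pod requests:          cpu:  500m   memory:             1875Mi
    (5 containers x 100m CPU and 5 x 375Mi memory)

Note that the pod's total memory request (1875Mi) is larger than the entire allocatable memory of a g1-small node (~1702Mi), which is why the cluster autoscaler reports NotTriggerScaleUp: the pod would not fit on a freshly added node of the same type either.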

If you want to keep all containers in a single pod, you need to reduce the resource requests of your containers or run them on a node with more CPU and memory. The better solution would be to split your application into multiple pods, which allows the scheduler to distribute them across multiple nodes.
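
A minimal sketch of that split, with one Deployment per ticker process (the container name, image, command, and resource values are taken from the describe output above; the Deployment name and labels are hypothetical):

    apiVersion: apps/v1beta1            # Deployment API group available on Kubernetes 1.6
    kind: Deployment
    metadata:
      name: kraken-tickers              # hypothetical name; one Deployment per ticker
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: kraken-tickers
        spec:
          containers:
          - name: container-kraken-tickers
            image: gcr.io/testing-11111/testology:latest
            command: ["process_cmd", "arg1", "arg2"]
            resources:
              requests:
                cpu: 100m
                memory: 375Mi
              limits:
                cpu: 150m
                memory: 375Mi
          # Note: the app connects to the database via 127.0.0.1:5432, so each pod
          # would also need its own cloudsql-proxy sidecar, as in the original pod.

Each of these pods only requests 100m CPU and 375Mi memory, so the scheduler can place them individually wherever they fit, and the autoscaler can add nodes when they don't.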
