Kubernetes 1.7 on Google Cloud: FailedSync Error syncing pod, SandboxChanged Pod sandbox changed, it will be killed and re-created

My Kubernetes pods and containers are not starting. They are stuck in the ContainerCreating state.

I ran the command kubectl describe po PODNAME, which lists the events, and I see the following errors:

 Type     Reason          Message
 Warning  FailedSync      Error syncing pod
 Normal   SandboxChanged  Pod sandbox changed, it will be killed and re-created.

The Count column shows that these errors repeat over and over, roughly once per second. The full output of the command is below, but how do I go about debugging this? I don't even know what these errors mean.

 Name:           ocr-extra-2939512459-3hkv1
 Namespace:      ocr-da-cluster
 Node:           gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2/10.240.0.11
 Start Time:     Tue, 24 Oct 2017 21:05:01 -0400
 Labels:         component=ocr
                 pod-template-hash=2939512459
                 role=extra
 Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"ocr-da-cluster","name":"ocr-extra-2939512459","uid":"d58bd050-b8f3-11e7-9f9e-4201...
 Status:         Pending
 IP:
 Created By:     ReplicaSet/ocr-extra-2939512459
 Controlled By:  ReplicaSet/ocr-extra-2939512459
 Containers:
   ocr-node:
     Container ID:
     Image:          us.gcr.io/ocr-api/ocr-image
     Image ID:
     Ports:          80/TCP, 443/TCP, 5555/TCP, 15672/TCP, 25672/TCP, 4369/TCP, 11211/TCP
     State:          Waiting
       Reason:       ContainerCreating
     Ready:          False
     Restart Count:  0
     Requests:
       cpu:     31
       memory:  10Gi
     Liveness:   http-get http://:http/ocr/live delay=270s timeout=30s period=60s #success=1 #failure=5
     Readiness:  http-get http://:http/_ah/warmup delay=180s timeout=60s period=120s #success=1 #failure=3
     Environment:
       NAMESPACE:  ocr-da-cluster (v1:metadata.namespace)
     Mounts:
       /var/log/apache2 from apachelog (rw)
       /var/log/celery from cellog (rw)
       /var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
   log-apache2-error:
     Container ID:
     Image:          busybox
     Image ID:
     Port:           <none>
     Args:
       /bin/sh
       -c
       echo Apache2 Error && sleep 90 && tail -n+1 -F /var/log/apache2/error.log
     State:          Waiting
       Reason:       ContainerCreating
     Ready:          False
     Restart Count:  0
     Requests:
       cpu:  20m
     Environment:  <none>
     Mounts:
       /var/log/apache2 from apachelog (ro)
       /var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
   log-worker-1:
     Container ID:
     Image:          busybox
     Image ID:
     Port:           <none>
     Args:
       /bin/sh
       -c
       echo Celery Worker && sleep 90 && tail -n+1 -F /var/log/celery/worker*.log
     State:          Waiting
       Reason:       ContainerCreating
     Ready:          False
     Restart Count:  0
     Requests:
       cpu:  20m
     Environment:  <none>
     Mounts:
       /var/log/celery from cellog (ro)
       /var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
 Conditions:
   Type           Status
   Initialized    True
   Ready          False
   PodScheduled   True
 Volumes:
   apachelog:
     Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
     Medium:
   cellog:
     Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
     Medium:
   default-token-dhjr5:
     Type:        Secret (a volume populated by a Secret)
     SecretName:  default-token-dhjr5
     Optional:    false
 QoS Class:       Burstable
 Node-Selectors:  beta.kubernetes.io/instance-type=n1-highcpu-32
 Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                  node.alpha.kubernetes.io/unreachable:NoExecute for 300s
 Events:
   FirstSeen  LastSeen  Count  From                                                          SubObjectPath  Type     Reason                 Message
   ---------  --------  -----  ----                                                          -------------  ----     ------                 -------
   10m        10m       2      default-scheduler                                                            Warning  FailedScheduling       No nodes are available that match all of the following predicates:: Insufficient cpu (10), Insufficient memory (2), MatchNodeSelector (2).
   10m        10m       1      default-scheduler                                                            Normal   Scheduled              Successfully assigned ocr-extra-2939512459-3hkv1 to gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2
   10m        10m       1      kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2                 Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "apachelog"
   10m        10m       1      kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2                 Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "cellog"
   10m        10m       1      kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2                 Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "default-token-dhjr5"
   10m        1s        382    kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2                 Warning  FailedSync             Error syncing pod
   10m        0s        382    kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2                 Normal   SandboxChanged         Pod sandbox changed, it will be killed and re-created.

Check your resource limits. I faced the same issue, and in my case the cause was that I had used m instead of Mi for the memory limit and memory request.
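For illustration, here is a minimal sketch of correctly-united requests (the pod name, container name, and values are placeholders, not taken from the question). In Kubernetes resource quantities the m suffix means milli-units, so memory: 512m asks for roughly half a byte, while memory: 512Mi asks for 512 mebibytes; m is only sensible for CPU, where it means millicores:

 apiVersion: v1
 kind: Pod
 metadata:
   name: resource-units-demo        # placeholder name for illustration
 spec:
   containers:
   - name: app
     image: busybox
     command: ["sleep", "3600"]
     resources:
       requests:
         cpu: "500m"                # "m" = millicores; valid for CPU
         memory: "512Mi"            # "Mi" = mebibytes; "512m" would mean ~0.5 bytes
       limits:
         cpu: "1"
         memory: "512Mi"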

Are you sure you need 31 CPUs as the initial request for ocr-node?
That will require a very large node; note the Insufficient cpu message in your scheduling events.
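A quick way to see whether any node can satisfy a request that large is to compare it against the allocatable capacity the nodes report (the node name below is the one from the question's output):

 # Allocatable CPU and memory for every node in the cluster
 kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory

 # Or inspect a single node in detail (look for the Allocatable section)
 kubectl describe node gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2

Keep in mind that allocatable capacity is always somewhat below the machine's raw vCPU count, because the kubelet and system daemons reserve a share, so even an n1-highcpu-32 offers fewer than 32 allocatable CPUs.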

I have seen similar issues with my pods. Deleting them and allowing them to be re-created sometimes helps, though not consistently. I made sure there were enough resources available.
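For anyone trying that workaround with the pod and namespace from the question: the pod is owned by a ReplicaSet, so deleting it triggers a fresh replacement automatically:

 # Delete the stuck pod; its ReplicaSet re-creates it
 kubectl delete pod ocr-extra-2939512459-3hkv1 -n ocr-da-cluster

 # Watch the replacement pod come up
 kubectl get pods -n ocr-da-cluster -w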

See: Kubernetes pods stuck on "Pod sandbox changed, it will be killed and re-created"