Kubernetes: runContainer: API error (500): Cannot start container (Docker fails to umount)

Occasionally a 500 error occurs on our GKE cluster and causes pod creation to fail:

1m 1m 1 installer-u57ab1f7707b03 Pod Normal Scheduled {default-scheduler } Successfully assigned installer-u57ab1f7707b03 to gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l
1m 1m 1 installer-u57ab1f7707b03 Pod Warning FailedSync {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l} Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container ff8573fbf0b90a25b5565b1feb36671f13367115dde74e581cf249be772d8e4e: [8] System error: read parent: connection reset by peer\n"
1m 1m 1 installer-u57ab1f7707b03 Pod Warning FailedSync {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l} Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container fbd7151d4489ed3ac9b21ef9ee3268039374fe3aee1f5933dc27d003f5388e7d: [8] System error: read parent: connection reset by peer\n"
1m 1m 1 installer-u57ab1f7707b03 Pod Warning FailedSync {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l} Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container c6b7969fd036fd187f8b5b815106887d718780b290b81e6dde12162d15c22728: [8] System error: read parent: connection reset by peer\n"
49s 49s 1 installer-u57ab1f7707b03 Pod Warning FailedSync {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l} Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container 5b0d78ee31759a3472f15fe375ef4f2542dcc65518023a1bd06593fe7d28a448: [8] System error: read parent: connection reset by peer\n"
32s 32s 1 installer-u57ab1f7707b03 Pod Warning FailedSync {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l} Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container 7ff5941a30ce432aa1b1382e4b20d272a08a7113f79f7f1ff2f8898a00ca8f06: [8] System error: read parent: connection reset by peer\n"
18s 18s 1 installer-u57ab1f7707b03 Pod Warning FailedSync {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l} Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container a91ae7d6dc9dee5196e73457d817bc46f8009c26147cc81727920aebfa52cc38: [8] System error: read parent: connection reset by peer\n"
2s 2s 1 installer-u57ab1f7707b03 Pod Warning FailedSync {kubelet gke-oro-cloud-v1-1445426963-ffbcc283-node-bo1l} Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508: [8] System error: read parent: connection reset by peer\n"

In docker.log I found:

time="2016-08-10T12:37:24.458097892Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508/shm: invalid argument\nfailed to umount /var/lib/docker/containers/ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508/mqueue: invalid argument"
time="2016-08-10T12:37:24.458280187Z" level=error msg="Handler for POST /containers/ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508/start returned error: Cannot start container ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508: [8] System error: read parent: connection reset by peer"
time="2016-08-10T12:37:24.458315257Z" level=error msg="HTTP Error" err="Cannot start container ad8b7bbe72410232d7fe6197e057d15e9003e24f6d8aad15bc7068430cfea508: [8] System error: read parent: connection reset by peer" statusCode=500
time="2016-08-10T12:37:40.151776337Z" level=warning msg="signal: killed"

Kubernetes version: v1.2.5
Docker version: 1.9.1

Any ideas on how to fix this?

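This is probably caused by a runc bug in Docker 1.9: the container process reads its configuration but closes the read end of the pipe before the parent has finished writing.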
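To make the failure mode more concrete, here is a minimal sketch of how an early close on one side of a channel can surface as "connection reset by peer" on the other. It is not runc's code. It assumes Linux and a Unix stream socket pair standing in for the parent/child channel, and the function name `parent_read_after_early_close` is made up for illustration.

```python
import socket


def parent_read_after_early_close(payload: bytes, child_reads: int) -> str:
    """Hypothetical helper: report what the parent's read sees.

    Returns "ok" if the parent sees a clean end-of-stream and "reset" if
    the read fails with ECONNRESET.
    """
    # AF_UNIX stream pair; an assumption standing in for the real channel.
    parent, child = socket.socketpair()
    try:
        parent.sendall(payload)    # parent writes the "configuration"
        child.recv(child_reads)    # child consumes at most child_reads bytes
        child.close()              # child closes, possibly with data unread
        try:
            parent.recv(1)         # parent now waits for the child's reply
            return "ok"
        except ConnectionResetError:
            return "reset"
    finally:
        parent.close()
        child.close()              # closing twice is harmless in Python


if __name__ == "__main__":
    print(parent_read_after_early_close(b"x" * 64, 64))  # child read it all
    print(parent_read_after_early_close(b"x" * 64, 8))   # child closed early
```

On Linux, closing a Unix stream socket while unread data is still queued marks the peer with ECONNRESET, which is why the second call is expected to report a reset there. Other platforms may behave differently.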

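Docker 1.10 ships a fixed runc. Kubernetes 1.3 uses Docker 1.11.2, but until you upgrade you may be able to work around the problem by adding extra characters to your container's command line.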