Cannot ping a pod after setting up an Ubuntu cluster

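I followed the recent instructions (updated 5/7/15) for setting up a cluster on Ubuntu** with etcd and flanneld. But I'm having trouble with the networking; it seems to be in some kind of broken state.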

**Note: I updated the configuration script so that it installs 0.16.2. Also, kubectl get minions returned nothing at first, but after a sudo service kube-controller-manager restart they showed up.

Here is my setup:

 | ServerName | Public IP | Private IP |
 ------------------------------------------
 | KubeMaster | 107.xx32  | 10.xx54    |
 | KubeNode1  | 104.xx49  | 10.xx55    |
 | KubeNode2  | 198.xx39  | 10.xx241   |
 | KubeNode3  | 104.xx52  | 10.xx190   |
 | MongoDev1  | 162.xx132 | 10.xx59    |
 | MongoDev2  | 104.xx103 | 10.xx60    |

From any machine I can ping any other machine... it's when I create pods and services that I start having problems.

 POD                 IP            CONTAINER(S)   IMAGE(S)                       HOST                LABELS                          STATUS    CREATED
 auth-dev-ctl-6xah8  172.16.37.7   sis-auth       leportlabs/sisauth:latestdev   104.xx52/104.xx52   environment=dev,name=sis-auth   Running   3 hours

So this pod was scheduled on KubeNode3... If I try to ping it from any machine other than KubeNode3, I get a Destination Net Unreachable error. For example:

 # ping 172.16.37.7
 PING 172.16.37.7 (172.16.37.7) 56(84) bytes of data.
 From 129.250.204.117 icmp_seq=1 Destination Net Unreachable

I can run etcdctl get /coreos.com/network/config and get back {"Network":"172.16.0.0/16"}.
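The network config is only part of what flannel keeps in etcd. You can also look up the subnet lease it recorded for a particular host. The sketch below uses a hypothetical helper, flannel_lease_key. It assumes flannel is using the default /coreos.com/network prefix (as in the etcdctl call above) and that each lease is stored under subnets/<network address>-<prefix length>. It only computes the key name from a FLANNEL_SUBNET value. It does not talk to etcd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the etcd key of a flannel subnet lease.
# ASSUMPTION: flannel's default prefix /coreos.com/network and lease keys
# of the form subnets/<network address>-<prefix length>.
flannel_lease_key() {
  local subnet="$1"                      # e.g. 172.16.60.1/24 (FLANNEL_SUBNET)
  local prefix="${2:-/coreos.com/network}"
  local ip="${subnet%/*}" len="${subnet#*/}"
  local a b c d n mask
  IFS=. read -r a b c d <<< "$ip"
  # Pack the dotted quad into one integer (bash arithmetic is 64-bit).
  n=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # Zero the host bits so 172.16.60.1/24 becomes 172.16.60.0.
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  n=$(( n & mask ))
  printf '%s/subnets/%d.%d.%d.%d-%d\n' "$prefix" \
    $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) $(( n & 255 )) "$len"
}
```

Usage: source /run/flannel/subnet.env on a node, then run etcdctl get "$(flannel_lease_key "$FLANNEL_SUBNET")". The result should be that node's lease, which typically names the PublicIP of the owning host. If the key is missing, the node's local subnet.env may no longer match what is in etcd.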

I don't know where to look from here. Can anyone help me out?

Background information

On the master:

 # ps -ef | grep kube
 root 4729 1 0 May07 ? 00:06:29 /opt/bin/kube-scheduler --logtostderr=true --master=127.0.0.1:8080
 root 4730 1 1 May07 ? 00:21:24 /opt/bin/kube-apiserver --address=0.0.0.0 --port=8080 --etcd_servers=http://127.0.0.1:4001 --logtostderr=true --portal_net=192.168.3.0/24
 root 5724 1 0 May07 ? 00:10:25 /opt/bin/kube-controller-manager --master=127.0.0.1:8080 --machines=104.xx49,198.xx39,104.xx52 --logtostderr=true
 # ps -ef | grep etcd
 root 4723 1 2 May07 ? 00:32:46 /opt/bin/etcd -name infra0 -initial-advertise-peer-urls http://107.xx32:2380 -listen-peer-urls http://107.xx32:2380 -initial-cluster-token etcd-cluster-1 -initial-cluster infra0=http://107.xx32:2380,infra1=http://104.xx49:2380,infra2=http://198.xx39:2380,infra3=http://104.xx52:2380 -initial-cluster-state new

On a node:

 # ps -ef | grep kube
 root 10878 1 1 May07 ? 00:16:22 /opt/bin/kubelet --address=0.0.0.0 --port=10250 --hostname_override=104.xx49 --api_servers=http://107.xx32:8080 --logtostderr=true --cluster_dns=192.168.3.10 --cluster_domain=kubernetes.local
 root 10882 1 0 May07 ? 00:05:23 /opt/bin/kube-proxy --master=http://107.xx32:8080 --logtostderr=true
 # ps -ef | grep etcd
 root 10873 1 1 May07 ? 00:14:09 /opt/bin/etcd -name infra1 -initial-advertise-peer-urls http://104.xx49:2380 -listen-peer-urls http://104.xx49:2380 -initial-cluster-token etcd-cluster-1 -initial-cluster infra0=http://107.xx32:2380,infra1=http://104.xx49:2380,infra2=http://198.xx39:2380,infra3=http://104.xx52:2380 -initial-cluster-state new
 # ps -ef | grep flanneld
 root 19560 1 0 May07 ? 00:00:01 /opt/bin/flanneld

So I noticed that the flannel configuration (/run/flannel/subnet.env) was different from what Docker had been started with (not sure how they got out of sync).

 # ps -ef | grep docker
 root 19663 1 0 May07 ? 00:09:20 /usr/bin/docker -d -H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=172.16.85.1/24 --mtu=1472
 # cat /run/flannel/subnet.env
 FLANNEL_SUBNET=172.16.60.1/24
 FLANNEL_MTU=1472
 FLANNEL_IPMASQ=false

Note that --bip=172.16.85.1/24 differs from the flannel subnet, FLANNEL_SUBNET=172.16.60.1/24.
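A quick way to spot this kind of drift on each node is to compare the two values directly. The sketch below uses a hypothetical helper, check_bip. It takes the Docker command line (for example the output of ps -o args= -C docker) and the path to subnet.env. It reads FLANNEL_SUBNET with sed rather than sourcing the file, so the other FLANNEL_* variables don't leak into your shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare Docker's --bip flag with flannel's FLANNEL_SUBNET.
# Returns 0 on match, 1 on mismatch, 2 if no --bip flag is present.
check_bip() {
  local docker_cmdline="$1" env_file="$2"
  local bip="" subnet arg
  local -a args
  # read -a splits on whitespace without glob expansion (ps output contains "?").
  read -r -a args <<< "$docker_cmdline"
  for arg in "${args[@]}"; do
    case "$arg" in
      --bip=*) bip="${arg#--bip=}" ;;
    esac
  done
  # subnet.env is plain KEY=VALUE lines; pull out just FLANNEL_SUBNET.
  subnet=$(sed -n 's/^FLANNEL_SUBNET=//p' "$env_file")
  if [ -z "$bip" ]; then
    echo "no --bip flag found"
    return 2
  elif [ "$bip" = "$subnet" ]; then
    echo "ok: --bip=$bip matches FLANNEL_SUBNET"
    return 0
  else
    echo "mismatch: --bip=$bip but FLANNEL_SUBNET=$subnet"
    return 1
  fi
}
```

On the node shown above, running check_bip "$(ps -o args= -C docker)" /run/flannel/subnet.env would be expected to report the 172.16.85.1/24 vs 172.16.60.1/24 mismatch.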

So naturally I changed /etc/default/docker to reflect the new values:

 DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=172.16.60.1/24 --mtu=1472" 
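You don't have to copy these values by hand. The DOCKER_OPTS line can be generated from subnet.env, so it always follows whatever subnet flannel leased. Here is a minimal sketch with a hypothetical helper name. The -H flags are simply the ones used in this setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a DOCKER_OPTS line whose --bip/--mtu come from
# flannel's subnet.env. The -H flags are copied from the setup above.
docker_opts_from_flannel() {
  local env_file="${1:-/run/flannel/subnet.env}"
  local subnet mtu
  subnet=$(sed -n 's/^FLANNEL_SUBNET=//p' "$env_file")
  mtu=$(sed -n 's/^FLANNEL_MTU=//p' "$env_file")
  if [ -z "$subnet" ] || [ -z "$mtu" ]; then
    echo "FLANNEL_SUBNET or FLANNEL_MTU missing in $env_file" >&2
    return 1
  fi
  printf 'DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=%s --mtu=%s"\n' "$subnet" "$mtu"
}
```

Run docker_opts_from_flannel /run/flannel/subnet.env and put the printed line into /etc/default/docker. Docker only picks up this file when it starts, so the line would need regenerating (and Docker restarting) whenever flannel ends up with a different lease.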

But now sudo service docker restart reports no error, yet looking at /var/log/upstart/docker.log I can see the following:

 FATA[0000] Shutting down daemon due to errors: Bridge ip (172.16.85.1) does not match existing bridge configuration 172.16.60.1 

So the last piece of the puzzle is to delete the old bridge and restart Docker:

 # sudo brctl delbr docker0
 # sudo service docker start

If sudo brctl delbr docker0 returns "bridge docker0 is still up; can't delete it", run ifconfig docker0 down and try again.
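The steps above can be collected into one function. This sketch assumes an upstart host like this one (service docker ...), root privileges, and bridge-utils installed for brctl. It uses ip link set dev docker0 down in place of ifconfig docker0 down, which has the same effect. The helper names are hypothetical. Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. ASSUMPTIONS: root, upstart's "service" command,
# bridge-utils (brctl). With DRY_RUN=1 the commands are printed, not run.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recreate_docker_bridge() {
  # Bring the bridge down first, otherwise brctl refuses to delete it.
  maybe_run ip link set dev docker0 down || return
  # Remove the bridge that still carries the stale 172.16.85.1 address.
  maybe_run brctl delbr docker0 || return
  # Docker recreates docker0 from the --bip in /etc/default/docker.
  maybe_run service docker start
}
```

DRY_RUN=1 recreate_docker_bridge prints the three commands so you can check them before running the function for real as root.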

Please try this:

 ip link del docker0
 systemctl restart flanneld