Docker Swarm with TLS: "Failed to validate pending node"

I have this log on my swarm manager container:

time="2016-04-15T02:47:59Z" level=debug msg="Failed to validate pending node: lookup node1 on 10.0.2.3:53: server misbehaving" Addr="node1:2376" 

I have set up a GitHub repo to reproduce my problem: https://github.com/casertap/playing-with-swarm-tls

I am running a cluster of 2 machines (built with Vagrant):

    $script2 = <<STOP
    service docker stop
    sed -i 's/DOCKER_OPTS=/DOCKER_OPTS="-H tcp:\\/\\/0.0.0.0:2376 -H unix:\\/\\/\\/var\\/run\\/docker.sock --tlsverify --tlscacert=\\/home\\/vagrant\\/.certs\\/ca.pem --tlscert=\\/home\\/vagrant\\/.certs\\/cert.pem --tlskey=\\/home\\/vagrant\\/.certs\\/key.pem"/' /etc/init/docker.conf
    service docker start
    STOP

    Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
      config.vm.box = "ubuntu/trusty64"

      config.vm.define "node1" do |app|
        app.vm.network "private_network", ip: "192.168.33.10"
        app.vm.provision "file", source: "ca.pem", destination: "~/.certs/ca.pem"
        app.vm.provision "file", source: "node1-cert.pem", destination: "~/.certs/cert.pem"
        app.vm.provision "file", source: "node1-priv-key.pem", destination: "~/.certs/key.pem"
        app.vm.provision "file", source: "node1.csr", destination: "~/.certs/node1.csr"
        app.vm.provision "docker"
        app.vm.provision :shell, :inline => $script2
      end

      config.vm.define "swarm" do |app|
        app.vm.network "private_network", ip: "192.168.33.12"
        app.vm.provision "shell", inline: "echo '192.168.33.10 node1' >> /etc/hosts"
        app.vm.provision "shell", inline: "echo '192.168.33.12 swarm' >> /etc/hosts"
        app.vm.provision "docker"
        app.vm.provision "file", source: "ca.pem", destination: "~/.certs/ca.pem"
        app.vm.provision "file", source: "swarm-cert.pem", destination: "~/.certs/cert.pem"
        app.vm.provision "file", source: "swarm-priv-key.pem", destination: "~/.certs/key.pem"
        app.vm.provision "file", source: "swarm.csr", destination: "~/.certs/swarm.csr"
      end
    end

As you can see, /etc/init/docker.conf on node1 ends up with the following options:

 DOCKER_OPTS="-H tcp:\\/\\/0.0.0.0:2376 -H unix:\\/\\/\\/var\\/run\\/docker.sock --tlsverify --tlscacert=\\/home\\/vagrant\\/.certs\\/ca.pem --tlscert=\\/home\\/vagrant\\/.certs\\/cert.pem --tlskey=\\/home\\/vagrant\\/.certs\\/key.pem" 

I run:

    vagrant up

Then I connect to the swarm machine:

    vagrant ssh swarm
    export TOKEN=$(docker run swarm create)
    #dd182b8d2bc8c03f417376296558ba29
    docker run -d swarm join --advertise node1:2376 token://dd182b8d2bc8c03f417376296558ba29

node1 is defined in /etc/hosts on the swarm machine, as you can see in the Vagrantfile.

I start the swarm manager at debug log level (without -d):

 docker run -p 3376:3376 -v /home/vagrant/.certs:/certs:ro swarm -l debug manage --tlsverify --tlscacert=/certs/ca.pem --tlscert=/certs/cert.pem --tlskey=/certs/key.pem --host=0.0.0.0:3376 token://dd182b8d2bc8c03f417376296558ba29 

The log shows:

 time="2016-04-15T02:47:59Z" level=debug msg="Failed to validate pending node: lookup node1 on 10.0.2.3:53: server misbehaving" Addr="node1:2376" 

The IP address for node1 in my /etc/hosts is actually:

 192.168.33.10 node1 

It looks like Docker is trying to look up the node1 alias on the wrong bridge network?

========== More info:

You can check this URL to see whether the discovery service has found your node1, and it has:

 https://discovery.hub.docker.com/v1/clusters/dd182b8d2bc8c03f417376296558ba29 

Now, if you run the swarm manager with -d and then do:

    vagrant@vagrant-ubuntu-trusty-64:~$ docker --tlsverify --tlscacert=/home/vagrant/.certs/ca.pem --tlscert=/home/vagrant/.certs/cert.pem --tlskey=/home/vagrant/.certs/key.pem -H swarm:3376 info
    Containers: 0
     Running: 0
     Paused: 0
     Stopped: 0
    Images: 0
    Server Version: swarm/1.2.0
    Role: primary
    Strategy: spread
    Filters: health, port, dependency, affinity, constraint
    Nodes: 1
     (unknown): node1:2376
      └ Status: Pending
      └ Containers: 0
      └ Reserved CPUs: 0 / 0
      └ Reserved Memory: 0 B / 0 B
      └ Labels:
      └ Error: (none)
      └ UpdatedAt: 2016-04-15T03:03:28Z
      └ ServerVersion:
    Plugins:
     Volume:
     Network:
    Kernel Version: 3.13.0-85-generic
    Operating System: linux
    Architecture: amd64
    CPUs: 0
    Total Memory: 0 B
    Name: ee85273cbb64
    Docker Root Dir:
    Debug mode (client): false
    Debug mode (server): false
    WARNING: No kernel memory limit support

You can see that the node status is: Pending

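Although node1 is defined in the machine's /etc/hosts, the container that the Swarm manager runs in does not have node1 in its own /etc/hosts file. By default, containers do not share the host's filesystem. See https://docs.docker.com/engine/userguide/containers/dockervolumes/. The Swarm manager therefore tries to look up node1 through the DNS resolver (10.0.2.3:53 in the log) and fails.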
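To confirm this diagnosis, it can help to compare what the host's hosts file and the container's hosts file say about node1. The following is a minimal sketch, not part of the original setup: `hosts_lookup` is a hypothetical helper name, and it only emulates a plain hosts-file lookup (no DNS).

```shell
#!/bin/sh
# Hypothetical helper (not from the question): print the address that a
# hosts-format file maps to a given name. Returns non-zero if the name
# does not appear in the file.
hosts_lookup() {
    name=$1
    file=${2:-/etc/hosts}   # default to the local /etc/hosts
    awk -v n="$name" '
        { sub(/#.*/, "") }  # drop trailing comments before matching
        {
            # field 1 is the address, the remaining fields are names/aliases
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; found = 1; exit }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

On the swarm VM, `hosts_lookup node1` should print 192.168.33.10. Pointing the helper at the manager container's hosts file, for example the path reported by `docker inspect --format '{{.HostsPath}}' <container>` (reading it usually requires root), should return nothing and a non-zero status, which matches the failed lookup in the log.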

There are several options to resolve this.

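  1. Use a resolvable FQDN, so that the Swarm manager inside the container can resolve the node.
  2. Or provide node1's IP address in the swarm join command.
  3. Or use the -v option to pass the /etc/hosts file from the host into the Swarm manager container. See the link above.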