Problem joining serf nodes located in different Docker containers

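Context: the host is AWS EC2 / Ubuntu 14.04.5, with Docker version 17.05.0-ce. The containers are created from the publicly available repo image cbhihe/serf-alpine-bash. All containers live on the same EC2 instance and share the same default bridge network on the interface "docker0".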

I am trying to join nodes serfDC1 (id d4fd90692e18) and serfDC2 (id 6353e7f6134d) by passing commands from the host's shell:

 $ docker exec serfDC1 serf agent -node=Node1 -bind=0.0.0.0:7946
 ==> Starting Serf agent...
 ==> Starting Serf agent RPC...
 ==> Serf agent running!
     Node name: 'd4fd90692e18'
     Bind addr: '0.0.0.0:7946'
     RPC addr: '127.0.0.1:7373'
     Encrypted: false
     Snapshot: false
     Profile: lan
 ==> Log data will now stream in as it occurs:
     2017/06/04 00:01:10 [INFO] agent: Serf agent starting
     2017/06/04 00:01:10 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
     2017/06/04 00:01:11 [INFO] agent: Received event: member-join
 ^C

After finding out that the IP of Node1's container is 172.17.0.4, I can issue the serf agent -join command for Node2:
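For reference, one way to look up that IP from the host instead of hard-coding it is docker inspect with a Go template. This is only a minimal sketch: the helper name container_ip is made up for illustration, and it assumes the container is attached to a single network (the default bridge here).

```shell
#!/bin/sh
# container_ip NAME: print the IP address Docker assigned to container NAME.
# The function name is hypothetical; `docker inspect -f` with a Go template
# is standard Docker CLI usage. Assumes the container sits on one network
# only (with several networks the addresses would be printed back to back).
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}
```

With that helper loaded, the join target can be filled in as -join="$(container_ip serfDC1)".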

 $ docker exec serfDC2 serf agent -node=Node2 -join=172.17.0.4
 ==> Starting Serf agent...
 ==> Starting Serf agent RPC...
 ==> Serf agent running!
     Node name: '6353e7f6134d'
     Bind addr: '0.0.0.0:7946'
     RPC addr: '127.0.0.1:7373'
     Encrypted: false
     Snapshot: false
     Profile: lan
 ==> Joining cluster...(replay: false)
     Join completed. Synced with 1 initial agents
 ==> Log data will now stream in as it occurs:
     2017/06/04 00:18:35 [INFO] agent: Serf agent starting
     2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: 6353e7f6134d 127.0.0.1
     2017/06/04 00:18:35 [INFO] agent: joining: [172.17.0.4] replay: false
     2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
     2017/06/04 00:18:35 [INFO] agent: joined: 1 nodes
     2017/06/04 00:18:36 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
     2017/06/04 00:18:36 [INFO] agent: Received event: member-join
     2017/06/04 00:18:37 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34876
     2017/06/04 00:18:37 [ERR] memberlist: Failed TCP fallback ping: EOF
     2017/06/04 00:18:37 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
     2017/06/04 00:18:38 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
     2017/06/04 00:18:39 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34879
     2017/06/04 00:18:39 [ERR] memberlist: Failed TCP fallback ping: EOF
     2017/06/04 00:18:40 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
     2017/06/04 00:18:41 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
     2017/06/04 00:18:42 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34881
     2017/06/04 00:18:42 [ERR] memberlist: Failed TCP fallback ping: EOF
     2017/06/04 00:18:42 [INFO] memberlist: Marking d4fd90692e18 as failed, suspect timeout reached (0 peer confirmations)
     2017/06/04 00:18:42 [INFO] serf: EventMemberFailed: d4fd90692e18 127.0.0.1
     2017/06/04 00:18:43 [INFO] agent: Received event: member-failed
     2017/06/04 00:18:44 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
     2017/06/04 00:19:05 [INFO] serf: attempting reconnect to d4fd90692e18 127.0.0.1:7946
 ^C

This results in a failed join, as shown here:

 $ docker exec serfDC2 serf members
 6353e7f6134d  127.0.0.1:7946  alive
 d4fd90692e18  127.0.0.1:7946  failed
 $ docker exec serfDC1 serf members
 d4fd90692e18  127.0.0.1:7946  alive
 6353e7f6134d  127.0.0.1:7946  failed

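I have been at this for a while now and I am at my wits' end as to where to turn next. The Hashicorp and Docker documentation does not seem to cover this aspect of the initial handshake between serf agents in two different containers.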

Can anybody tell me what I am doing wrong? Any answer would be nice, really. Tx.

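Serf nodes need to "advertise" themselves using a routable address. In your case they tell each other: 'Hi, I am localhost:…', so each one tries to answer to localhost, which is wrong because each container has its own localhost.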
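A quick way to spot this symptom is to scan the serf members output for loopback addresses. The sketch below is only illustrative: the function name loopback_members is made up, and it assumes the default text output where the first three whitespace-separated columns are name, address:port and status.

```shell
#!/bin/sh
# loopback_members: read `serf members` output on stdin and print the name of
# every member whose advertised address lies in 127.0.0.0/8.
# Assumes the default text format (name, address:port, status); the function
# name is hypothetical.
loopback_members() {
    awk '$2 ~ /^127\./ { print $1 }'
}
```

For example, docker exec serfDC2 serf members | loopback_members would list both nodes in the failing setup and nothing once the agents advertise their eth0 addresses.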

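There is an option that configures the agent to use the eth0 IP when advertising itself to the other nodes on the network: -iface. With it, you need to drop the -bind option. The ports are the defaults, so there is no need to customize them.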

So, for node1:

 serf agent -node=Node1 -iface=eth0 

And for node2:

 serf agent -node=Node2 -join=172.17.0.2 -iface=eth0 

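From the documentation:

-iface: This flag can be used to provide a binding interface. It can be used instead of -bind if the interface is known but not the address.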

It works for me:

Node 1:

 ==> Log data will now stream in as it occurs:
     2017/06/04 01:56:40 [INFO] agent: Serf agent starting
     2017/06/04 01:56:40 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
     2017/06/04 01:56:41 [INFO] agent: Received event: member-join
     2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
     2017/06/04 01:57:03 [INFO] agent: Received event: member-join

Node 2:

 ==> Log data will now stream in as it occurs:
     2017/06/04 01:57:02 [INFO] agent: Serf agent starting
     2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
     2017/06/04 01:57:02 [INFO] agent: joining: [172.17.0.2] replay: false
     2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
     2017/06/04 01:57:02 [INFO] agent: joined: 1 nodes
     2017/06/04 01:57:03 [INFO] agent: Received event: member-join

Edit:

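If each container runs in its own virtual machine (EC2 instance), every instance has its own Docker network and those networks are not interconnected, so you have to provide the EC2 instance IP and expose the corresponding port. Use -advertise: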

-advertise: The advertise flag is used to change the address that we advertise to other nodes in the cluster.

Node 1:

 serf agent -node=Node1 -iface=eth0 -advertise=INSTANCE_IP 

Node 2:

 serf agent -node=Node2 -join=NODE1_INSTANCE_IP -iface=eth0 

Remember to expose the serf port in docker run:

 docker run -p 7946:7946 (...rest of the command...)
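One caveat worth checking: a plain -p 7946:7946 publishes the port over TCP only, while Serf's gossip layer also uses UDP on the same port, so both protocols probably need to be published. A small sketch of the flags, where the helper name serf_publish_flags is purely illustrative:

```shell
#!/bin/sh
# serf_publish_flags [PORT]: print the `docker run -p` flags that publish
# PORT (default 7946) over both TCP and UDP.
# The function name is hypothetical; the /tcp and /udp suffixes are standard
# Docker port-publishing syntax.
serf_publish_flags() {
    port="${1:-7946}"
    printf '%s\n' "-p ${port}:${port}/tcp -p ${port}:${port}/udp"
}
```

It would be used as docker run $(serf_publish_flags) (...rest of the command...), with the EC2 security group also allowing both protocols on that port.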