When using Mesos, Marathon, and ZooKeeper, why won't my mesos-slave start when I specify "docker,mesos" in the containerizers file?

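I have 3 CentOS VMs. I installed ZooKeeper, Marathon, and Mesos on the master node, and only Mesos on the other 2 VMs. The master node is not running mesos-slave. I am trying to run Docker containers, so I specified "docker,mesos" in the containerizers file. One of the Mesos agents starts fine with this configuration, and I have been able to deploy a container to that slave. However, the second Mesos agent simply fails when I have this configuration (it works if I take out the containerizers file, but then it doesn't run containers). Here are some of the logs and information that have come up: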

Here is some of "messages" from the log directory:

 Apr 26 16:09:12 centos-minion-3 systemd: Started Mesos Slave.
 Apr 26 16:09:12 centos-minion-3 systemd: Starting Mesos Slave...
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 [main.cpp:243] Build: 2017-04-12 16:39:09 by centos
 [main.cpp:244] Version: 1.2.0
 [main.cpp:247] Git tag: 1.2.0
 [main.cpp:251] Git SHA: de306b5786de3c221bae1457c6f2ccaeb38eef9f
 [logging.cpp:194] INFO level logging started!
 [systemd.cpp:238] systemd version `219` detected
 [main.cpp:342] Inializing systemd state
 [systemd.cpp:326] Started systemd slice `mesos_executors.slice`
 [containerizer.cpp:220] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
 [linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
 [provisioner.cpp:249] Using default backend 'copy'
 [slave.cpp:211] Mesos agent started on (1)@172.22.150.87:5051
 [slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
 [slave.cpp:541] Agent resources: cpus(*):1; mem(*):919; disk(*):2043; ports(*):[31000-32000]
 [slave.cpp:549] Agent attributes: [ ]
 [slave.cpp:554] Agent hostname: node3
 [status_update_manager.cpp:177] Pausing sending status updates
 [state.cpp:62] Recovering state from '/var/lib/mesos/meta'
 [state.cpp:706] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
 [status_update_manager.cpp:203] Recovering status update manager
 [docker.cpp:868] Recovering Docker containers
 [containerizer.cpp:599] Recovering containerizer
 [provisioner.cpp:410] Provisioner recovery complete
 [group.cpp:340] Group process (zookeeper-group(1)@172.22.150.87:5051) connected to ZooKeeper
 [group.cpp:830] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
 [group.cpp:418] Trying to create path '/mesos' in ZooKeeper
 [detector.cpp:152] Detected a new leader: (id='15')
 [group.cpp:699] Trying to get '/mesos/json.info_0000000015' in ZooKeeper
 [zookeeper.cpp:259] A new leading master (UPID=master@172.22.150.88:5050) is detected
 Failed to perform recovery: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1; stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
 To remedy this do as follows:
 Step 1: rm -f /var/lib/mesos/meta/slaves/latest
         This ensures agent doesn't recover old live executors.
 Step 2: Restart the agent.
 Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
 Apr 26 16:09:13 centos-minion-3 systemd: Unit mesos-slave.service entered failed state.
 Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service failed.

Logs from Docker:

 $ sudo systemctl status docker
 ● docker.service - Docker Application Container Engine
    Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Drop-In: /usr/lib/systemd/system/docker.service.d
            └─flannel.conf
    Active: inactive (dead) since Tue 2017-04-25 18:00:03 CDT; 24h ago
      Docs: docs.docker.com
  Main PID: 872 (code=exited, status=0/SUCCESS)

 Apr 26 18:25:25 centos-minion-3 systemd[1]: Dependency failed for Docker Application Container Engine.
 Apr 26 18:25:25 centos-minion-3 systemd[1]: Job docker.service/start failed with result 'dependency'

Logs from flannel:

 [flanneld-start: network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured 

Your logs contain the answer:

 Failed to perform recovery: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1; stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
 To remedy this do as follows:
 Step 1: rm -f /var/lib/mesos/meta/slaves/latest
         This ensures agent doesn't recover old live executors.
 Step 2: Restart the agent.

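Mesos keeps its state/metadata on the local disk and tries to load that state when it restarts. If the configuration has changed and is incompatible with the previous state, the agent will not start. In your case, recovery fails because, with the docker containerizer enabled, the agent runs 'docker ps -a' while recovering, and the Docker daemon on that host is not running.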
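As a rough sketch of the remedy quoted in the log, assuming the work_dir of /var/lib/mesos shown in the startup flags: the helper below (reset_agent_state is a hypothetical name, not a Mesos command) removes only the "latest" link under the meta directory, which, per the log message, keeps the agent from recovering old live executors on its next start. It only helps when the stored state is the problem; it does not make the Docker daemon reachable.

```shell
# Hypothetical helper: drop the agent's pointer to its last checkpointed run.
# The meta directory defaults to <work_dir>/meta for work_dir=/var/lib/mesos.
reset_agent_state() {
  meta_dir="${1:-/var/lib/mesos/meta}"
  # Only the 'latest' link is removed; older run directories are left alone.
  rm -f "$meta_dir/slaves/latest"
}
```

On the affected node this would be run as root with the agent stopped, for example after "systemctl stop mesos-slave" and before "systemctl start mesos-slave".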

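Just get the Docker daemon running on that node by fixing the flannel problem (flanneld cannot reach etcd, so docker.service fails on its dependency), and everything will be fine.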
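To find where that chain breaks, something like the following might help. It is only a sketch: the unit names flanneld, docker, and mesos-slave are assumptions based on the logs above, first_inactive_unit is a made-up helper name, and the SYSTEMCTL variable exists only so the command can be swapped out for a dry run.

```shell
# Command used to query unit state; override SYSTEMCTL for a dry run.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper: print the first unit in the list that is not active.
# Returns 0 if every unit is active, 1 otherwise.
first_inactive_unit() {
  for unit in "$@"; do
    if ! "$SYSTEMCTL" is-active --quiet "$unit"; then
      echo "$unit"
      return 1
    fi
  done
  return 0
}

# Example (assumed unit names, in start-up order):
#   first_inactive_unit flanneld docker mesos-slave
```

Whichever unit it prints is the one to look at first, for example with "systemctl status" and "journalctl -u" for that unit. If it prints flanneld, check that the etcd endpoint flannel is configured to use is reachable from that node.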