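Where can I find more explicit errors for a given container error status code?
I'm running tasks through a Mesos stack that uses Docker containers.
Sometimes, some of these tasks fail.
Here are some of the related TaskStatus messages and reasons: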
    message: Container exited with status 1 - reason: REASON_COMMAND_EXECUTOR_FAILED
    message: Container exited with status 42 - reason: REASON_COMMAND_EXECUTOR_FAILED
    message: Container exited with status 137 - reason: REASON_COMMAND_EXECUTOR_FAILED
Is there a correspondence table that links the container error status codes in the TaskStatus message to more explicit errors?
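A command task can fail for several reasons, and it sets an exit code accordingly. For example, Docker 1.10 sets exit status codes as follows (from the documentation and this answer):
The exit code from docker run gives information about why the container failed to run or why it exited. When docker run exits with a non-zero code, the exit codes follow the chroot standard, as shown below: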
125 if the error is with the Docker daemon itself:

    $ docker run --foo busybox; echo $?
    # flag provided but not defined: --foo
      See 'docker run --help'.
      125

126 if the contained command cannot be invoked:

    $ docker run busybox /etc; echo $?
    # docker: Error response from daemon: Container command '/etc' could not be invoked.
      126
127 if the contained command cannot be found:

    $ docker run busybox foo; echo $?
    # docker: Error response from daemon: Container command 'foo' not found or does not exist.
      127

Otherwise, the exit code of the contained command:

    $ docker run busybox /bin/sh -c 'exit 3'; echo $?
    # 3
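As a rough illustration of the convention quoted above, the mapping could be sketched in Python as follows. The function name `classify_docker_run_exit` and the wording of the returned strings are made up for this example; they are not part of Docker or Mesos.

```python
def classify_docker_run_exit(code: int) -> str:
    """Map a `docker run` exit code to the convention quoted above.

    Hypothetical helper for illustration only.
    """
    if code == 0:
        return "success"
    if code == 125:
        return "docker daemon error"
    if code == 126:
        return "contained command cannot be invoked"
    if code == 127:
        return "contained command not found"
    # Anything else is passed through from the contained command.
    return f"exit code {code} of the contained command"


for code in (125, 126, 127, 3):
    print(code, "->", classify_docker_run_exit(code))
```

Note that a contained command is free to exit with 125, 126 or 127 itself, so the mapping is a hint rather than a proof; the stderr output usually settles which case applies.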
Another set of exit code conventions, the reserved shell exit codes, is summarized in the following table:
| Code  | Meaning                        | Example                 | Comments |
|-------|--------------------------------|-------------------------|----------|
| 1     | Catchall for general errors    | let "var1 = 1/0"        | Miscellaneous errors, such as "divide by zero" and other impermissible operations |
| 2     | Misuse of shell builtins       | empty_function() {}     | Missing keyword or command, or permission problem (and diff return code on a failed binary file comparison). |
| 126   | Command invoked cannot execute | /dev/null               | Permission problem or command is not an executable |
| 127   | "command not found"            | illegal_command         | Possible problem with $PATH or a typo |
| 128   | Invalid argument to exit       | exit 3.14159            | exit takes only integer args in the range 0 - 255 (see first footnote) |
| 128+n | Fatal error signal "n"         | kill -9 $PPID of script | $? returns 137 (128 + 9) |
| 130   | Script terminated by Control-C | Ctl-C                   | Control-C is fatal error signal 2, (130 = 128 + 2, see above) |
| 255*  | Exit status out of range       | exit -1                 | exit takes only integer args in the range 0 - 255 |
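Applied to your examples:
- 137: out of memory. 128 + 9 = 137 (the 9 comes from SIGKILL), which usually means the container hit an out-of-memory condition and was killed. Any other SIGKILL, such as a forced kill after a stop timeout, produces the same code.
- 1: the command exited with 1. This is probably due to invalid configuration, an internal application error, or invalid input.
- 42: the Answer to the Ultimate Question of Life, the Universe, and Everything. In other words, it is an application-specific code with no standard meaning.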
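The 128 + n arithmetic can be checked mechanically. The sketch below uses Python's standard `signal` module and assumes a POSIX platform (`SIGKILL` is not defined on Windows); `signal_from_exit_code` is a hypothetical helper name.

```python
import signal
from typing import Optional


def signal_from_exit_code(code: int) -> Optional[str]:
    """Return the signal name if `code` follows the 128+n convention, else None."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # 128+n where n is not a signal number known on this platform.
        return None


for code in (1, 42, 137):
    print(code, "->", signal_from_exit_code(code))
```

A result of `SIGKILL` only says that the process was killed; whether the OOM killer was responsible still has to be confirmed from the TaskStatus message or the logs.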
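If you need more information to interpret a status code, look at the message field in the Mesos TaskStatus update; for example, Mesos puts information about OOM there. The same information can also be found in the Mesos logs. To debug why a command returned a non-zero code, inspect the files stored in the executor sandbox, especially stderr / stdout or command-specific logs.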
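As a minimal sketch of reading the message field programmatically, the exit status can be pulled out of messages shaped like the ones in the question. This assumes the text always contains `Container exited with status N`; other message formats are not handled, and the helper name is hypothetical.

```python
import re
from typing import Optional

# Pattern taken from the sample messages in the question.
EXIT_STATUS_RE = re.compile(r"Container exited with status (\d+)")


def exit_status_from_message(message: str) -> Optional[int]:
    """Extract the numeric exit status from a TaskStatus message, if present."""
    match = EXIT_STATUS_RE.search(message)
    return int(match.group(1)) if match else None


messages = [
    "Container exited with status 1",
    "Container exited with status 42",
    "Container exited with status 137",
]
statuses = [exit_status_from_message(m) for m in messages]
print(statuses)
```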
I guess you want the Reason enum from mesos.proto (copied below):
    enum Reason {
      // TODO(jieyu): The default value when a caller doesn't check for
      // presence is 0 and so ideally the 0 reason is not a valid one.
      // Since this is not used anywhere, consider removing this reason.
      REASON_COMMAND_EXECUTOR_FAILED = 0;
      REASON_CONTAINER_LAUNCH_FAILED = 21;
      REASON_CONTAINER_LIMITATION = 19;
      REASON_CONTAINER_LIMITATION_DISK = 20;
      REASON_CONTAINER_LIMITATION_MEMORY = 8;
      REASON_CONTAINER_PREEMPTED = 17;
      REASON_CONTAINER_UPDATE_FAILED = 22;
      REASON_EXECUTOR_REGISTRATION_TIMEOUT = 23;
      REASON_EXECUTOR_REREGISTRATION_TIMEOUT = 24;
      REASON_EXECUTOR_TERMINATED = 1;
      REASON_EXECUTOR_UNREGISTERED = 2;
      REASON_FRAMEWORK_REMOVED = 3;
      REASON_GC_ERROR = 4;
      REASON_INVALID_FRAMEWORKID = 5;
      REASON_INVALID_OFFERS = 6;
      REASON_IO_SWITCHBOARD_EXITED = 27;
      REASON_MASTER_DISCONNECTED = 7;
      REASON_RECONCILIATION = 9;
      REASON_RESOURCES_UNKNOWN = 18;
      REASON_SLAVE_DISCONNECTED = 10;
      REASON_SLAVE_REMOVED = 11;
      REASON_SLAVE_RESTARTED = 12;
      REASON_SLAVE_UNKNOWN = 13;
      REASON_TASK_CHECK_STATUS_UPDATED = 28;
      REASON_TASK_GROUP_INVALID = 25;
      REASON_TASK_GROUP_UNAUTHORIZED = 26;
      REASON_TASK_INVALID = 14;
      REASON_TASK_UNAUTHORIZED = 15;
      REASON_TASK_UNKNOWN = 16;
    }
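If a client only sees the numeric reason value, it can be turned back into a name with a small lookup. The sketch below mirrors just a handful of the values copied above in a plain Python `IntEnum`; it is not the generated protobuf class, and `reason_name` is a hypothetical helper.

```python
from enum import IntEnum


class Reason(IntEnum):
    """Subset of the Reason values listed above (not the full enum)."""
    REASON_COMMAND_EXECUTOR_FAILED = 0
    REASON_CONTAINER_LIMITATION_MEMORY = 8
    REASON_CONTAINER_LIMITATION = 19
    REASON_CONTAINER_LIMITATION_DISK = 20
    REASON_CONTAINER_LAUNCH_FAILED = 21


def reason_name(value: int) -> str:
    """Return the enum name for a numeric reason, or a placeholder if unmapped."""
    try:
        return Reason(value).name
    except ValueError:
        return f"UNKNOWN_REASON_{value}"


print(reason_name(8))
```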