CrashLoopBackOff in a Spark cluster on Kubernetes: nohup: can't execute '--': No such file or directory

Dockerfile:

    FROM openjdk:8-alpine
    
    RUN apk update && \
        apk add curl bash procps
    
    ENV SPARK_VER 2.1.1
    ENV HADOOP_VER 2.7
    ENV SPARK_HOME /opt/spark
    
    # Get Spark from US Apache mirror.
    RUN mkdir -p /opt && \
        cd /opt && \
        curl http://www.us.apache.org/dist/spark/spark-${SPARK_VER}/spark-${SPARK_VER}-bin-hadoop${HADOOP_VER}.tgz | \
            tar -zx && \
        ln -s spark-${SPARK_VER}-bin-hadoop${HADOOP_VER} spark && \
        echo Spark ${SPARK_VER} installed in /opt
    
    ADD start-common.sh start-worker.sh start-master.sh /
    RUN chmod +x /start-common.sh /start-master.sh /start-worker.sh
    
    ENV PATH $PATH:/opt/spark/bin
    WORKDIR $SPARK_HOME
    EXPOSE 4040 6066 7077 8080
    CMD ["spark-shell", "--master", "local[2]"]

spark-master-service.yaml:

    apiVersion: v1
    kind: Service
    metadata:
      name: spark-master
      labels:
        name: spark-master
    spec:
      type: NodePort
      ports:
        # the port that this service should serve on
        - name: webui
          port: 8080
          targetPort: 8080
        - name: spark
          port: 7077
          targetPort: 7077
        - name: rest
          port: 6066
          targetPort: 6066
      selector:
        name: spark-master

spark-master.yaml:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      labels:
        name: spark-master
      name: spark-master
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            name: spark-master
        spec:
          containers:
            - name: spark-master
              imagePullPolicy: "IfNotPresent"
              image: spark-2.1.1-bin-hadoop2.7
              ports:
                - containerPort: 8080
                - containerPort: 7077
                - containerPort: 6066
              command: ["/start-master.sh"]

Error:

    Back-off restarting failed docker container
    Error syncing pod, skipping: failed to "StartContainer" for "spark-master" with CrashLoopBackOff: "Back-off 10s restarting failed container=spark-master pod=spark-master-286530801-7qv4l_default(34fecb5e-55eb-11e7-994e-525400f3f8c2)"

Any ideas? Thanks.

UPDATE

    2017-06-20T19:43:56.300935235Z starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-spark-master-1682838347-9927h.out
    2017-06-20T19:44:03.414011228Z failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --host spark-master-1682838347-9927h --port 7077 --webui-port 8080 --ip spark-master --port 7077
    2017-06-20T19:44:03.418640516Z nohup: can't execute '--': No such file or directory
    2017-06-20T19:44:03.419814788Z full log in /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-spark-master-1682838347-9927h.out
    2017-06-20T19:43:50.343251857Z starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark--org.apache.spark.deploy.worker.Worker-1-spark-worker-243125562-0lh9k.out
    2017-06-20T19:43:57.450929613Z failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://spark-master:7077
    2017-06-20T19:43:57.465409083Z nohup: can't execute '--': No such file or directory
    2017-06-20T19:43:57.466372593Z full log in /opt/spark/logs/spark--org.apache.spark.deploy.worker.Worker-1-spark-worker-243125562-0lh9k.out

The version of nohup that ships with Alpine does not support '--'. You need to install the GNU version of nohup via the coreutils Alpine package in your Dockerfile, like so:

RUN apk --update add coreutils
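
Applied to the Dockerfile in the question, that would mean something like this (a sketch; it simply merges coreutils into the existing apk line):

    RUN apk update && \
        apk add curl bash procps coreutils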

Alternatively, create your own start script that runs the class directly, and run that instead:

/usr/spark/bin/spark-submit --class org.apache.spark.deploy.master.Master $SPARK_MASTER_INSTANCE --port $SPARK_MASTER_PORT --webui-port $SPARK_WEBUI_PORT

This is just an idea; I haven't looked into it much.
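
A minimal sketch of what such a script could look like, assuming the /opt/spark layout from the Dockerfile above (the host and port values here are assumptions; adjust them to your setup):

    #!/bin/bash
    # Sketch of a start-master.sh that runs the Master class in the
    # foreground via spark-class, bypassing spark-daemon.sh and its
    # nohup invocation entirely.
    exec /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master \
        --host "$(hostname)" \
        --port 7077 \
        --webui-port 8080

Running the process in the foreground also keeps the container's main process alive, which is what Kubernetes expects; the stock sbin scripts daemonize and exit, which on its own can trigger a CrashLoopBackOff.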

I imagine start-master.sh may be looking for start-common.sh; usually they are both on the PATH, but in this Dockerfile they are added to /. Maybe you could try

 ENV PATH $PATH:/:/opt/spark/bin 

or add those scripts to /opt/spark/bin instead.
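
For example, the ADD and chmod lines in the Dockerfile could become something like this (a sketch, untested):

    ADD start-common.sh start-worker.sh start-master.sh /opt/spark/bin/
    RUN chmod +x /opt/spark/bin/start-common.sh \
                 /opt/spark/bin/start-master.sh \
                 /opt/spark/bin/start-worker.sh

The command in spark-master.yaml would then need to reference the new location (or just the script name, since /opt/spark/bin is already on the PATH).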