Cannot run H2O Flow automatically with pysparkling in a Docker container

Context:

I have a working local H2O Sparkling Water environment installed in a Docker container.

I created a Dockerfile based on the official jupyter/all-spark-notebook image (which provides a local Hadoop and Spark environment) and added the following on top of it:

    # Install H2O pysparkling requirements
    RUN pip install requests && \
        pip install tabulate && \
        pip install six && \
        pip install future && \
        pip install colorama

    # Expose H2O Flow UI ports
    EXPOSE 54321
    EXPOSE 54322
    EXPOSE 55555

    # Install H2O sparkling water
    RUN \
        cd /home/$NB_USER && \
        wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.1/7/sparkling-water-2.1.7.zip && \
        unzip sparkling-water-2.1.7.zip && \
        cd sparkling-water-2.1.7

To run H2O Flow from pysparkling, I did the following:

    $ docker exec -it tlh2opyspark_notebook_1 /bin/bash    # on the host
    # /home/jovyan/sparkling-water-2.1.7/bin/pysparkling   # in the container
    >>> from pysparkling import *                          # pysparkling shell in the container
    >>> hc = H2OContext.getOrCreate(sc)                    # pysparkling shell in the container

I can then open H2O Flow in a browser at http://localhost:54321. It keeps working as long as the pysparkling session stays open in a terminal inside the container.
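To check whether the Flow UI is actually reachable without eyeballing a browser, a small probe can poll the port. This is a minimal sketch of my own (not part of the setup above); it only assumes Flow answers plain HTTP on port 54321:

```python
# Probe the H2O Flow UI (sketch; assumes Flow serves plain HTTP on 54321)
import urllib.request
import urllib.error

def flow_is_up(url="http://localhost:54321", timeout=3):
    """Return True if something answers HTTP 200 at `url`, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Such a probe is also handy as a Docker `HEALTHCHECK` command once Flow is supposed to start automatically.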

Problem

I have tried several alternatives for running H2O Flow automatically from inside the container (using pysparkling), but none of them seems to work properly.

I tried putting the following CMD in the Dockerfile, but H2O Flow always crashes after a few seconds:

 bash -c "echo 'from pyspark import SparkContext; sc = SparkContext(); from pysparkling import *; import h2o; hc = H2OContext.getOrCreate(sc)' | /home/jovyan/sparkling-water-2.1.7/bin/pysparkling" 

I also tried the following, but it likewise crashes after a few seconds:

 bash -c "/usr/local/spark/bin/spark-submit --py-files ../sparkling-water-2.1.7/py/build/dist/h2o_pysparkling_2.1-2.1.7-py2.7.egg --conf spark.dynamicAllocation.enabled=false ../work/start_h2o.py" 

where start_h2o.py contains:

    # start_h2o.py
    from pyspark import SparkContext, SparkConf
    sc = SparkContext()
    from pysparkling import *
    hc = H2OContext.getOrCreate(sc)
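One plausible explanation for both crashes (an assumption on my part, not something confirmed above) is that the driver script exits as soon as `H2OContext.getOrCreate(sc)` returns; Spark then shuts down and takes the embedded H2O node, and with it Flow, along. A sketch of a start_h2o.py that blocks the driver instead; the `keep_alive` helper and its `should_stop` hook are my own names, introduced so the loop can be interrupted:

```python
# start_h2o.py -- sketch: keep the Spark driver alive so the H2O node
# serving Flow on port 54321 is not torn down when the script returns.
import os
import time

def keep_alive(poll_seconds=60, should_stop=lambda: False):
    """Block until should_stop() returns True (by default, forever)."""
    while not should_stop():
        time.sleep(poll_seconds)

if __name__ == "__main__" and os.environ.get("SPARK_HOME"):
    # Only start Spark where a Spark install exists (SPARK_HOME is set
    # in the jupyter/all-spark-notebook image).
    from pyspark import SparkContext
    from pysparkling import H2OContext

    sc = SparkContext()
    hc = H2OContext.getOrCreate(sc)  # starts H2O and the Flow UI
    keep_alive()                     # never returns; the container stays up
```

Run under spark-submit as before; because the driver never exits, the container's main process keeps Flow reachable.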

Is there a proper and reliable way to set up the Dockerfile so that, when the container starts, H2O Flow (via pysparkling) runs automatically as a service, just like Jupyter Notebook starts automatically in the Jupyter containers?