pthread_create failed: Resource temporarily unavailable with MongoDB
I am currently running a standalone-mode Spark cluster with Docker on a physical machine with 16GB RAM, Ubuntu 16.04.1 x64.
RAM configuration of the Spark cluster containers: master 4g, slave1 2g, slave2 2g, slave3 2g
docker run -itd --net spark -m 4g -p 8080:8080 --name master --hostname master MyAccount/spark &> /dev/null
docker run -itd --net spark -m 2g -p 8080:8080 --name slave1 --hostname slave1 MyAccount/spark &> /dev/null
docker run -itd --net spark -m 2g -p 8080:8080 --name slave2 --hostname slave2 MyAccount/spark &> /dev/null
docker run -itd --net spark -m 2g -p 8080:8080 --name slave3 --hostname slave3 MyAccount/spark &> /dev/null
docker exec -it master sh -c 'service ssh start' > /dev/null
docker exec -it slave1 sh -c 'service ssh start' > /dev/null
docker exec -it slave2 sh -c 'service ssh start' > /dev/null
docker exec -it slave3 sh -c 'service ssh start' > /dev/null
docker exec -it master sh -c '/usr/local/spark/sbin/start-all.sh' > /dev/null
I have about 170GB of data in my MongoDB database. I run MongoDB on localhost with ./mongod, without replication or sharding.
I use the Stratio spark-mongodb connector.
I run the following command on the master container:
/usr/local/spark/bin/spark-submit --master spark://master:7077 --executor-memory 2g --executor-cores 1 --packages com.stratio.datasource:spark-mongodb_2.11:0.12.0 code.py
code.py:
from pyspark import SparkContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("CREATE TEMPORARY VIEW tmp_tb USING com.stratio.datasource.mongodb OPTIONS (host 'MyPublicIP:27017', database 'firewall', collection 'log_data')")
df = spark.sql("select * from tmp_tb")
df.show()
I modified the ulimit values in /etc/security/limits.conf and /etc/security/limits.d/20-nproc.conf:
* soft nofile unlimited
* hard nofile 131072
* soft nproc unlimited
* hard nproc unlimited
* soft fsize unlimited
* hard fsize unlimited
* soft memlock unlimited
* hard memlock unlimited
* soft cpu unlimited
* hard cpu unlimited
* soft as unlimited
* hard as unlimited
root soft nofile unlimited
root hard nofile 131072
root soft nproc unlimited
root hard nproc unlimited
root soft fsize unlimited
root hard fsize unlimited
root soft memlock unlimited
root hard memlock unlimited
root soft cpu unlimited
root hard cpu unlimited
root soft as unlimited
root hard as unlimited
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63682
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
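As a back-of-the-envelope check (my own rough arithmetic, not from any log): with the default 8192 KB thread stack shown above, a 2g container has room for only a few hundred fully-committed thread stacks, which is one plausible way pthread_create can hit "Resource temporarily unavailable" even with nproc unlimited. This sketch assumes the worst case where every thread commits its entire default stack; real threads usually touch far less.

```python
# Worst-case estimate of how many thread stacks fit under a container memory limit.
# Assumes each thread commits its full default stack (ulimit -s), ignoring heap
# and other allocations, so the real ceiling is even lower.
stack_kb = 8192                      # "stack size (kbytes, -s) 8192" from ulimit -a
container_mem_kb = 2 * 1024 * 1024   # a 2g slave container (docker run -m 2g)
max_threads = container_mem_kb // stack_kb
print(max_threads)  # 256
```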
I also added the following to /etc/sysctl.conf:
kernel.pid_max=200000
vm.max_map_count=600000
Then, after rebooting, I ran the Spark program again.
I still get the following errors: pthread_create failed: Resource temporarily unavailable and com.mongodb.MongoException$Network: Exception opening the socket.
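For what it's worth, the kernel-level limits that pthread_create depends on can be read directly from procfs inside the failing container. This is a generic Linux check (standard /proc paths, nothing specific to this Spark setup) to confirm the sysctl changes actually took effect in the container's view:

```python
# Read the kernel limits relevant to pthread_create from procfs.
# Run inside the container that throws the error; paths are standard Linux procfs.
with open("/proc/sys/kernel/threads-max") as f:
    threads_max = int(f.read())      # kernel-wide cap on threads
with open("/proc/sys/kernel/pid_max") as f:
    pid_max = int(f.read())          # should reflect kernel.pid_max=200000 if the sysctl applied

print(f"threads-max={threads_max}, pid_max={pid_max}")
```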
Error screenshots:
(pyspark error screenshot)
(mongodb error screenshot)
Is the physical memory insufficient? Or which part of the configuration did I get wrong?
Thanks.