spark-submit job.py cannot pass configuration parameters via SparkConf

I submit my Python job with the bin/spark-submit script. I want to set configuration parameters in the Python code through the SparkConf and SparkContext classes. I tried setting appName, master, spark.cores.max, and spark.scheduler.mode on a SparkConf object and passing it to SparkContext as the conf argument. It turns out the job is never sent to the master of the standalone cluster I configured; it simply runs locally (the localhost:4040/environment/ page confirms this). From the print statements below it is clear that the SparkConf I pass to SparkContext contains all four settings, but the conf obtained back from the SparkContext object shows none of my updates, only the defaults.

As an alternative, I tried conf/spark-defaults.conf with `--properties-file my-spark.conf --master spark://spark-master:7077`. That works. I also tried setting the master separately via the master argument:
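For reference, the working setup looked roughly like this. The contents of my-spark.conf below are a reconstruction from the settings described above, not the exact file:

```
# my-spark.conf (hypothetical contents, mirroring the settings I set in code)
spark.master          spark://spark-master:7077
spark.app.name        word-count
spark.scheduler.mode  FAIR
spark.cores.max       4
```

submitted with `bin/spark-submit --properties-file my-spark.conf --master spark://spark-master:7077 test.py`.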

sc = pyspark.SparkContext(master="spark://spark-master:7077", conf=conf)

And that works as well:

 SparkContextConf: [ ('spark.app.name', 'test.py'), ('spark.rdd.compress', 'True'), ('spark.driver.port', '41147'), ('spark.app.id', 'app-20170403234627-0001'), ('spark.master','spark://spark-master:7077'), ('spark.serializer.objectStreamReset', '100'), ('spark.executor.id', 'driver'), ('spark.submit.deployMode', 'client'), ('spark.files', 'file:/mnt/scratch/yidi/docker-volume/test.py'), ('spark.driver.host', '10.0.0.5') ] 

So it seems that only the conf argument is not picked up correctly by SparkContext.
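For completeness, individual settings can also be passed on the spark-submit command line via the standard --conf flag; this is not something I showed above, and the invocation below is only a sketch:

```
bin/spark-submit \
  --master spark://spark-master:7077 \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.cores.max=4 \
  /mnt/scratch/yidi/docker-volume/test.py
```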


Spark job code:

    import operator
    import pyspark

    def main():
        '''Program entry point'''
        import socket
        print(socket.gethostbyname(socket.gethostname()))
        print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
        num_cores = 4
        conf = pyspark.SparkConf()
        conf.setAppName("word-count").setMaster("spark://spark-master:7077")  # .setExecutorEnv("CLASSPATH", path)
        conf.set("spark.scheduler.mode", "FAIR")
        conf.set("spark.cores.max", num_cores)
        print("Conf: {}".format(conf.getAll()))
        # Initialize a SparkContext with the configuration above
        sc = pyspark.SparkContext(conf=conf)
        print("SparkContextConf: {}".format(sc.getConf().getAll()))
        # with pyspark.SparkContext(conf=conf) as sc:
        #     # Get an RDD containing lines from this text file
        #     lines = sc.textFile("/mnt/scratch/yidi/docker-volume/war_and_peace.txt")
        #     # Split each line into words and assign a frequency of 1 to each word
        #     words = lines.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1))
        #     # Count the frequency of each word
        #     counts = words.reduceByKey(operator.add)
        #     # Sort the counts in descending order of word frequency
        #     sorted_counts = counts.sortBy(lambda x: x[1], False)
        #     # Iterate over the counts to print each word and its frequency
        #     for word, count in sorted_counts.toLocalIterator():
        #         print("{} --> {}".format(word.encode('utf-8'), count))
        # print("Spark conf: ", conf.getAll())

    if __name__ == "__main__":
        main()

Console output of the job run:

    root@bdd1ba99bf4f:/usr/spark-2.1.0# bin/spark-submit /mnt/scratch/yidi/docker-volume/test.py
    10.0.0.5
    Conf: dict_items([('spark.scheduler.mode', 'FAIR'), ('spark.app.name', 'word-count'), ('spark.master', 'spark://spark-master:7077'), ('spark.cores.max', '4')])
    17/04/03 23:24:27 INFO spark.SparkContext: Running Spark version 2.1.0
    17/04/03 23:24:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/04/03 23:24:27 INFO spark.SecurityManager: Changing view acls to: root
    17/04/03 23:24:27 INFO spark.SecurityManager: Changing modify acls to: root
    17/04/03 23:24:27 INFO spark.SecurityManager: Changing view acls groups to:
    17/04/03 23:24:27 INFO spark.SecurityManager: Changing modify acls groups to:
    17/04/03 23:24:27 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
    17/04/03 23:24:27 INFO util.Utils: Successfully started service 'sparkDriver' on port 38193.
    17/04/03 23:24:27 INFO spark.SparkEnv: Registering MapOutputTracker
    17/04/03 23:24:27 INFO spark.SparkEnv: Registering BlockManagerMaster
    17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    17/04/03 23:24:27 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-7b545c86-9344-4f73-a3db-e891c562714d
    17/04/03 23:24:27 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
    17/04/03 23:24:27 INFO spark.SparkEnv: Registering OutputCommitCoordinator
    17/04/03 23:24:27 INFO util.log: Logging initialized @1090ms
    17/04/03 23:24:27 INFO server.Server: jetty-9.2.z-SNAPSHOT
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@3950ddbb{/jobs,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@286430d5{/jobs/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@404eb034{/jobs/job,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@79e3feec{/jobs/job/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@4661596e{/stages,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@4f8a3bef{/stages/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@7a70fd3a{/stages/stage,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@1c826806{/stages/stage/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@50e4e8d1{/stages/pool,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@4e2fe461{/stages/pool/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@33cb59b3{/storage,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@3c06c594{/storage/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@4b5300a5{/storage/rdd,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@7a6ef942{/storage/rdd/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@1301217d{/environment,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@19217cec{/environment/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@4a241145{/executors,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@470d45aa{/executors/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@5d9d8eff{/executors/threadDump,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@4fc94fbc{/executors/threadDump/json,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@258dd139{/static,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@880f037{/,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@395b7dae{/api,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@3cea7196{/jobs/job/kill,null,AVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@77257b2b{/stages/stage/kill,null,AVAILABLE}
    17/04/03 23:24:27 INFO server.ServerConnector: Started ServerConnector@788dbc45{HTTP/1.1}{0.0.0.0:4040}
    17/04/03 23:24:27 INFO server.Server: Started @1154ms
    17/04/03 23:24:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
    17/04/03 23:24:27 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://localhost:4040
    17/04/03 23:24:27 INFO spark.SparkContext: Added file file:/mnt/scratch/yidi/docker-volume/test.py at file:/mnt/scratch/yidi/docker-volume/test.py with timestamp 1491261867734
    17/04/03 23:24:27 INFO util.Utils: Copying /mnt/scratch/yidi/docker-volume/test.py to /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6/userFiles-ed0a6692-b96a-4b3d-be16-215a21cf836b/test.py
    17/04/03 23:24:27 INFO executor.Executor: Starting executor ID driver on host localhost
    17/04/03 23:24:27 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44223.
    17/04/03 23:24:27 INFO netty.NettyBlockTransferService: Server created on 10.0.0.5:44223
    17/04/03 23:24:27 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    17/04/03 23:24:27 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.5, 44223, None)
    17/04/03 23:24:27 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.0.5:44223 with 366.3 MB RAM, BlockManagerId(driver, 10.0.0.5, 44223, None)
    17/04/03 23:24:27 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.5, 44223, None)
    17/04/03 23:24:27 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.5, 44223, None)
    17/04/03 23:24:27 INFO handler.ContextHandler: Started osjsServletContextHandler@c06133a{/metrics/json,null,AVAILABLE}
    SparkContextConf: [('spark.app.name', 'test.py'), ('spark.rdd.compress', 'True'), ('spark.serializer.objectStreamReset', '100'), ('spark.master', 'local[*]'), ('spark.executor.id', 'driver'), ('spark.submit.deployMode', 'client'), ('spark.files', 'file:/mnt/scratch/yidi/docker-volume/test.py'), ('spark.driver.host', '10.0.0.5'), ('spark.app.id', 'local-1491261867756'), ('spark.driver.port', '38193')]
    17/04/03 23:24:27 INFO spark.SparkContext: Invoking stop() from shutdown hook
    17/04/03 23:24:27 INFO server.ServerConnector: Stopped ServerConnector@788dbc45{HTTP/1.1}{0.0.0.0:4040}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@77257b2b{/stages/stage/kill,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@3cea7196{/jobs/job/kill,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@395b7dae{/api,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@880f037{/,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@258dd139{/static,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@4fc94fbc{/executors/threadDump/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@5d9d8eff{/executors/threadDump,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@470d45aa{/executors/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@4a241145{/executors,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@19217cec{/environment/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@1301217d{/environment,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@7a6ef942{/storage/rdd/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@4b5300a5{/storage/rdd,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@3c06c594{/storage/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@33cb59b3{/storage,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@4e2fe461{/stages/pool/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@50e4e8d1{/stages/pool,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@1c826806{/stages/stage/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@7a70fd3a{/stages/stage,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@4f8a3bef{/stages/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@4661596e{/stages,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@79e3feec{/jobs/job/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@404eb034{/jobs/job,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@286430d5{/jobs/json,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO handler.ContextHandler: Stopped osjsServletContextHandler@3950ddbb{/jobs,null,UNAVAILABLE}
    17/04/03 23:24:27 INFO ui.SparkUI: Stopped Spark web UI at http://localhost:4040
    17/04/03 23:24:27 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    17/04/03 23:24:27 INFO memory.MemoryStore: MemoryStore cleared
    17/04/03 23:24:27 INFO storage.BlockManager: BlockManager stopped
    17/04/03 23:24:27 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
    17/04/03 23:24:27 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    17/04/03 23:24:27 INFO spark.SparkContext: Successfully stopped SparkContext
    17/04/03 23:24:27 INFO util.ShutdownHookManager: Shutdown hook called
    17/04/03 23:24:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6/pyspark-50d75419-36b8-4dd2-a7c6-7d500c7c5bca
    17/04/03 23:24:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-5bb22b93-172d-40d6-8a37-6e3da5dc15a6