1. Parameter Selection
Once the code is written and packaged into a jar, it can be submitted to the cluster via bin/spark-submit. The general form of the command is:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
In most cases, the options above are all you need:
--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any
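To illustrate the --conf option above, here is a sketch of a submission that sets two configuration properties; the specific properties and paths are illustrative (the quoting pattern follows the Spark documentation), not part of the original examples:

```shell
# Sketch: passing configuration properties with --conf.
# Quote any key=value pair whose value contains spaces.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[4] \
  --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  /path/to/examples.jar \
  100
```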
Here are a few simple spark-submit examples for the different cluster managers:
# Run application locally on 8 cores
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100
# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
# Run on a Spark standalone cluster in cluster deploy mode with supervise
# make sure that the driver is automatically restarted if it fails with non-zero exit code
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \ # can also be `yarn-client` for client mode
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000
2. Submission Steps
The following code implements a simple count:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleSample {
    public static void main(String[] args) {
        String logFile = "/home/bigdata/spark-1.5.1/README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Cache the RDD, since it is scanned twice below.
        JavaRDD<String> logData = sc.textFile(logFile).cache();
        // Count lines containing "a".
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("a");
            }
        }).count();
        // Count lines containing "b".
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("b");
            }
        }).count();
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        sc.stop();
    }
}
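The two filter(...).count() steps above amount to counting lines that contain a given substring. As a local sanity check that needs no Spark cluster, the same logic can be sketched with plain Java streams; the class and method names here are hypothetical, not part of the original example:

```java
import java.util.List;

// Hypothetical local check of the Spark job's filter logic:
// count lines containing a substring, using plain Java streams.
public class LineCountCheck {
    static long countLinesContaining(List<String> lines, String needle) {
        return lines.stream().filter(s -> s.contains(needle)).count();
    }

    public static void main(String[] args) {
        List<String> sample = List.of("apache spark", "big data", "cluster");
        System.out.println("Lines with a: " + countLinesContaining(sample, "a")
                + ", lines with b: " + countLinesContaining(sample, "b"));
    }
}
```

For the three sample lines this prints "Lines with a: 2, lines with b: 1", mirroring the summary line the Spark job produces.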
Package it into a jar.
Submission command:
./bin/spark-submit --class cs.spark.SimpleSample --master spark://spark1:7077 /home/jar/spark-test-0.0.1-SN
Source: https://my.oschina.net/u/2529303/blog/541685