Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support spark submit jar on k8s #4867

Merged
merged 4 commits into from
Sep 8, 2023

Conversation

lenoxzhao
Copy link
Contributor

@lenoxzhao lenoxzhao commented Aug 16, 2023

What is the purpose of the change

Support submit spark jar job on k8s.

Environment

Spark:v3.2.1
Kubernetes: v1.23

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:58:47Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-02-22T13:27:46Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}

Linkis:
image

Test Command

linkis-cli --mode once \
-code 'sparkjars' \
-engineType spark-3.2.1 \
-codeType jar \
-labelMap engineConnMode=once \
-sourceMap jobName=OnceJobTest \
-k8sCluster 'K8S-default' \
-jobContentMap spark.app.main.class='org.apache.spark.examples.SparkPi' \
-jobContentMap runType='jar' \
-confMap spark.master='k8s-native' \
-confMap linkis.spark.k8s.serviceAccount='spark' \
-confMap spark.submit.deployMode='cluster' \
-confMap linkis.spark.k8s.master.url='k8s://https://172.31.226.155:6443' \
-confMap linkis.spark.k8s.config.file='/home/hadoop/.kube/config' \
-confMap linkis.spark.k8s.imagePullPolicy='IfNotPresent' \
-confMap spark.app.name='spark-submit-jar-k8s-test-linkis' \
-confMap spark.app.resource='local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar' \
-confMap linkis.spark.k8s.namespace='default'

Brief change log

  • SparkConfig.java: add spark.kubernetes.driver.request.cores and spark.kubernetes.executor.request.cores related fields.

  • ClusterDescriptorAdapterFactory.java: add new k8s submission way("k8s-native", different from "k8s-operator").

  • KubernetesApplicationClusterDescriptorAdapter.java: main features are as below

    • SparkLauncher add k8s related params setting.
    • Manually specifying driver pod name for follow-up task monitoring.
    • initJobId() method converts the state of the driver pod to the SparkAppHandle.State to determine whether the task has started.
  • SparkOnKubernetesSubmitOnceExecutor.scala: new once executor type for k8s.

  • SparkEngineConnFactory.scala: add linkis.spark.k8s.driver.request.cores and linkis.spark.k8s.executor.request.cores to sparkConf.

  • SparkEngineConnResourceFactory.scala: According to Spark Doc , spark.kubernetes.executor.request.cores and spark.kubernetes.driver.request.cores have priority over spark.executor.cores and spark.driver.cores. So if spark.kubernetes.executor.request.cores is not specified when check resources, manually assign the value of spark.executor.cores to spark.kubernetes.executor.request.cores to make sure the default value of spark.kubernetes.executor.request.cores won't affect the tasks.
    ca39e93103aa05ce81391ef3f8566c6

  • SparkOnceExecutorFactory.scala: Create SparkOnKubernetesSubmitOnceExecutor if -k8sCluster is specified.

  • SparkJobProgressUtil.scala: main features are as below

    • Methods add podIP params to be compatible with the spark tasks on k8s.
    • getKubernetesSparkJobInfo(): method for getting status of spark tasks through Spark UI exposed on driver pod’s 4040 port.

Checklist

  • I have read the Contributing Guidelines on pull requests.
  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the Linkis mailing list first)
  • If this is a code change: I have written unit tests to fully verify the new behavior.

.setDeployMode(sparkConfig.getDeployMode())
.setAppName(sparkConfig.getAppName())
.setVerbose(true);
this.driverPodName = sparkConfig.getAppName() + "-" + System.currentTimeMillis() + "-driver";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using timestamp to make driverPodName unique is not enough good, UUID would be better.
If this is for pod create time recording, there is creationTimestamp in pod spec already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using timestamp to make driverPodName unique is not enough good, UUID would be better. If this is for pod create time recording, there is creationTimestamp in pod spec already.

Thanks for reviewing the codes and changes have been made in the last commitment.

if (StringUtils.isNotBlank(master)) {
if (master.equalsIgnoreCase("k8s-operator")) {
clusterDescriptorAdapter = new KubernetesOperatorClusterDescriptorAdapter(executionContext);
} else if (master.equalsIgnoreCase("k8s-jar")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, "k8s-native" would be more suitable than "k8s-jar", which is aligned to the concept "k8s-operator".

logger.info("clusterDescriptorAdapter.getJobState is null")
} else {
logger.info("clusterDescriptorAdapter/getJobState is not null")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logs seem to be used for debugging. May be too noisy in prod env.

engineConn.getEngineConnSession match {
case context: SparkEngineConnContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to optimize it into an if else judgment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done in the last commitment.

Copy link
Collaborator

@guoshupei guoshupei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@peacewong peacewong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@peacewong peacewong merged commit 5ca6726 into apache:master Sep 8, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants