[SPARK-44762][CONNECT][CORE] Doc for SparkConnect.addJobTag and Connect SparkSession.addTag

### What changes were proposed in this pull request?

Add more documentation about using tags, similar to how SparkContext.setJobGroup is documented.

### Why are the changes needed?

Better doc.

### Does this PR introduce _any_ user-facing change?

Yes, better doc.

### How was this patch tested?

Doc only.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43182 from juliuszsompolski/SPARK-44762.

Authored-by: Juliusz Sompolski <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
juliuszsompolski authored and HyukjinKwon committed Sep 30, 2023
1 parent 43210fe commit 58c24a5
Showing 4 changed files with 49 additions and 1 deletion.
@@ -675,6 +675,22 @@ class SparkSession private[sql] (
/**
* Add a tag to be assigned to all the operations started by this thread in this session.
*
* Often, a unit of execution in an application consists of multiple Spark executions.
* Application programmers can use this method to group all those executions together and give
* them a group tag. The application can use `org.apache.spark.sql.SparkSession.interruptTag` to
* cancel all running executions with this tag. For example:
* {{{
* // In the main thread:
* spark.addTag("myjobs")
* spark.range(10).map(i => { Thread.sleep(10); i }).collect()
*
* // In a separate thread:
* spark.interruptTag("myjobs")
* }}}
*
* There may be multiple tags present at the same time, so different parts of the application may
* use different tags to perform cancellation at different levels of granularity.
*
* @param tag
* The tag to be added. Cannot contain the ',' (comma) character or be an empty string.
*
16 changes: 16 additions & 0 deletions core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -874,6 +874,22 @@ class SparkContext(config: SparkConf) extends Logging {
/**
* Add a tag to be assigned to all the jobs started by this thread.
*
* Often, a unit of execution in an application consists of multiple Spark actions or jobs.
* Application programmers can use this method to group all those jobs together and give them a
* group tag. The application can use `org.apache.spark.SparkContext.cancelJobsWithTag` to cancel
* all running jobs with this tag. For example:
* {{{
* // In the main thread:
* sc.addJobTag("myjobs")
* sc.parallelize(1 to 10000, 2).map { i => Thread.sleep(10); i }.count()
*
* // In a separate thread:
* sc.cancelJobsWithTag("myjobs")
* }}}
*
* There may be multiple tags present at the same time, so different parts of the application may
* use different tags to perform cancellation at different levels of granularity.
*
* @param tag The tag to be added. Cannot contain the ',' (comma) character.
*
* @since 3.5.0
8 changes: 8 additions & 0 deletions python/pyspark/context.py
@@ -2193,6 +2193,14 @@ def addJobTag(self, tag: str) -> None:
"""
Add a tag to be assigned to all the jobs started by this thread.

Often, a unit of execution in an application consists of multiple Spark actions or jobs.
Application programmers can use this method to group all those jobs together and give them a
group tag. The application can use :meth:`SparkContext.cancelJobsWithTag` to cancel all
running jobs with this tag.

There may be multiple tags present at the same time, so different parts of the application may
use different tags to perform cancellation at different levels of granularity.

.. versionadded:: 3.5.0
Parameters
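The new docstring describes tag-based cancellation but, unlike its Scala counterpart, carries no inline example. Here is a minimal, hypothetical sketch of the same flow in PySpark 3.5; the local master, tag name, dataset size, and sleep timings are illustrative, not part of this diff:

import threading
import time

from pyspark import SparkContext

sc = SparkContext("local[2]", "tag-demo")

def run_tagged_job():
    # Tags apply to all jobs started by this thread after addJobTag is called.
    sc.addJobTag("myjobs")
    try:
        sc.parallelize(range(10000), 2).map(lambda i: time.sleep(0.01) or i).count()
    except Exception as e:
        print(f"job interrupted: {e}")
    finally:
        # Drop the tag so later jobs in this thread are unaffected.
        sc.removeJobTag("myjobs")

t = threading.Thread(target=run_tagged_job)
t.start()
time.sleep(1)

# From any other thread, cancel every job currently carrying the tag.
sc.cancelJobsWithTag("myjobs")
t.join()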
10 changes: 9 additions & 1 deletion python/pyspark/sql/session.py
@@ -2010,11 +2010,19 @@ def addTag(self, tag: str) -> None:
"""
Add a tag to be assigned to all the operations started by this thread in this session.

Often, a unit of execution in an application consists of multiple Spark executions.
Application programmers can use this method to group all those executions together and give
them a group tag. The application can use :meth:`SparkSession.interruptTag` to cancel all
running executions with this tag.

There may be multiple tags present at the same time, so different parts of the application may
use different tags to perform cancellation at different levels of granularity.

.. versionadded:: 3.5.0
Parameters
----------
-tag : list of str
+tag : str
    The tag to be added. Cannot contain the ',' (comma) character or be an empty string.
"""
raise RuntimeError(
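Here too the docstring gains no inline example. A minimal sketch of session-level tagging with two tags at different levels of granularity; it assumes a Spark Connect session, since the classic session shown in this hunk raises RuntimeError for addTag, and the remote URL and tag names are illustrative:

import threading
import time

from pyspark.sql import SparkSession

# addTag/interruptTag are per-session; remote() builds a Spark Connect session.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

def run_stage1():
    # Both tags are attached to operations started by this thread.
    spark.addTag("etl")         # coarse-grained: the whole workload
    spark.addTag("etl-stage1")  # fine-grained: just this stage
    try:
        spark.range(100_000_000).selectExpr("sum(id)").collect()
    except Exception as e:
        print(f"execution interrupted: {e}")

t = threading.Thread(target=run_stage1)
t.start()
time.sleep(1)

# Interrupt only stage 1; operations tagged solely "etl" would keep running.
spark.interruptTag("etl-stage1")
t.join()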
