
Make state spillable in partitioned writer [databricks] #8667

Merged: 38 commits into NVIDIA:branch-23.08, Jul 26, 2023

Conversation

@abellina (Collaborator) commented Jul 6, 2023

Closes #6980

This change ensures that the batches queued by the partitioned writer are made spillable before the GPU semaphore is released.
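To make the idea concrete, here is a minimal sketch of the pattern (not the actual patch: pendingWrites and queueForWrite are illustrative names, while SpillableColumnarBatch, SpillPriorities, and GpuSemaphore are the spark-rapids classes involved):

import scala.collection.mutable.ArrayBuffer

import com.nvidia.spark.rapids.{GpuSemaphore, SpillPriorities, SpillableColumnarBatch}
import org.apache.spark.TaskContext
import org.apache.spark.sql.vectorized.ColumnarBatch

// Illustrative queue of batches waiting to be written out.
val pendingWrites = new ArrayBuffer[SpillableColumnarBatch]()

def queueForWrite(batch: ColumnarBatch): Unit = {
  // Wrapping the batch hands it to the spill framework, so it can be moved
  // off the GPU under memory pressure even with no semaphore held.
  pendingWrites += SpillableColumnarBatch(batch,
    SpillPriorities.ACTIVE_BATCHING_PRIORITY)
  // Only now is it safe to let other tasks onto the GPU.
  GpuSemaphore.releaseIfNecessary(TaskContext.get())
}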

I also changed ParquetWriterSuite so that the exception checks for writer failures look at the cause chain and expect a SparkUpgradeException instead of a SparkException, reflecting the checks Parquet currently performs for date/time rebasing. I didn't include fixes for the underlying behavior; the tests now just assert the exception that is actually thrown.
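The assertion pattern looks roughly like this (a sketch only; findCause is an illustrative helper, not part of the suite):

import scala.annotation.tailrec

import org.apache.spark.SparkUpgradeException
import org.scalatest.Assertions.intercept

// Walk the cause chain looking for a SparkUpgradeException.
@tailrec
def findCause(t: Throwable): Option[SparkUpgradeException] = t match {
  case su: SparkUpgradeException => Some(su)
  case _ if t.getCause != null => findCause(t.getCause)
  case _ => None
}

val thrown = intercept[Throwable] {
  // ... run the write that is expected to fail ...
}
// Assert on the real cause rather than on the wrapping exception.
assert(findCause(thrown).isDefined)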

I verified that this change allows the store_sales nds_transcode at 50 GB to run with only 3 GB of GPU memory, whereas I needed 12 GB before the change (with less than 12 GB it failed with OOM in various places, because all of that state was kept around while the semaphore was not held). I used 16 shuffle partitions so that each write task processes around 500 MiB. I tested with the following command and template, changing the JAR location (and the allocSize, to find where the baseline would stop OOMing):

./spark-submit-template test.template nds_transcode.py file:///data/nds_50_csv file:///data/nds_50_parquet report.out --output_mode overwrite --tables store_sales

test.template:
export JARS=rapids-4-spark_2.12-23.08.0-SNAPSHOT-cuda11.jar
export SPARK_CONF=("--master" "spark://127.0.0.1:7077"
                   "--conf" "spark.plugins=com.nvidia.spark.SQLPlugin"
                   "--conf" "spark.executor.extraClassPath=$JARS"
                   "--conf" "spark.driver.extraClassPath=$JARS"
                   "--conf" "spark.eventLog.enabled=true"
                   "--conf" "spark.sql.shuffle.partitions=16"
                   "--conf" "spark.rapids.memory.gpu.allocSize=3GB"
                   "--conf" "spark.rapids.sql.csv.read.decimal.enabled=true")

@abellina commented Jul 6, 2023: build

@abellina commented Jul 6, 2023: build

@abellina commented Jul 6, 2023: build

@abellina commented Jul 7, 2023: build

@abellina commented Jul 7, 2023: build

@revans2 previously approved these changes Jul 7, 2023
@abellina commented: Another follow-on issue: #8736

@abellina commented Jul 17, 2023

@jlowe @revans2 this should be ready for another look. I added unit tests that cover the interaction between batches to write, spillables, and splits, but they do not validate the outputs. I wanted to get this reviewed and follow up with more testing, either via integration tests or unit tests, as follow-on work. It would also be nice to get general runtime coverage for this patch.

Filed #8738 for the follow-on test work.

@abellina commented: build

@abellina commented

I am working on the failure. It looks like a failure to shut down RMM is causing failures in other tests. It seems related to mocking exceptions in the tests, so it looks like a real issue. I'll post an update once I have a fix.

@abellina commented: build

@abellina commented: build

@abellina commented: build

@abellina commented Jul 25, 2023

I am having lots of issues with DeviceMemoryEventHandlerSuite, and this build will fail. I cannot explain the issue yet, since the suite initializes all of its mocks and uses them. The test passes by itself, but not in combination with others, which is taking a very long time to work through. I'll push again.

@abellina commented

OK, the reason RmmSparkRetrySuiteBase was flaky is that it failed when leftover RMM callbacks (registered as singletons) were still configured by earlier suites. It was also a bad test, because it kept succeeding for the wrong reasons after changes to the implementation.
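The shape of the fix is something like this (a sketch; resetRmmEventHandlers is a hypothetical stand-in for whatever hook actually clears the singleton state):

import org.scalatest.{BeforeAndAfterEach, Suite}

trait FreshRmmState extends BeforeAndAfterEach { this: Suite =>
  override def beforeEach(): Unit = {
    // Hypothetical helper: unregister any singleton RMM callbacks left
    // behind by a previously-run suite so this suite starts clean.
    resetRmmEventHandlers()
    super.beforeEach()
  }

  // Hypothetical: the real code would call into RMM/RmmSpark to drop any
  // registered handler singletons.
  protected def resetRmmEventHandlers(): Unit = ()
}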

@abellina commented: build

@abellina commented: @revans2 this is ready for another look

@revans2 previously approved these changes Jul 26, 2023

@revans2 (Collaborator) left a comment

Just a few nits

Comment on lines +77 to +83
protected def getOutputStream: FSDataOutputStream = {
  val hadoopPath = new Path(path)
  val fs = hadoopPath.getFileSystem(conf)
  // overwrite = false: fail rather than clobber an existing file
  fs.create(hadoopPath, false)
}

protected val outputStream: FSDataOutputStream = getOutputStream
nit: Could you add that as a comment to getOutputStream so we know why it is there?

    spillableBatch: SpillableColumnarBatch,
    statsTrackers: Seq[ColumnarWriteTaskStatsTracker]): Long = {
  val writeStartTime = System.nanoTime
  val cb = closeOnExcept(spillableBatch) { _ =>

nit: I think it would be a little cleaner to move cb inside the closeOnExcept and not recreate it.

closeOnExcept(spillableBatch) { _ =>
  val cb = withRetryNoSplit[ColumnarBatch] {
    spillableBatch.getColumnarBatch()
  }
  withResource(cb) { _ =>
    throwIfRebase...
  }
}
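(The suggested form scopes cb entirely inside the closeOnExcept, so an exception anywhere between materializing the batch and the rebase check closes spillableBatch in exactly one place.)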

@abellina commented

@revans2 thanks for the review, pushed a08e5a5 to handle both issues.

@abellina commented: build

@abellina merged commit 5421a85 into NVIDIA:branch-23.08 on Jul 26, 2023
27 checks passed
@abellina deleted the spillable_splits_part_writer branch on July 26, 2023 at 23:23
@sameerz added the "reliability" label (Features to improve reliability or bugs that severely impact the reliability of the plugin) on Jul 31, 2023
Labels
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin

Development
Successfully merging this pull request may close these issues:
[BUG] Partitioned writes release GPU semaphore with unspillable GPU memory (#6980)