Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jvm-packages] Support Ranker #10823

Merged
merged 5 commits into from
Sep 21, 2024
Merged

[jvm-packages] Support Ranker #10823

merged 5 commits into from
Sep 21, 2024

Conversation

wbo4958
Copy link
Contributor

@wbo4958 wbo4958 commented Sep 16, 2024

No description provided.

@wbo4958
Copy link
Contributor Author

wbo4958 commented Sep 16, 2024

Hi @trivialfis, @eordentlich Could you help review it? Thx

@eordentlich
Copy link

Would be good to explain how this resolves the issue raised in this comment and issue linked therein:
#10639 (comment)

Copy link

@eordentlich eordentlich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions re plugin and preprocess.

*/
override private[spark] def preprocess(dataset: Dataset[_]): (Dataset[_], ColumnIndices) = {
val (output, columnIndices) = super.preprocess(dataset)
(output.sortWithinPartitions(getGroupCol), columnIndices)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this operation interact with spark-rapids plugin if enabled? Any implications on GPU memory?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this preprocess even get called if plugin is enabled? If not, partition might not be sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Fixed this issue. Please help review it again. Thx very much.

@wbo4958
Copy link
Contributor Author

wbo4958 commented Sep 19, 2024

Hi @eordentlich @trivialfis, Could you help take a look at it. Thx very much.

Copy link

@eordentlich eordentlich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resolves the issue with the plugin and sorted partitions (and nice to see the test for this case too), but still wondering how that partition sort is computed by the spark-rapids plugin when enabled. Is done on the GPU?

Also, does this PR resolve the issue I reference in an earlier comment?

@trivialfis should take a look as well.

@wbo4958
Copy link
Contributor Author

wbo4958 commented Sep 19, 2024

HI @eordentlich,

I just tried the below case which has the same pattern with XGBoost

df.repartition(2).sortWithinPartitions("class").collect()

and got below Physical plans

== Physical Plan ==
AdaptiveSparkPlan (10)
+- == Final Plan ==
   GpuColumnarToRow (6)
   +- GpuSort (5)
      +- GpuShuffleCoalesce (4)
         +- ShuffleQueryStage (3), Statistics(sizeInBytes=6.4 KiB, rowCount=150)
            +- GpuColumnarExchange (2)
               +- GpuScan parquet  (1)
+- == Initial Plan ==
   Sort (9)
   +- Exchange (8)
      +- Scan parquet  (7)

XGBoost leverages ColumnarRdd to extract the CUDF table. ColumnarRdd is going to do some case match to filter out the RDDs that involves row-wised operations. So after converting,

we we get below corresponding RDDs which are coming from below GPU plans. So you can see the final cudf table was coming from GpuSort which will run on GPUs

   +- GpuSort (5)
      +- GpuShuffleCoalesce (4)
         +- ShuffleQueryStage (3), Statistics(sizeInBytes=6.4 KiB, rowCount=150)
            +- GpuColumnarExchange (2)
               +- GpuScan parquet  (1)

Copy link

@eordentlich eordentlich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@trivialfis
Copy link
Member

Will look into this tomorrow.

@trivialfis
Copy link
Member

trivialfis commented Sep 20, 2024

Also, does this PR resolve the issue I reference in an earlier comment?

Thank you for raising that. I will be looking into it along with other LTR issues/feature requests after sorting out some of the work on external memory. I still think within-partition sort is sufficient for most of the use cases. The worst case is adding these qid-based partitioning, which might be as expensive as a global sort.

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, assuming all tests can pass.

@trivialfis trivialfis merged commit 19b55b3 into dmlc:master Sep 21, 2024
30 checks passed
@wbo4958 wbo4958 deleted the ranker branch September 21, 2024 23:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants