Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add batched cholesky implementation and tests #1029

Merged
merged 9 commits into from
Nov 8, 2023

Conversation

jjwilke
Copy link
Contributor

@jjwilke jjwilke commented Aug 17, 2023

For specific matrix sizes in which the (n x n) submatrix fits on a single GPU, allow a batched cholesky implementation that calls POTRF on all the submatrices and parallelizes across the outer indices.

@ipdemes ipdemes added the category:new-feature PR introduces a new feature and will be classified as such in release notes label Aug 17, 2023
/*static*/ void BatchedCholeskyTask::cpu_variant(TaskContext& context)
{
#ifdef LEGATE_USE_OPENMP
openblas_set_num_threads(1); // make sure this isn't overzealous
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this even if we don't call any openmp pragmas inside of the cpu variant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm, not sure. I copied this straight from potrf.cc, which was @magnatelee who usually has good reasons for including things : ) I would say leave for now until we get clarification? I don't think it's hurting anything, but I also don't understand why it would be necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep this comment open so we don't loose it and can ask @magnatelee when he is back

Copy link
Contributor

@magnatelee magnatelee Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if one program doesn't use a cpu task and an openmp task, both calling openblas, this isn't strictly necessary. but we would never know, so to be absolutely safe, any persistent states like the number of openmp threads for openblas should be set by each task to make sure they match the assumption of the task.

src/cunumeric/mapper.cc Show resolved Hide resolved
cunumeric/linalg/cholesky.py Show resolved Hide resolved
src/cunumeric/matrix/batched_cholesky_template.inl Outdated Show resolved Hide resolved
@jjwilke
Copy link
Contributor Author

jjwilke commented Aug 23, 2023

@ipdemes This is probably ready to go? The one failing test is an RNG test unrelated to cholesky.

Alternatively, we could wait until next week before merging for @manopapad and @magnatelee to give the green light on the implementation.

@marcinz marcinz changed the base branch from branch-23.09 to branch-23.11 September 26, 2023 00:35
Copy link

copy-pr-bot bot commented Nov 7, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@manopapad
Copy link
Contributor

/ok to test

@manopapad manopapad merged commit b66e2ec into nv-legate:branch-23.11 Nov 8, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:new-feature PR introduces a new feature and will be classified as such in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants