Add batched cholesky implementation and tests #1029

jjwilke · 2023-08-17T22:02:51Z

For inputs whose (n x n) submatrices each fit on a single GPU, allow a batched Cholesky implementation that calls POTRF on every submatrix and parallelizes across the outer (batch) indices.
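As a rough illustration of the batching idea described above, here is a minimal sketch in plain NumPy, not the cuNumeric implementation from this PR. It assumes the input has shape (..., n, n), treats every leading index as an independent problem, and uses `np.linalg.cholesky` on a single matrix as a stand-in for one POTRF call. The function name `batched_cholesky` and the test data are hypothetical.

```python
import numpy as np


def batched_cholesky(a: np.ndarray) -> np.ndarray:
    """Factor every trailing (n, n) block of `a` independently."""
    if a.ndim < 2 or a.shape[-1] != a.shape[-2]:
        raise ValueError("expected an array of shape (..., n, n)")
    out = np.empty_like(a)
    # Each outer index selects one independent (n, n) problem. A parallel
    # runtime could hand these out to different processors; this sketch
    # simply visits them one after another.
    for idx in np.ndindex(*a.shape[:-2]):
        out[idx] = np.linalg.cholesky(a[idx])  # stand-in for one POTRF call
    return out


# Build a (2, 3) batch of 4 x 4 symmetric positive definite matrices.
rng = np.random.default_rng(0)
m = rng.standard_normal((2, 3, 4, 4))
spd = m @ m.swapaxes(-1, -2) + 4.0 * np.eye(4)

lower = batched_cholesky(spd)
```

Each block of `lower` is a lower-triangular factor L with L @ L.T equal to the matching block of `spd`. Because the blocks never interact, the outer loop is the natural place to parallelize.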

cunumeric/linalg/cholesky.py

src/cunumeric/matrix/batched_cholesky.cc

src/cunumeric/matrix/batched_cholesky_template.inl

ipdemes · 2023-08-18T04:07:22Z

src/cunumeric/matrix/batched_cholesky.cc

+/*static*/ void BatchedCholeskyTask::cpu_variant(TaskContext& context)
+{
+#ifdef LEGATE_USE_OPENMP
+  openblas_set_num_threads(1);  // make sure this isn't overzealous


Do we need this even if we don't use any OpenMP pragmas inside the CPU variant?

Umm, not sure. I copied this straight from potrf.cc, which was written by @magnatelee, who usually has good reasons for including things : ) I would say leave it for now until we get clarification? I don't think it's hurting anything, but I also don't understand why it would be necessary.

I will keep this comment open so we don't lose it and can ask @magnatelee when he is back.

If a program doesn't use both a CPU task and an OpenMP task that call OpenBLAS, this isn't strictly necessary. But we can never know that in advance, so to be absolutely safe, any persistent state, like the number of OpenMP threads for OpenBLAS, should be set by each task so that it matches that task's assumptions.

src/cunumeric/mapper.cc

src/cunumeric/matrix/batched_cholesky_omp.cc

tests/integration/test_cholesky.py

jjwilke · 2023-08-23T21:42:05Z

@ipdemes This is probably ready to go? The one failing test is an RNG test unrelated to cholesky.

Alternatively, we could wait until next week to merge, so that @manopapad and @magnatelee can give the green light on the implementation.

copy-pr-bot · 2023-11-07T22:55:21Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

manopapad · 2023-11-07T23:58:15Z

/ok to test

jjwilke requested review from magnatelee, rohany, ipdemes, manopapad and aschaffer August 17, 2023 22:02

ipdemes added the category:new-feature label (PR introduces a new feature and will be classified as such in release notes) Aug 17, 2023

ipdemes reviewed Aug 17, 2023

cunumeric/linalg/cholesky.py (outdated)

ipdemes reviewed Aug 17, 2023

cunumeric/linalg/cholesky.py

ipdemes reviewed Aug 17, 2023

cunumeric/linalg/cholesky.py (outdated)

ipdemes reviewed Aug 17, 2023

cunumeric/linalg/cholesky.py (outdated)

jjwilke force-pushed the batched-cholesky branch from 3106174 to 0f2bc69 August 17, 2023 23:52