Add batched cholesky implementation and tests #1029
3106174
to
0f2bc69
Compare
/*static*/ void BatchedCholeskyTask::cpu_variant(TaskContext& context)
{
#ifdef LEGATE_USE_OPENMP
  openblas_set_num_threads(1);  // make sure this isn't overzealous
Do we need this even if we don't call any openmp pragmas inside of the cpu variant?
Umm, not sure. I copied this straight from potrf.cc, which was written by @magnatelee, who usually has good reasons for including things : ) I would say leave it for now until we get clarification? I don't think it's hurting anything, but I also don't understand why it would be necessary.
I will keep this comment open so we don't lose it and can ask @magnatelee when he is back.
If a program doesn't run both a CPU task and an OpenMP task that each call OpenBLAS, this isn't strictly necessary. But we can't know that in advance, so to be absolutely safe, any persistent state, such as the number of OpenMP threads OpenBLAS uses, should be set by each task so that it matches that task's assumptions.
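To make the point above concrete, here is a minimal sketch of the "each task sets the persistent state it relies on" pattern. It does not link against OpenBLAS: `fake_blas`, `cpu_task`, and `omp_task` are hypothetical stand-ins invented for illustration, and `fake_blas::set_num_threads` only mimics the process-wide effect of `openblas_set_num_threads`.

```cpp
#include <cassert>

// Hypothetical stand-in for a BLAS library with a process-wide thread setting.
// This is NOT the OpenBLAS API; it only models "persistent global state".
namespace fake_blas {
int g_num_threads = 8;  // whatever the previous caller happened to leave behind

void set_num_threads(int n)
{
  assert(n > 0);
  g_num_threads = n;
}

int get_num_threads() { return g_num_threads; }
}  // namespace fake_blas

// A CPU-variant task assumes single-threaded BLAS, so it says so on entry
// instead of trusting whatever an earlier task configured.
int cpu_task()
{
  fake_blas::set_num_threads(1);
  return fake_blas::get_num_threads();  // thread count the BLAS call would see
}

// An OpenMP-variant task assumes it owns `omp_threads` threads and sets that.
int omp_task(int omp_threads)
{
  fake_blas::set_num_threads(omp_threads);
  return fake_blas::get_num_threads();
}
```

Because every task resets the setting on entry, the order in which the runtime schedules CPU and OpenMP tasks no longer matters: each one observes the thread count it assumes.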
31bae33
to
1967204
Compare
@ipdemes This is probably ready to go? The one failing test is an RNG test unrelated to cholesky. Alternatively, we could wait until next week before merging for @manopapad and @magnatelee to give the green light on the implementation.
cd23e8d
to
a176d93
Compare
a176d93
to
1aedbdf
Compare
/ok to test
For specific matrix sizes in which the (n x n) submatrix fits on a single GPU, allow a batched cholesky implementation that calls POTRF on all the submatrices and parallelizes across the outer indices.
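The idea in the description can be sketched on the CPU as follows. This is an illustrative sketch only, not the code in this PR: `cholesky_lower_inplace` is a hypothetical hand-written stand-in for a POTRF call, `batched_cholesky` is a hypothetical name, and the batch is assumed to be stored contiguously as `batch` row-major (n x n) matrices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for POTRF (lower variant): factor one row-major n x n symmetric
// positive-definite matrix in place, leaving L in the lower triangle and
// zeros above it. Returns false if a non-positive pivot is encountered.
bool cholesky_lower_inplace(double* a, std::size_t n)
{
  for (std::size_t j = 0; j < n; ++j) {
    double d = a[j * n + j];
    for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
    if (d <= 0.0) return false;
    d = std::sqrt(d);
    a[j * n + j] = d;
    for (std::size_t i = j + 1; i < n; ++i) {
      double s = a[i * n + j];
      for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
      a[i * n + j] = s / d;
    }
    // Clear column j of the strict upper triangle; it is never read again.
    for (std::size_t i = 0; i < j; ++i) a[i * n + j] = 0.0;
  }
  return true;
}

// Factor every n x n submatrix of the batch. Each iteration touches a
// disjoint slice of `data`, so the outer loop is the part that could be
// distributed or parallelized across the outer (batch) indices.
bool batched_cholesky(std::vector<double>& data, std::size_t batch, std::size_t n)
{
  assert(data.size() == batch * n * n);
  bool ok = true;
  for (std::size_t b = 0; b < batch; ++b) {
    ok = cholesky_lower_inplace(data.data() + b * n * n, n) && ok;
  }
  return ok;
}
```

For example, calling `batched_cholesky(d, 2, 2)` on `d = {4, 2, 2, 3, 9, 3, 3, 5}` factors the two 2 x 2 matrices independently; the second one becomes `{3, 0, 1, 2}`.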