PaRSEC now allows DSLs to free the gpu task #307
This doesn't work as intended. PaRSEC releases the task containing the gpu_task structure before the gpu_task itself is released, so its fields end up being overwritten prematurely.
I take back what I said earlier. The error was somewhere else and not in this PR. Ready for review.
@@ -4,7 +4,7 @@
 set(TTG_TRACKED_VG_CMAKE_KIT_TAG d1b34157c349cf0a7c2f149b7704a682d53f6486) # provides FindOrFetchLinalgPP and "real" FindOrFetchBoost
 set(TTG_TRACKED_CATCH2_VERSION 3.5.0)
 set(TTG_TRACKED_MADNESS_TAG 93a9a5cec2a8fa87fba3afe8056607e6062a9058)
-set(TTG_TRACKED_PARSEC_TAG 58f8f3089ecad2e8ee50e80a9586e05ce8873b1c)
+set(TTG_TRACKED_PARSEC_TAG a9ab33d8287578c68c0349662352f280bc83e2c0)
Use the 4.0 please
Too many things that we need in PaRSEC are missing from 4.0, so that would last for only one PR:
- Use red-black-tree in zone_malloc ICLDisco/parsec#710
- Use 64bit integer when computing the ordered list pivot ICLDisco/parsec#706
- Provide mechanism to discard data ICLDisco/parsec#695
- Offload device task release to worker threads ICLDisco/parsec#687 (or related)
- Make GPU manager skip records when nothing scheduled on input stream ICLDisco/parsec#681
- Topic/cuda aware communications ICLDisco/parsec#671
Maybe 4.1 will work for us.
tc.out[i] = gpu_task->flow[i];
/* set up the device task */
parsec_gpu_task_t *gpu_task = task->dev_ptr->gpu_task;
/* TODO: needed? */
You should not need this: you construct the list_item and then set the rest of the gpu_task fields to default values.
parsec_task_class_t& tc = task->dev_ptr->task_class;

// input flows are set up during register_device_memory as part of the first invocation above
for (int i = 0; i < MAX_PARAM_COUNT; ++i) {
Why is the upper bound here always MAX_PARAM_COUNT?
Because we don't know how many device inputs the application will give us. We could stop the loop early, but the impact would be marginal.
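For illustration, an early stop of the kind mentioned above might look roughly like the sketch below. It is not the TTG or PaRSEC code: `Flow`, `DeviceTask`, `TaskClass`, `kMaxParamCount` and `copy_flows` are hypothetical stand-ins, and it assumes that flow slots are filled front to back with a null pointer marking the first unused slot.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins; these are NOT the PaRSEC or TTG definitions.
constexpr std::size_t kMaxParamCount = 20;  // assumed value, plays the role of MAX_PARAM_COUNT

struct Flow {
  const char* name = nullptr;
};

struct DeviceTask {
  // Assumption: slots are filled front to back; unused slots stay nullptr.
  std::array<const Flow*, kMaxParamCount> flow{};
};

struct TaskClass {
  std::array<const Flow*, kMaxParamCount> out{};
};

// Copies flows up to the first unused slot and returns how many were copied.
std::size_t copy_flows(const DeviceTask& gpu_task, TaskClass& tc) {
  std::size_t i = 0;
  for (; i < kMaxParamCount && gpu_task.flow[i] != nullptr; ++i) {
    tc.out[i] = gpu_task.flow[i];
  }
  return i;
}
```

With at most `kMaxParamCount` pointer copies per task, the saving is small, which is consistent with the "marginal" assessment.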
We can allocate the GPU task inside the task structure and avoid an extra allocation. Signed-off-by: Joseph Schuchart <[email protected]>
f6c8441 to 2c1323a
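The idea in the commit message, placing the GPU task inside the task structure instead of allocating it separately, can be sketched as follows. This is a minimal illustration, not the PR's implementation: `GpuTask`, `TaskWithPointer`, `TaskWithEmbedded` and `device_task_of` are hypothetical names, and the fields are invented for the example.

```cpp
#include <memory>

// Hypothetical stand-in for a device task descriptor (fields are invented).
struct GpuTask {
  int  stream_id = -1;
  bool submitted = false;
};

// Variant A: the device task lives in its own heap allocation.
struct TaskWithPointer {
  std::unique_ptr<GpuTask> gpu_task{new GpuTask()};
};

// Variant B: the device task is a member of the task, so creating the
// task needs no second allocation and both share one lifetime.
struct TaskWithEmbedded {
  GpuTask gpu_task;
};

// Code that expects a pointer can still get one from the embedded variant.
GpuTask* device_task_of(TaskWithEmbedded& t) { return &t.gpu_task; }
```

In the embedded variant the device task's storage is released together with the enclosing task, so no other component may free it on its own.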