#2934: Make one CommandQueue and one HW CommandQueue (SysmemWriter) per device #4077
Tests added: DeviceFixture.TestDeviceToHostMemChannelAssignment
MultiCommandQueueFixture.TestAccessCommandQueue
BasicFastDispatchFixture.TestCannotAccessCommandQueueForClosedDevice
MultiCommandQueueFixture.TestDirectedLoopbackToUniqueHugepage
Additional tests for multi-command queue will be added incrementally as we add more N300 fast dispatch features.
tests/tt_metal/tt_metal/unit_tests/compute/matmul/single_core_matmul_compute.cpp
tests/tt_metal/tt_metal/unit_tests_fast_dispatch/command_queue/test_CommandQueue.cpp
tests/tt_metal/tt_metal/unit_tests_fast_dispatch/common/command_queue_fixture.hpp
@tt-rkim this PR can't be merged to main until we enable creating more than 1 hugepage on our machines.
{
    if (detail::GLOBAL_CQ) {
        ClearProgramCache(*detail::GLOBAL_CQ);
    if (std::getenv("TT_METAL_SLOW_DISPATCH_MODE") == nullptr) {
curious whether we should assert otherwise?
I think these APIs get called in slow dispatch mode atm as well
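The guard discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions: `FastDispatchEnabled` and `ClearProgramCacheIfFastDispatch` are hypothetical names, and the integer counter stands in for the real program cache. It shows the "silently skip in slow dispatch mode" behavior rather than asserting.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: fast dispatch is treated as active unless
// TT_METAL_SLOW_DISPATCH_MODE is set in the environment.
inline bool FastDispatchEnabled() {
    return std::getenv("TT_METAL_SLOW_DISPATCH_MODE") == nullptr;
}

// Hypothetical stand-in for a cache-clearing API that may also be called
// in slow dispatch mode. Returns true only if it actually cleared the cache.
inline bool ClearProgramCacheIfFastDispatch(int &cached_programs) {
    if (!FastDispatchEnabled()) {
        return false;  // slow dispatch: no command queue, nothing to clear
    }
    cached_programs = 0;
    return true;
}
```

Returning early instead of asserting keeps callers that run in both modes working, which matches the reply above that these APIs are currently also called in slow dispatch mode.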
static std::mutex cq_creation_mutex;
{
    std::lock_guard<std::mutex> lock(cq_creation_mutex);
    if (not command_queues[id] or (command_queues[id] and command_queues[id]->device != device)) {
Why would command_queues[id] ever be initialized while command_queues[id]->device != device?
Yeah, so originally I only had the CQ being created once, but that caused a bug where it complained that the CQ was using a device that was closed. This happens when, in the same process:
device = CreateDevice(0);
EnqueueX(GetCommandQueue(device), ...); // this will initialize command_queues[device->id]
CloseDevice(device);
device2 = CreateDevice(0); // technically the same device but a different object
EnqueueX(GetCommandQueue(device2), ...); // command_queues[device2->id] already exists because device->id == device2->id, but it still points at the closed device object rather than device2
This can be cleaned up once we no longer need to create new device objects and can instead hand out a handle to an existing device object.
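The recreate-on-stale-device check described above can be sketched in isolation. This is a minimal illustration, not the PR's implementation: `Device` and `CommandQueue` here are simplified stand-ins, and `GetCommandQueueSketch` and `kMaxDevices` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

// Simplified stand-ins for the real types (assumptions for illustration).
struct Device {
    std::size_t id;
};

struct CommandQueue {
    Device *device;
    explicit CommandQueue(Device *d) : device(d) {}
};

constexpr std::size_t kMaxDevices = 8;  // assumed upper bound on device ids

// Returns the queue for `device`, creating it on first use and recreating it
// when the cached queue was built for a different Device object with the same id.
CommandQueue &GetCommandQueueSketch(Device *device) {
    static std::array<std::unique_ptr<CommandQueue>, kMaxDevices> command_queues;
    static std::mutex cq_creation_mutex;

    std::lock_guard<std::mutex> lock(cq_creation_mutex);
    auto &slot = command_queues.at(device->id);
    if (!slot || slot->device != device) {
        slot = std::make_unique<CommandQueue>(device);
    }
    return *slot;
}
```

Note that replacing the slot invalidates any reference obtained for the earlier device object, and that comparing raw pointers only detects a reopened device if the new object lives at a different address.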
…smem writer (HW CQ) to be owned by Device assign each device to specific host mem channel
…e number of chips on a card and run multi cq device fixture on GS
…that are upgrading
…eing set and multiply it by 4 for multi CQ (cherry picked from commit 226b550)
…es to try fixing OOM memory error for frequent pipeline on GS
Added
CommandQueue &detail::GetCommandQueue(Device *device)
API, which creates a CQ for the given device if one doesn't already exist and returns a reference to it. Fast dispatch Enqueue APIs used to accept a reference to GLOBAL_CQ; users are now expected to pass the CQ returned from this API.
Also moved SysmemWriter into tt_metal::Device.
Eventually we can add an API to create command queues, enabling multiple SW CQs per device; all SW CQs for a given device would use the same HW CQ(s).
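The ownership layout described here (the device owns the HW CQ, and any number of SW CQs route through it) might look roughly like the sketch below. The class shapes, the `writes` counter, and the `enqueue`/`sysmem_writer` member names are assumptions made for illustration and are not taken from the tt_metal sources.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for the HW command queue; counts writes so sharing is observable.
struct SysmemWriter {
    std::size_t writes = 0;
    void write() { ++writes; }
};

// The device owns exactly one SysmemWriter for its lifetime.
class Device {
public:
    explicit Device(int id) : id_(id), sysmem_writer_(std::make_unique<SysmemWriter>()) {}
    int id() const { return id_; }
    SysmemWriter &sysmem_writer() { return *sysmem_writer_; }

private:
    int id_;
    std::unique_ptr<SysmemWriter> sysmem_writer_;
};

// A SW command queue holds no writer of its own; it borrows the device's.
class CommandQueue {
public:
    explicit CommandQueue(Device &device) : device_(device) {}
    void enqueue() { device_.sysmem_writer().write(); }
    Device &device() { return device_; }

private:
    Device &device_;
};
```

With this shape, two `CommandQueue` objects constructed on the same `Device` both advance the same writer, which is the property the last sentence above relies on.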