Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#2934: Make one CommandQueue and one HW CommandQueue (SysmemWriter) per device #4077

Merged
merged 9 commits into from
Dec 20, 2023

Conversation

abhullar-tt
Copy link
Contributor

@abhullar-tt abhullar-tt commented Nov 29, 2023

Added CommandQueue &detail::GetCommandQueue(Device *device) API which creates a CQ for given device if it doesn't already exist and returns ref to it.

Fast dispatch Enqueue APIs used to accept reference to GLOBAL_CQ but users are expected to pass CQ returned from this API.

Also moved SysmemWriter into tt_metal::Device.

Eventually we can add an API to create command queues to enable multiple SW CQs and for one device all SW CQs will be using the same HW CQ(s)

@abhullar-tt abhullar-tt force-pushed the abhullar/cq-per-device branch 3 times, most recently from cb22adb to 8914d1c Compare December 5, 2023 15:25
@abhullar-tt abhullar-tt changed the title #2934: WIP Make one CommandQueue and one HW CommandQueue (SysmemWriter) per device #2934: Make one CommandQueue and one HW CommandQueue (SysmemWriter) per device Dec 5, 2023
@abhullar-tt abhullar-tt marked this pull request as ready for review December 5, 2023 17:36
@abhullar-tt
Copy link
Contributor Author

Tests added:

DeviceFixture.TestDeviceToHostMemChannelAssignment

  • checks that each device is assigned to unique host memory channel. There is one hugepage per host mem channel

MultiCommandQueueFixture.TestAccessCommandQueue

  • checks that we can access a SW CQ for each device that has been initialized

BasicFastDispatchFixture.TestCannotAccessCommandQueueForClosedDevice

  • opens device -> closes device -> and then ensures we can't access CQ for a closed device

MultiCommandQueueFixture.TestDirectedLoopbackToUniqueHugepage

  • uses cluster APIs to do directed write and reads from the same address across multiple hugepages to ensure each device hugepage is independent

Additional tests for multi-command queue will be incrementally added when we add additional N300 fast dispatch features

@abhullar-tt
Copy link
Contributor Author

@tt-rkim this PR can't be merged to main until we enable creating more than 1 hugepage on our machines

tt_metal/detail/tt_metal.hpp Outdated Show resolved Hide resolved
{
if (detail::GLOBAL_CQ) {
ClearProgramCache(*detail::GLOBAL_CQ);
if (std::getenv("TT_METAL_SLOW_DISPATCH_MODE") == nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious whether we should assert otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these APIs get called in slow dispatch mode atm as well

static std::mutex cq_creation_mutex;
{
std::lock_guard<std::mutex> lock(cq_creation_mutex);
if (not command_queues[id] or (command_queues[id] and command_queues[id]->device != device)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would command_queue[id] ever be initialize and command_queues[id]->device != device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah so originally I only had CQ being created once but then this was causing a bug where it complained that CQ
was using a device that was closed. This case happens when:

In the same process:
device = CreateDevice(0);
EnqueueX(GetCommandQueue(device), ...) // this will initialize command_queues[device->id]
CloseDevice(device)

device2 = CreateDevice(0); // technically same device but different object
EnqueueX(GetCommandQueue(device2), ...) // command_queues[device2->id] exists because device->id == device2->id but they are diff device objects (device2 is initialized)

this can be cleaned up when we don't need to create device objects but rather give handle to existing device object

@abhullar-tt abhullar-tt force-pushed the abhullar/cq-per-device branch 2 times, most recently from 9f783b4 to b7da114 Compare December 8, 2023 23:42
@abhullar-tt abhullar-tt merged commit 8b8c4be into main Dec 20, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants