Add Galaxy support. #9068
I don't see anything of major concern. There are a few items that we should address in a follow-on commit (if not with this commit). Eventually I think we should pull a bunch of the cq config code in device.cpp out and put it in the cq code, though we need to think about the architecture of device/cq a bit more.
}
log_debug(tt::LogMetal, "Setting up {} Arguments", magic_enum::enum_name((tt::tt_metal::DispatchWorkerType)dwv));
switch(dwv) {
case PREFETCH:
seems these names should be scoped?
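One way to read the scoping suggestion is a C++ scoped enum. The sketch below is illustrative only: the enumerator list is limited to names visible in the quoted diff, the underlying type and the `to_string` helper are assumptions, and the real `DispatchWorkerType` has more entries.

```cpp
#include <cstdint>
#include <string_view>
#include <type_traits>

// Hypothetical scoped variant of the dispatch worker enum. Only enumerators
// that appear in the quoted diff are listed; their order and values here are
// assumptions, not the project's actual definition.
enum class DispatchWorkerType : std::uint32_t {
    PREFETCH,
    US_TUNNELER_REMOTE,
    DEMUX_D,
    DISPATCH_D,
};

// A scoped enum does not convert implicitly to an integer, so an accidental
// comparison against a raw index fails to compile.
static_assert(!std::is_convertible_v<DispatchWorkerType, int>,
              "scoped enums must be cast explicitly");

// Illustrative helper (not part of the PR): maps an enumerator to its name.
constexpr std::string_view to_string(DispatchWorkerType t) {
    switch (t) {
        case DispatchWorkerType::PREFETCH: return "PREFETCH";
        case DispatchWorkerType::US_TUNNELER_REMOTE: return "US_TUNNELER_REMOTE";
        case DispatchWorkerType::DEMUX_D: return "DEMUX_D";
        case DispatchWorkerType::DISPATCH_D: return "DISPATCH_D";
    }
    return "UNKNOWN";
}
```

With this form, the `switch` cases would be written as `case DispatchWorkerType::PREFETCH:`, which keeps short names such as `PREFETCH` out of the enclosing namespace.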
settings.downstream_cores.push_back(mux_settings.worker_physical_core);
settings.compile_args.resize(23);
auto& compile_args = settings.compile_args;
compile_args[0] = downstream_cb_base;
as is is good enough for now, eventually we should name the arg indices
i agree, the indices should be named and both the host code and device kernel code should access the compile args through named indices.
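A minimal sketch of what named compile-arg indices could look like, assuming a small header shared by host code and the device kernel. Only index 0 (`downstream_cb_base`) comes from the quoted diff; the namespace, the second index, the `NUM_ARGS` count, and the `make_compile_args` helper are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shared header content: one enum per kernel, included by both
// the host code that fills the args and the kernel that reads them.
namespace example_kernel_args {
enum Index : std::size_t {
    DOWNSTREAM_CB_BASE = 0,  // index 0 holds downstream_cb_base in the diff
    DOWNSTREAM_CB_SIZE = 1,  // hypothetical second argument
    NUM_ARGS                 // hypothetical count (the diff resizes to 23)
};
}  // namespace example_kernel_args

// Host side: size the vector from the enum and fill it by name, so adding or
// reordering an argument cannot silently shift the other indices.
std::vector<std::uint32_t> make_compile_args(std::uint32_t downstream_cb_base,
                                             std::uint32_t downstream_cb_size) {
    std::vector<std::uint32_t> compile_args(example_kernel_args::NUM_ARGS);
    compile_args[example_kernel_args::DOWNSTREAM_CB_BASE] = downstream_cb_base;
    compile_args[example_kernel_args::DOWNSTREAM_CB_SIZE] = downstream_cb_size;
    return compile_args;
}
```

The kernel side would then read the same names instead of literal positions, which is the point both reviewers raise.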
compile_args[24] = packet_switch_4B_pack(0xB1, 0xB2, 0xB3, 0xB4); // 24: packetized input dest id
break;
}
case US_TUNNELER_REMOTE:
US ?
Up Stream Tunneler.
It's a tunneler running on an inner device in the tunnel, and it is connected to the next tunneled device going away from the host.
auto &compile_args = demux_d_settings.compile_args;
compile_args.resize(30);

compile_args[0] = 0xB1; // 0: endpoint_id_start_index
as part of the next cleanup pass, this and other constants should be named and declared in one location I think...
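As a rough illustration of that cleanup, the literals could be gathered into one namespace of named constants. The values `0xB1` through `0xB4` come from the quoted diff; the namespace, the constant names, and the `endpoint_id` helper are assumptions for the sake of the sketch.

```cpp
#include <cstdint>

// Hypothetical single home for the dispatch endpoint constants.
namespace dispatch_constants {

// 0xB1 is used in the diff as endpoint_id_start_index for the demux.
constexpr std::uint32_t kEndpointIdStart = 0xB1;

// The diff packs four consecutive ids (0xB1..0xB4) as packetized dest ids.
constexpr std::uint32_t kNumEndpoints = 4;

// Illustrative helper: the i-th endpoint id, counted from the start index.
constexpr std::uint32_t endpoint_id(std::uint32_t i) {
    return kEndpointIdStart + i;
}

// Compile-time check that the named form reproduces the literals in the diff.
static_assert(endpoint_id(kNumEndpoints - 1) == 0xB4,
              "last endpoint id should match the literal in the diff");

}  // namespace dispatch_constants
```

A call site would then read `compile_args[0] = dispatch_constants::kEndpointIdStart;` rather than repeating the literal.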
auto demux_d_settings = std::get<1>(device_worker_variants[DEMUX_D][0]);
auto dispatch_d_settings = std::get<1>(device_worker_variants[DISPATCH_D][0]);

TT_ASSERT(num_prefetchers == demux_d_settings.semaphores.size(), "Demux D does not have required number of semaphores for Prefetcher D. Exptected = {}. Fount = {}", num_prefetchers, demux_d_settings.semaphores.size());
Fount typo
COUNT = 12
};

struct worker_build_settings_t{
quibble: not a fan of this name. dispatch_worker_build_settings_t ? or dispatch_settings_t? or cq?
@@ -583,6 +691,71 @@ void Cluster::initialize_ethernet_sockets() {
}
}

void Cluster::reserve_ethernet_cores_for_tunneling() {
a few high-level comments in front of the bigger routines would help w/ understanding the flow: what do you have to search through? what criteria are being looked for?
#0: Enable metal on galaxy.
#8305: add Galaxy cluster apis
#8305: cleanup, add print
#8450: Establish tunnels originating from an mmio device. Determine the remote chips as well as their order on the tunnel.
#8452: add tests for tg pipeline
#0: patch for tg workflows.
#8450: Add tables for tunnel dispatch workers with build settings.
Populate build settings for tunnel kernels.
Launch FD2 kernels based on information in tunnel device dispatch worker map.
Enable 4 devices per hugepage/channel
#0: disable hanging/failing tests for Galaxy
#0: skip using channel 3, 7 which use huge page channel 3. This (4th) hugepage is not a full 1GB in size. 256 MB is taken up by syseng tools 4th huge page.
#0: re-enable Galaxy sharded tests, reduce one test runtime for Galaxy
#0: fix cluster init for Galaxy
#8953: Fix hardcoding of queue sizes in tests.
#8450: Fix compute grid selection for N150. N150 can be standalone system or part of a TG system. On TG compute grid for N150 is different than standalone N150.
#0: Reduce prefetch q entries to account for Galaxy CQ size.