-
Notifications
You must be signed in to change notification settings - Fork 0
Adding soft queue dispatch logic to dispatch commands to AIE agents #2
Conversation
53faf46
to
773afe8
Compare
9bc697b
to
3dd381f
Compare
// Right now can only target N-1 columns so putting this | ||
// here as a workaround |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why can we target only N-1 columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline - this is specific to Phoenix (shim DMA missing from 0th col) and needs to be revisited for strix
// Waiting for command to finish | ||
amdxdna_drm_wait_cmd wait_cmd = {}; | ||
wait_cmd.hwctx = hw_ctx_handle; | ||
wait_cmd.timeout = 50; // 50ms timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make this an env variable or something?
const int operand_starting_index = 5; | ||
|
||
// We have 6 arguments of the packet before we start passing operands | ||
// and operands are 64-bits so we need to divide by two | ||
constexpr int non_operand_count = 6; | ||
uint32_t num_operands = (count - non_operand_count) / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you promote these to ALL_CAPS_CONSTANTS at the top of the file
} | ||
|
||
// We know data[2] is the DPU | ||
cmd_pkt_payload->data[2] = 0x04000000 | (reinterpret_cast<uint64_t>(vmem_handle_mappings[cmd_pkt_payload->data[2]]) & 0x02FFFFFF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you promote 0x04000000
and 0x02FFFFFF
to named ALL_CAPS_CONSTANTS at the top of the file.
// Creating the command | ||
amdxdna_drm_create_bo create_cmd_bo = {}; | ||
create_cmd_bo.type = AMDXDNA_BO_CMD, | ||
create_cmd_bo.size = 64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALL_CAPS_CONSTANT at the top of the file
if (ioctl(fd, DRM_IOCTL_AMDXDNA_GET_BO_INFO, &cmd_bo_get_bo_info)) | ||
return HSA_STATUS_ERROR; | ||
|
||
*cmd = static_cast<amdxdna_cmd *>(mmap(0, 64, PROT_READ | PROT_WRITE, MAP_SHARED, fd, cmd_bo_get_bo_info.map_offset)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the 64
here not create_cmd_bo.size
? if so, can you use it, if not can you promote
hsa_status_t AieAqlQueue::SubmitCmd(uint32_t hw_ctx_handle, int fd, void *queue_base, uint64_t read_dispatch_id, uint64_t write_dispatch_id, std::unordered_map<uint32_t, void*> &vmem_handle_mappings) { | ||
|
||
// This is the index where the operand addresses start in a command | ||
const int operand_starting_index = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reuse previous promotion
if (pkt->opcode == HSA_AMD_AIE_ERT_START_CU) { | ||
num_cont_start_cu_pkts++; | ||
} | ||
else { | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (pkt->opcode == HSA_AMD_AIE_ERT_START_CU) { | |
num_cont_start_cu_pkts++; | |
} | |
else { | |
break; | |
} | |
if (pkt->opcode != HSA_AMD_AIE_ERT_START_CU) { | |
break | |
} | |
num_cont_start_cu_pkts++; |
// beginning | ||
// TODO: Look more into this | ||
if (pkt_iter == cur_id) { | ||
cmd->count = pkt->count + 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess the 5
here is operand_starting_index
? if so please reuse previous promotion, if not please promote 5
to ALL_CAPS_CONSTANT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different so moving to a separate constant This is something that I don't fully understand and need to dig a bit deeper into the driver to see why the counts are different. Hence the comment.
// Creating a packet that contains the command chain | ||
uint32_t cmd_chain_bo_handle = 0; | ||
amdxdna_cmd *cmd_chain = nullptr; | ||
if (CreateCmd(4096, &cmd_chain_bo_handle, &cmd_chain, fd)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please promote 4096
to an ALL_CAPS_CONSTANT at the top of the file
// Creating a command chain | ||
cmd_chain->state = HSA_AMD_AIE_ERT_STATE_NEW; | ||
cmd_chain->extra_cu_masks = 0; | ||
cmd_chain->count = 0xA; // TODO: Figure out why this is the value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just fyi these //...
that follow a semicolon ;
play havoc with clang-format
. please shift to just above this line. also any idea where to find this out? somewhere in the FW? or is it the driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am guessing the FW would be the quickest to understand why the count is larger than I would have expected.
exec_cmd_0.type = AMDXDNA_CMD_SUBMIT_EXEC_BUF; | ||
exec_cmd_0.cmd_handles = cmd_chain_bo_handle; | ||
exec_cmd_0.args = (__u64)bo_args.data(); | ||
exec_cmd_0.cmd_count = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: what's the difference between cmd_count
and cmd_chain->count
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that cmd_count
is the number of chains that you are submitting and cmd_chain->count
is the number of commands in a particular chain. Currently the driver and/or FW only supports one command chain at a time as per this comment
} | ||
default: { | ||
return HSA_STATUS_ERROR; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break; |
@@ -47,6 +47,7 @@ | |||
|
|||
#include "core/inc/driver.h" | |||
#include "core/inc/memory_region.h" | |||
#include "core/driver/xdna/uapi/amdxdna_accel.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: shouldn't we be getting this from the kernel somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed for now to ensure core ROCr can at least build on systems that do not have a XRT installed (e.g., the Gerrit test infra currently). Typically, the installer would place this UAPI (user API) header in a known include directory. When using XRT they put this header here: /usr/src/xrt-amdxdna-2.18.0/include/uapi/drm_local/amdxdna_accel.h.
The solution for now is just to keep a copy of this header here for now to avoid issues where we cannot find it installed globally on the system.
The GPU driver interface also directly includes the kfd_ioctl.h header in the runtime for convenience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this in the kernel now though? https://patchwork.kernel.org/project/dri-devel/cover/[email protected]/
Or meant to be? Admittedly in my 6.10.7 I only have that header in places that XRT would've installed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in the kernel but only for inclusion by the kernel driver. We need it to be installed somewhere accessible by user-mode. So far the only thing that does that is the XRT installer. I confirmed with Max that this is indeed the only way to get the header. I'd prefer not to use that so this is a solution for now.
Eventually, we should get to a point where the driver module installer installs this header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make a CMake find_package integration to search for the usual locations of the XDNA driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need a package to install first. As I said, currently that is only thru XRT which nobody wants to require as a dep here.
uint32_t num_operands = (count - non_operand_count) / 2; | ||
|
||
// Keep track of the handles before we submit the packet | ||
bo_args.push_back(cmd_pkt_payload->data[2]); // we know element 2 is the instruction sequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: anyway to check this rather than relying on it? i guess not hmm. seems silly but can you promote 2
to an ALL_CAPS_CONSTANT called something like CMD_PKT_PAYLOAD_INSTRUCTION_SEQUENCE_IDX
// Creating a packet that contains the command to execute the kernel | ||
uint32_t cmd_bo_handle = 0; | ||
amdxdna_cmd *cmd = nullptr; | ||
if (CreateCmd(64, &cmd_bo_handle, &cmd, fd)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the same 64
as create_cmd_bo.size
up above? if so, can you use the previously promoted/named constant, if not can you promote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool so this passes now both tests and looks basically fine except for some nits/questions.
Awesome! I believe I addressed everything in 025cde5 but let me know if you have additional feedback or if I missed anything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool - thanks for fixing the nits. I also fixed some of the build warnings. Gonna set this to auto-merge so I can use it in iree-amd-aie
62ee1c2
to
76e8c5c
Compare
Logic to dispatch work to AIEs via the HSA interface. Utilizes a soft HSA queue implementation to submit commands to the xdna driver.