Skip to content
This repository has been archived by the owner on Dec 24, 2024. It is now read-only.

Adding soft queue dispatch logic to dispatch commands to AIE agents #2

Merged
merged 6 commits into from
Sep 3, 2024

Conversation

eddierichter-amd
Copy link
Collaborator

Logic to dispatch work to AIEs via the HSA interface. Utilizes a soft HSA queue implementation to submit commands to the xdna driver.

@eddierichter-amd eddierichter-amd marked this pull request as ready for review August 29, 2024 21:51
Comment on lines +121 to +122
// Right now can only target N-1 columns so putting this
// here as a workaround
Copy link
Collaborator

@makslevental makslevental Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why can we target only N-1 columns?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline - this is specific to Phoenix (shim DMA missing from 0th col) and needs to be revisited for strix

// Waiting for command to finish
amdxdna_drm_wait_cmd wait_cmd = {};
wait_cmd.hwctx = hw_ctx_handle;
wait_cmd.timeout = 50; // 50ms timeout
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this an env variable or something?

Comment on lines 243 to 248
const int operand_starting_index = 5;

// We have 6 arguments of the packet before we start passing operands
// and operands are 64-bits so we need to divide by two
constexpr int non_operand_count = 6;
uint32_t num_operands = (count - non_operand_count) / 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you promote these to ALL_CAPS_CONSTANTS at the top of the file

}

// We know data[2] is the DPU
cmd_pkt_payload->data[2] = 0x04000000 | (reinterpret_cast<uint64_t>(vmem_handle_mappings[cmd_pkt_payload->data[2]]) & 0x02FFFFFF);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you promote 0x04000000 and 0x02FFFFFF to named ALL_CAPS_CONSTANTS at the top of the file.

// Creating the command
amdxdna_drm_create_bo create_cmd_bo = {};
create_cmd_bo.type = AMDXDNA_BO_CMD,
create_cmd_bo.size = 64;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALL_CAPS_CONSTANT at the top of the file

if (ioctl(fd, DRM_IOCTL_AMDXDNA_GET_BO_INFO, &cmd_bo_get_bo_info))
return HSA_STATUS_ERROR;

*cmd = static_cast<amdxdna_cmd *>(mmap(0, 64, PROT_READ | PROT_WRITE, MAP_SHARED, fd, cmd_bo_get_bo_info.map_offset));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the 64 here not create_cmd_bo.size? if so, can you use it, if not can you promote

hsa_status_t AieAqlQueue::SubmitCmd(uint32_t hw_ctx_handle, int fd, void *queue_base, uint64_t read_dispatch_id, uint64_t write_dispatch_id, std::unordered_map<uint32_t, void*> &vmem_handle_mappings) {

// This is the index where the operand addresses start in a command
const int operand_starting_index = 5;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reuse previous promotion

Comment on lines 316 to 321
if (pkt->opcode == HSA_AMD_AIE_ERT_START_CU) {
num_cont_start_cu_pkts++;
}
else {
break;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (pkt->opcode == HSA_AMD_AIE_ERT_START_CU) {
num_cont_start_cu_pkts++;
}
else {
break;
}
if (pkt->opcode != HSA_AMD_AIE_ERT_START_CU) {
break
}
num_cont_start_cu_pkts++;

// beginning
// TODO: Look more into this
if (pkt_iter == cur_id) {
cmd->count = pkt->count + 5;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess the 5 here is operand_starting_index? if so please reuse previous promotion, if not please promote 5 to ALL_CAPS_CONSTANT

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different so moving to a separate constant This is something that I don't fully understand and need to dig a bit deeper into the driver to see why the counts are different. Hence the comment.

// Creating a packet that contains the command chain
uint32_t cmd_chain_bo_handle = 0;
amdxdna_cmd *cmd_chain = nullptr;
if (CreateCmd(4096, &cmd_chain_bo_handle, &cmd_chain, fd))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please promote 4096 to an ALL_CAPS_CONSTANT at the top of the file

// Creating a command chain
cmd_chain->state = HSA_AMD_AIE_ERT_STATE_NEW;
cmd_chain->extra_cu_masks = 0;
cmd_chain->count = 0xA; // TODO: Figure out why this is the value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just fyi these //... that follow a semicolon ; play havoc with clang-format. please shift to just above this line. also any idea where to find this out? somewhere in the FW? or is it the driver?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am guessing the FW would be the quickest to understand why the count is larger than I would have expected.

exec_cmd_0.type = AMDXDNA_CMD_SUBMIT_EXEC_BUF;
exec_cmd_0.cmd_handles = cmd_chain_bo_handle;
exec_cmd_0.args = (__u64)bo_args.data();
exec_cmd_0.cmd_count = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what's the difference between cmd_count and cmd_chain->count?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that cmd_count is the number of chains that you are submitting and cmd_chain->count is the number of commands in a particular chain. Currently the driver and/or FW only supports one command chain at a time as per this comment

}
default: {
return HSA_STATUS_ERROR;
break;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
break;

@@ -47,6 +47,7 @@

#include "core/inc/driver.h"
#include "core/inc/memory_region.h"
#include "core/driver/xdna/uapi/amdxdna_accel.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: shouldn't we be getting this from the kernel somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good question. I know @atgutier added this to the runtime but not sure how this is usually done. @atgutier what is the preferred way of doing this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for now to ensure core ROCr can at least build on systems that do not have a XRT installed (e.g., the Gerrit test infra currently). Typically, the installer would place this UAPI (user API) header in a known include directory. When using XRT they put this header here: /usr/src/xrt-amdxdna-2.18.0/include/uapi/drm_local/amdxdna_accel.h.

The solution for now is just to keep a copy of this header here for now to avoid issues where we cannot find it installed globally on the system.

The GPU driver interface also directly includes the kfd_ioctl.h header in the runtime for convenience.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this in the kernel now though? https://patchwork.kernel.org/project/dri-devel/cover/[email protected]/

Or meant to be? Admittedly in my 6.10.7 I only have that header in places that XRT would've installed it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the kernel but only for inclusion by the kernel driver. We need it to be installed somewhere accessible by user-mode. So far the only thing that does that is the XRT installer. I confirmed with Max that this is indeed the only way to get the header. I'd prefer not to use that so this is a solution for now.

Eventually, we should get to a point where the driver module installer installs this header.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make a CMake find_package integration to search for the usual locations of the XDNA driver.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need a package to install first. As I said, currently that is only thru XRT which nobody wants to require as a dep here.

uint32_t num_operands = (count - non_operand_count) / 2;

// Keep track of the handles before we submit the packet
bo_args.push_back(cmd_pkt_payload->data[2]); // we know element 2 is the instruction sequence
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: anyway to check this rather than relying on it? i guess not hmm. seems silly but can you promote 2 to an ALL_CAPS_CONSTANT called something like CMD_PKT_PAYLOAD_INSTRUCTION_SEQUENCE_IDX

// Creating a packet that contains the command to execute the kernel
uint32_t cmd_bo_handle = 0;
amdxdna_cmd *cmd = nullptr;
if (CreateCmd(64, &cmd_bo_handle, &cmd, fd))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the same 64 as create_cmd_bo.size up above? if so, can you use the previously promoted/named constant, if not can you promote

Copy link
Collaborator

@makslevental makslevental left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool so this passes now both tests and looks basically fine except for some nits/questions.

@eddierichter-amd
Copy link
Collaborator Author

cool so this passes now both tests and looks basically fine except for some nits/questions.

Awesome! I believe I addressed everything in 025cde5 but let me know if you have additional feedback or if I missed anything.

Copy link
Collaborator

@makslevental makslevental left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool - thanks for fixing the nits. I also fixed some of the build warnings. Gonna set this to auto-merge so I can use it in iree-amd-aie

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants