Adding soft queue dispatch logic to dispatch commands to AIE agents #2
Changes from 2 commits
3dd381f
10bf720
025cde5
72b1ea9
7952927
76e8c5c
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -47,6 +47,7 @@ | |
|
||
#include "core/inc/driver.h" | ||
#include "core/inc/memory_region.h" | ||
#include "core/driver/xdna/uapi/amdxdna_accel.h" | ||
question: shouldn't we be getting this from the kernel somewhere?

This is needed for now to ensure core ROCr can at least build on systems that do not have XRT installed (e.g., the Gerrit test infra currently). Typically, the installer would place this UAPI (user API) header in a known include directory. When using XRT they put this header here: /usr/src/xrt-amdxdna-2.18.0/include/uapi/drm_local/amdxdna_accel.h. The solution for now is to keep a copy of this header here, to avoid issues where we cannot find it installed globally on the system. The GPU driver interface also directly includes the kfd_ioctl.h header in the runtime for convenience.

Isn't this in the kernel now though? https://patchwork.kernel.org/project/dri-devel/cover/[email protected]/ Or meant to be? Admittedly in my 6.10.7 I only have that header in places that XRT would've installed it.

It's in the kernel but only for inclusion by the kernel driver. We need it to be installed somewhere accessible by user-mode. So far the only thing that does that is the XRT installer. I confirmed with Max that this is indeed the only way to get the header. I'd prefer not to use that so this is a solution for now. Eventually, we should get to a point where the driver module installer installs this header.

We can make a CMake find_package integration to search for the usual locations of the XDNA driver.

We'd need a package to install first. As I said, currently that is only thru XRT which nobody wants to require as a dep here.
||
|
||
namespace rocr { | ||
namespace core { | ||
|
@@ -69,6 +70,9 @@ class XdnaDriver : public core::Driver { | |
hsa_status_t Init() override; | ||
hsa_status_t QueryKernelModeDriver(core::DriverQuery query) override; | ||
|
||
hsa_status_t GetHandleMappings(std::unordered_map<uint32_t, void*> &vmem_handle_mappings); | ||
hsa_status_t GetFd(int &fd); | ||
|
||
hsa_status_t GetAgentProperties(core::Agent &agent) const override; | ||
hsa_status_t | ||
GetMemoryProperties(uint32_t node_id, | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -41,13 +41,16 @@ | |||||||||||||||||||||
//////////////////////////////////////////////////////////////////////////////// | ||||||||||||||||||||||
|
||||||||||||||||||||||
#include "core/inc/amd_aie_aql_queue.h" | ||||||||||||||||||||||
#include "core/inc/amd_xdna_driver.h" | ||||||||||||||||||||||
|
||||||||||||||||||||||
#ifdef __linux__ | ||||||||||||||||||||||
#include <fcntl.h> | ||||||||||||||||||||||
#include <sys/mman.h> | ||||||||||||||||||||||
#include <sys/stat.h> | ||||||||||||||||||||||
#include <sys/syscall.h> | ||||||||||||||||||||||
#include <unistd.h> | ||||||||||||||||||||||
#include <sys/ioctl.h> | ||||||||||||||||||||||
#include <sys/mman.h> | ||||||||||||||||||||||
#endif | ||||||||||||||||||||||
|
||||||||||||||||||||||
#ifdef _WIN32 | ||||||||||||||||||||||
|
@@ -195,8 +198,230 @@ uint64_t AieAqlQueue::AddWriteIndexAcqRel(uint64_t value) { | |||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
void AieAqlQueue::StoreRelaxed(hsa_signal_value_t value) { | ||||||||||||||||||||||
atomic::Store(signal_.hardware_doorbell_ptr, uint64_t(value), | ||||||||||||||||||||||
std::memory_order_release); | ||||||||||||||||||||||
std::unordered_map<uint32_t, void*> vmem_handle_mappings; | ||||||||||||||||||||||
if(static_cast<XdnaDriver&>(core::Runtime::runtime_singleton_->AgentDriver(agent_.driver_type)).GetHandleMappings(vmem_handle_mappings) != HSA_STATUS_SUCCESS) | ||||||||||||||||||||||
return; | ||||||||||||||||||||||
|
||||||||||||||||||||||
int fd = 0; | ||||||||||||||||||||||
if(static_cast<XdnaDriver&>(core::Runtime::runtime_singleton_->AgentDriver(agent_.driver_type)).GetFd(fd) != HSA_STATUS_SUCCESS) | ||||||||||||||||||||||
return; | ||||||||||||||||||||||
|
||||||||||||||||||||||
SubmitCmd(hw_ctx_handle_, fd, amd_queue_.hsa_queue.base_address, amd_queue_.read_dispatch_id, amd_queue_.write_dispatch_id, vmem_handle_mappings); | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
hsa_status_t AieAqlQueue::SyncBos(std::vector<uint32_t> &bo_args, int fd) { | ||||||||||||||||||||||
for (int i = 0 ; i < bo_args.size(); i++) { | ||||||||||||||||||||||
amdxdna_drm_sync_bo sync_params = {}; | ||||||||||||||||||||||
sync_params.handle = bo_args[i]; | ||||||||||||||||||||||
if (ioctl(fd, DRM_IOCTL_AMDXDNA_SYNC_BO, &sync_params)) | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
return HSA_STATUS_SUCCESS; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
hsa_status_t AieAqlQueue::ExecCmdAndWait(amdxdna_drm_exec_cmd *exec_cmd, uint32_t hw_ctx_handle, int fd) { | ||||||||||||||||||||||
// Submit the cmd | ||||||||||||||||||||||
if (ioctl(fd, DRM_IOCTL_AMDXDNA_EXEC_CMD, exec_cmd)) | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Waiting for command to finish | ||||||||||||||||||||||
amdxdna_drm_wait_cmd wait_cmd = {}; | ||||||||||||||||||||||
wait_cmd.hwctx = hw_ctx_handle; | ||||||||||||||||||||||
wait_cmd.timeout = 50; // 50ms timeout | ||||||||||||||||||||||
can we make this an env variable or something?
||||||||||||||||||||||
wait_cmd.seq = exec_cmd->seq; | ||||||||||||||||||||||
|
||||||||||||||||||||||
if (ioctl(fd, DRM_IOCTL_AMDXDNA_WAIT_CMD, &wait_cmd)) | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
return HSA_STATUS_SUCCESS; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
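One way the "env variable" request on the 50 ms timeout could be addressed is sketched below. This is only an illustration: the variable name `ROCR_AIE_WAIT_TIMEOUT_MS`, the constant `DEFAULT_WAIT_TIMEOUT_MS`, and the helper `WaitTimeoutMs` are hypothetical and do not come from the PR or from ROCr. The helper falls back to the default whenever the variable is unset or does not parse as a positive integer that fits in 32 bits.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical default, equal to the 50 ms literal used in the diff.
constexpr uint32_t DEFAULT_WAIT_TIMEOUT_MS = 50;

// Reads a timeout in milliseconds from the environment variable `var_name`
// (a hypothetical name, not an existing ROCr setting). Returns the default
// when the variable is unset, empty, not a plain decimal number, zero, or
// too large for uint32_t.
inline uint32_t WaitTimeoutMs(const char* var_name = "ROCR_AIE_WAIT_TIMEOUT_MS") {
  assert(var_name != nullptr);
  const char* raw = std::getenv(var_name);
  if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw)))
    return DEFAULT_WAIT_TIMEOUT_MS;

  errno = 0;
  char* end = nullptr;
  const unsigned long parsed = std::strtoul(raw, &end, 10);
  if (errno != 0 || *end != '\0' || parsed == 0 || parsed > UINT32_MAX)
    return DEFAULT_WAIT_TIMEOUT_MS;

  return static_cast<uint32_t>(parsed);
}
```

With a helper like this, the assignment in `ExecCmdAndWait` could become `wait_cmd.timeout = WaitTimeoutMs();`, which keeps today's behaviour when nothing is set.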
|
||||||||||||||||||||||
void AieAqlQueue::RegisterCmdBOs(uint32_t count, std::vector<uint32_t> &bo_args, hsa_amd_aie_ert_start_kernel_data_t *cmd_pkt_payload, std::unordered_map<uint32_t, void*> &vmem_handle_mappings) { | ||||||||||||||||||||||
|
||||||||||||||||||||||
// This is the index where the operand addresses start in a command | ||||||||||||||||||||||
const int operand_starting_index = 5; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// We have 6 arguments of the packet before we start passing operands | ||||||||||||||||||||||
// and operands are 64-bits so we need to divide by two | ||||||||||||||||||||||
constexpr int non_operand_count = 6; | ||||||||||||||||||||||
uint32_t num_operands = (count - non_operand_count) / 2; | ||||||||||||||||||||||
can you promote these to ALL_CAPS_CONSTANTS at the top of the file
||||||||||||||||||||||
|
||||||||||||||||||||||
// Keep track of the handles before we submit the packet | ||||||||||||||||||||||
bo_args.push_back(cmd_pkt_payload->data[2]); // we know element 2 is the instruction sequence | ||||||||||||||||||||||
question: anyway to check this rather than relying on it? i guess not hmm. seems silly but can you promote
||||||||||||||||||||||
|
||||||||||||||||||||||
|
||||||||||||||||||||||
// Going through all of the operands in the command, keeping track of the | ||||||||||||||||||||||
// handles and turning the handles into addresses. The starting index of | ||||||||||||||||||||||
// the operands in a command is `operand_starting_index` and the fields | ||||||||||||||||||||||
// are 32-bits we need to iterate over every two | ||||||||||||||||||||||
for (int operand_iter = 0; operand_iter < num_operands; operand_iter++) { | ||||||||||||||||||||||
bo_args.push_back(cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]); | ||||||||||||||||||||||
cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter + 1 ] = ((uint64_t)vmem_handle_mappings[cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]] >> 32) & 0xFFFFFFFF; | ||||||||||||||||||||||
cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter ] = (uint64_t)vmem_handle_mappings[cmd_pkt_payload->data[operand_starting_index + 2 * operand_iter]] & 0xFFFFFFFF; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
// We know data[2] is the DPU | ||||||||||||||||||||||
cmd_pkt_payload->data[2] = 0x04000000 | (reinterpret_cast<uint64_t>(vmem_handle_mappings[cmd_pkt_payload->data[2]]) & 0x02FFFFFF); | ||||||||||||||||||||||
can you promote
||||||||||||||||||||||
|
||||||||||||||||||||||
return; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
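The constant-promotion requests on `RegisterCmdBOs` could look roughly like the sketch below. The constant and helper names are illustrative only; the values are the literals that appear in the diff (5 and 6). The sketch also guards the operand count against unsigned underflow when `count` is smaller than the non-operand count, which the expression `(count - non_operand_count) / 2` in the diff does not do. Passing the resolved address in as a parameter removes the dependency on write order that the loop in the diff has, where the high word must be written before the handle in the low word is overwritten.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative names; the values are the literals used in the diff.
constexpr uint32_t OPERAND_STARTING_INDEX = 5;
constexpr uint32_t NON_OPERAND_COUNT = 6;

// Number of 64-bit operands in a payload of `count` 32-bit words.
// Returns 0 instead of wrapping around when count is too small.
inline uint32_t NumOperands(uint32_t count) {
  return count < NON_OPERAND_COUNT ? 0 : (count - NON_OPERAND_COUNT) / 2;
}

// Writes the 64-bit address of operand `operand` into the payload as two
// 32-bit words: low word first, high word second.
inline void PatchOperand(uint32_t* data, uint32_t operand, uint64_t addr) {
  assert(data != nullptr);
  const uint32_t index = OPERAND_STARTING_INDEX + 2 * operand;
  data[index] = static_cast<uint32_t>(addr & 0xFFFFFFFFu);
  data[index + 1] = static_cast<uint32_t>(addr >> 32);
}
```

In the real function the handle would still be read from `data[index]` and looked up in `vmem_handle_mappings` before calling a helper like `PatchOperand`.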
|
||||||||||||||||||||||
hsa_status_t AieAqlQueue::CreateCmd(uint32_t size, uint32_t *handle, amdxdna_cmd **cmd, int fd) { | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Creating the command | ||||||||||||||||||||||
amdxdna_drm_create_bo create_cmd_bo = {}; | ||||||||||||||||||||||
create_cmd_bo.type = AMDXDNA_BO_CMD; | ||||||||||||||||||||||
create_cmd_bo.size = 64; | ||||||||||||||||||||||
ALL_CAPS_CONSTANT at the top of the file
||||||||||||||||||||||
if (ioctl(fd, DRM_IOCTL_AMDXDNA_CREATE_BO, &create_cmd_bo)) | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
amdxdna_drm_get_bo_info cmd_bo_get_bo_info = {}; | ||||||||||||||||||||||
cmd_bo_get_bo_info.handle = create_cmd_bo.handle; | ||||||||||||||||||||||
if (ioctl(fd, DRM_IOCTL_AMDXDNA_GET_BO_INFO, &cmd_bo_get_bo_info)) | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
*cmd = static_cast<amdxdna_cmd *>(mmap(0, 64, PROT_READ | PROT_WRITE, MAP_SHARED, fd, cmd_bo_get_bo_info.map_offset)); | ||||||||||||||||||||||
is the
||||||||||||||||||||||
*handle = create_cmd_bo.handle; | ||||||||||||||||||||||
|
||||||||||||||||||||||
return HSA_STATUS_SUCCESS; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
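`CreateCmd` as shown takes a `size` argument but hard-codes 64 for both the BO size and the `mmap` length, and it does not check the `mmap` result. A minimal sketch of a mapping helper that uses one size and reports failure is given below. `CMD_BO_SIZE` and `MapCmdBuffer` are hypothetical names, and the sketch assumes a Linux/POSIX `mmap`; it is not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <sys/types.h>

// Hypothetical constant for the command BO size (the literal 64 in the diff).
constexpr size_t CMD_BO_SIZE = 64;

// Maps `size` bytes of the buffer object at `map_offset` on the device `fd`.
// Returns nullptr when mmap fails, so the caller can turn that into
// HSA_STATUS_ERROR instead of dereferencing MAP_FAILED.
inline void* MapCmdBuffer(int fd, size_t size, off_t map_offset) {
  assert(size > 0);
  void* ptr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                   map_offset);
  return ptr == MAP_FAILED ? nullptr : ptr;
}
```

A caller would pass the same `size` to the create-BO ioctl and to `MapCmdBuffer`, so the two values cannot drift apart.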
|
||||||||||||||||||||||
hsa_status_t AieAqlQueue::SubmitCmd(uint32_t hw_ctx_handle, int fd, void *queue_base, uint64_t read_dispatch_id, uint64_t write_dispatch_id, std::unordered_map<uint32_t, void*> &vmem_handle_mappings) { | ||||||||||||||||||||||
|
||||||||||||||||||||||
// This is the index where the operand addresses start in a command | ||||||||||||||||||||||
const int operand_starting_index = 5; | ||||||||||||||||||||||
reuse previous promotion
||||||||||||||||||||||
|
||||||||||||||||||||||
uint64_t cur_id = read_dispatch_id; | ||||||||||||||||||||||
while (cur_id < write_dispatch_id) { | ||||||||||||||||||||||
|
||||||||||||||||||||||
hsa_amd_aie_ert_packet_t *pkt = static_cast<hsa_amd_aie_ert_packet_t *>(queue_base) + cur_id; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Get the packet header information | ||||||||||||||||||||||
if (pkt->header.header != HSA_PACKET_TYPE_VENDOR_SPECIFIC || pkt->header.AmdFormat != HSA_AMD_PACKET_TYPE_AIE_ERT) | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Get the payload information | ||||||||||||||||||||||
switch (pkt->opcode) { | ||||||||||||||||||||||
case HSA_AMD_AIE_ERT_START_CU: { | ||||||||||||||||||||||
|
||||||||||||||||||||||
std::vector<uint32_t> bo_args; | ||||||||||||||||||||||
std::vector<uint32_t> cmd_handles; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Iterating over future packets and seeing how many contiguous HSA_AMD_AIE_ERT_START_CU | ||||||||||||||||||||||
// packets there are. All can be combined into a single chain. | ||||||||||||||||||||||
int num_cont_start_cu_pkts = 1; | ||||||||||||||||||||||
for (int peak_pkt_id = cur_id + 1; peak_pkt_id < write_dispatch_id; peak_pkt_id++) { | ||||||||||||||||||||||
hsa_amd_aie_ert_packet_t *peak_pkt = static_cast<hsa_amd_aie_ert_packet_t *>(queue_base) + peak_pkt_id; | ||||||||||||||||||||||
if (pkt->opcode == HSA_AMD_AIE_ERT_START_CU) { | ||||||||||||||||||||||
num_cont_start_cu_pkts++; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
else { | ||||||||||||||||||||||
break; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
Suggested change
|
||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Iterating over all of the contiguous HSA_AMD_AIE_ERT_CMD_CHAIN packets | ||||||||||||||||||||||
for (int pkt_iter = cur_id; pkt_iter < cur_id + num_cont_start_cu_pkts; pkt_iter++) { | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Getting the current command packet | ||||||||||||||||||||||
hsa_amd_aie_ert_packet_t *pkt = static_cast<hsa_amd_aie_ert_packet_t *>(queue_base) + pkt_iter; | ||||||||||||||||||||||
hsa_amd_aie_ert_start_kernel_data_t *cmd_pkt_payload = reinterpret_cast<hsa_amd_aie_ert_start_kernel_data_t *>(pkt->payload_data); | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Add the handles for all of the BOs to bo_args as well as rewrite the command | ||||||||||||||||||||||
// payload handles to contain the actual virtual addresses | ||||||||||||||||||||||
RegisterCmdBOs(pkt->count, bo_args, cmd_pkt_payload, vmem_handle_mappings); | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Creating a packet that contains the command to execute the kernel | ||||||||||||||||||||||
uint32_t cmd_bo_handle = 0; | ||||||||||||||||||||||
amdxdna_cmd *cmd = nullptr; | ||||||||||||||||||||||
if (CreateCmd(64, &cmd_bo_handle, &cmd, fd)) | ||||||||||||||||||||||
is this the same
||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Filling in the fields of the command | ||||||||||||||||||||||
cmd->state = pkt->state; | ||||||||||||||||||||||
cmd->extra_cu_masks = 0; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// For some reason the first count needs to be a little larger than | ||||||||||||||||||||||
// it actually is, assuming there is some other data structure at the | ||||||||||||||||||||||
// beginning | ||||||||||||||||||||||
// TODO: Look more into this | ||||||||||||||||||||||
if (pkt_iter == cur_id) { | ||||||||||||||||||||||
cmd->count = pkt->count + 5; | ||||||||||||||||||||||
i guess the

Different so moving to a separate constant. This is something that I don't fully understand and need to dig a bit deeper into the driver to see why the counts are different. Hence the comment.
||||||||||||||||||||||
} | ||||||||||||||||||||||
else { | ||||||||||||||||||||||
cmd->count = pkt->count; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
cmd->opcode = pkt->opcode; | ||||||||||||||||||||||
cmd->data[0] = cmd_pkt_payload->cu_mask; | ||||||||||||||||||||||
memcpy((cmd->data + 1), cmd_pkt_payload->data, 4 * pkt->count); | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Keeping track of the handle | ||||||||||||||||||||||
cmd_handles.push_back(cmd_bo_handle); | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Creating a packet that contains the command chain | ||||||||||||||||||||||
uint32_t cmd_chain_bo_handle = 0; | ||||||||||||||||||||||
amdxdna_cmd *cmd_chain = nullptr; | ||||||||||||||||||||||
if (CreateCmd(4096, &cmd_chain_bo_handle, &cmd_chain, fd)) | ||||||||||||||||||||||
please promote
||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Writing information to the command buffer | ||||||||||||||||||||||
amdxdna_cmd_chain *cmd_chain_payload = reinterpret_cast<amdxdna_cmd_chain *>(cmd_chain->data); | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Creating a command chain | ||||||||||||||||||||||
cmd_chain->state = HSA_AMD_AIE_ERT_STATE_NEW; | ||||||||||||||||||||||
cmd_chain->extra_cu_masks = 0; | ||||||||||||||||||||||
cmd_chain->count = 0xA; // TODO: Figure out why this is the value | ||||||||||||||||||||||
just fyi these

I am guessing the FW would be the quickest to understand why the count is larger than I would have expected.
||||||||||||||||||||||
cmd_chain->opcode = HSA_AMD_AIE_ERT_CMD_CHAIN; | ||||||||||||||||||||||
cmd_chain_payload->command_count = cmd_handles.size(); | ||||||||||||||||||||||
cmd_chain_payload->submit_index = 0; | ||||||||||||||||||||||
cmd_chain_payload->error_index = 0; | ||||||||||||||||||||||
for (int i = 0; i < cmd_handles.size(); i++) { | ||||||||||||||||||||||
cmd_chain_payload->data[i] = cmd_handles[i]; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Syncing BOs before we execute the command | ||||||||||||||||||||||
if (SyncBos(bo_args, fd)) | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Removing duplicates in the bo container. The driver will report | ||||||||||||||||||||||
// an error if we provide the same BO handle multiple times. | ||||||||||||||||||||||
// This can happen if any of the BOs are the same across jobs | ||||||||||||||||||||||
std::sort(bo_args.begin(), bo_args.end()); | ||||||||||||||||||||||
bo_args.erase(std::unique(bo_args.begin(), bo_args.end()), bo_args.end()); | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Filling in the fields to execute the command chain | ||||||||||||||||||||||
amdxdna_drm_exec_cmd exec_cmd_0 = {}; | ||||||||||||||||||||||
exec_cmd_0.ext = 0; | ||||||||||||||||||||||
exec_cmd_0.ext_flags = 0; | ||||||||||||||||||||||
exec_cmd_0.hwctx = hw_ctx_handle; | ||||||||||||||||||||||
exec_cmd_0.type = AMDXDNA_CMD_SUBMIT_EXEC_BUF; | ||||||||||||||||||||||
exec_cmd_0.cmd_handles = cmd_chain_bo_handle; | ||||||||||||||||||||||
exec_cmd_0.args = (__u64)bo_args.data(); | ||||||||||||||||||||||
exec_cmd_0.cmd_count = 1; | ||||||||||||||||||||||
question: what's the difference between

My understanding is that
||||||||||||||||||||||
exec_cmd_0.arg_count = bo_args.size(); | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Executing all commands in the command chain | ||||||||||||||||||||||
ExecCmdAndWait(&exec_cmd_0, hw_ctx_handle, fd); | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Syncing BOs after we execute the command | ||||||||||||||||||||||
if (SyncBos(bo_args, fd)) | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
|
||||||||||||||||||||||
cur_id += num_cont_start_cu_pkts; | ||||||||||||||||||||||
break; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
default: { | ||||||||||||||||||||||
return HSA_STATUS_ERROR; | ||||||||||||||||||||||
break; | ||||||||||||||||||||||
Suggested change
|
||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
return HSA_STATUS_SUCCESS; | ||||||||||||||||||||||
} | ||||||||||||||||||||||
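In the look-ahead loop of `SubmitCmd`, the opcode test reads `pkt->opcode` (the current packet) rather than `peak_pkt->opcode` (the packet being peeked at), so as written every remaining packet is counted as part of the run. The sketch below shows the counting step in isolation, testing the peeked packet. `FakePacket`, `FAKE_START_CU`, `FAKE_OTHER`, and `CountContiguous` are stand-ins invented for this example; the real code would use `hsa_amd_aie_ert_packet_t` and `HSA_AMD_AIE_ERT_START_CU`. Like the diff, the sketch indexes the queue directly by dispatch id.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for hsa_amd_aie_ert_packet_t; only the opcode matters here.
struct FakePacket {
  uint32_t opcode;
};

// Hypothetical opcode values, used only by this example.
constexpr uint32_t FAKE_START_CU = 0;
constexpr uint32_t FAKE_OTHER = 1;

// Counts how many consecutive packets in [first, end) carry `opcode`,
// starting at `first` and stopping at the first packet that differs.
inline uint64_t CountContiguous(const FakePacket* queue, uint64_t first,
                                uint64_t end, uint32_t opcode) {
  assert(queue != nullptr && first <= end);
  uint64_t run = 0;
  for (uint64_t id = first; id < end && queue[id].opcode == opcode; ++id)
    ++run;
  return run;
}
```

Called as `CountContiguous(queue, cur_id, write_dispatch_id, opcode)`, the result would play the role of `num_cont_start_cu_pkts`, and using 64-bit ids avoids narrowing `cur_id` to `int`.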
|
||||||||||||||||||||||
void AieAqlQueue::StoreRelease(hsa_signal_value_t value) { | ||||||||||||||||||||||
|
question: why can we target only N-1 columns?
Discussed offline: this is specific to Phoenix (the shim DMA is missing from the 0th column) and needs to be revisited for Strix.