-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Schedule] Support for intra-kernel data placement #436
base: tvm
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review. I’ve looked through the code and think maybe you could add more descriptions for this PR. Seems you have added several new features besides the AutoSA backend.
- I notice you introduced new APIs like
transpose
andpack
, and new passes liketransform_layout
andexplicit_unroll
, could you also describe the changes in this PR? - Just a small question: You are not writing a C++ codegen for AutoSA right? All the compilation happens at the Python level (except for some transformation passes).
@@ -26,5 +26,5 @@ jobs: | |||
source $VITIS/settings64.sh | |||
source /opt/xilinx/xrt/setup.sh | |||
export LOCAL_CI_TEST=1 | |||
which vivado_hls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vivado_hls
has been included in the previous paths?
def indent(num): | ||
return " " * num | ||
|
||
def get_function_code(name, code): | ||
pos = code.find(name) | ||
start_pos = pos - len("inline void") | ||
end_pos = code.find("/* Helper", pos) | ||
return code[start_pos:end_pos] | ||
|
||
|
||
def get_ser_size(code): | ||
lines = code.split("\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what Python formatter HeteroCL uses, but mixing one-line space and two-line space seems weird.
PART = "10,16" | ||
LAT = "2,2" | ||
SIMD = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these magic numbers? Could you add comments or use more specific variable names here?
print(f"[ INFO ] input size OC({OC}), OH({OH}), OW({OW}), IC({IC}), R({R}), C({C})") | ||
PART = "16,13,13,1" | ||
LAT = "2,1,2" | ||
SIMD = "1,1,2,4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "16,13,13,1"? I suppose this is not a test file but a general implementation.
@@ -1,6 +1,6 @@ | |||
/*! | |||
* Copyright (c) 2019 by Contributors | |||
* \file adjust_buffer_binding.cc | |||
* \file loop_partition.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you changed the header by mistake? The file name remains the same.
@@ -265,6 +265,61 @@ def join(self, srcs, dest=None): | |||
"inconsistent tensor joining" | |||
self.sch.join(target, dest, self[src]) | |||
|
|||
def transpose(self, tensor=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is one in compute_api.py
. What's the difference between these two transpose
?
return Y | ||
|
||
# Note that you have to make sure AutoSA binary | ||
# in on the PATH by running which command, otherwise HCL runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in on" typo. Better add quotation marks for which
.
extra_flags = "--simd-info=./autosa_tests/cnn/simd_info.json " | ||
return ST, PART, LAT, SIMD, extra_flags | ||
|
||
def generate_systolic_array(keys, values, code, backend): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the codegen, copying files, and generating headers are done in this function? Maybe it would be better if this function can be separated into several subfunctions or several steps like what we did in runtime.py
.
Thanks for pointing that out.
|
|
||
self.cascade_tensor = tensor | ||
self.cascade_source_stage = None | ||
self.sch.transpose(src, tensor, new_shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about this self.sch.transpose
function: is data packing actually done by this function? It seems to me that this pack
function only calculates the new shape.
if (top_arg_names_.find(var_name) != top_arg_names_.end()) { | ||
placement_info += "[0]"; // located on off-chip memory | ||
} else { | ||
placement_info += "[1]"; // loacted on on-chip memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few general questions about the newly added passes:
- From the test case, we are wrapping a piece of imperative gemm code into a stage and calls
systolic()
on the stage. Do we have any checks to see if the imperative code can be mapped to AutoSA? Or AutoSA would complain if it can't map the algorithm? - I want to check if my understanding is correct. For a piece of code that is targeted to AutoSA backend, we first generate C code from it, and then calls AutoSA to generate systolic array HLS code + serialization/de-serialization code, which is then wrapped into a stage. Is that correct?
- About the "explicit unroll" pass, is it unrolling a loop and then outline the loop body to become PEs (function calls)?
- What does "transform layout" do?
@@ -380,6 +380,9 @@ def lower(sch, | |||
stmt = ir_pass.AdjustBufferBinding(stmt, arg_list) | |||
stmt = ir_pass.InferStream(stmt, arg_list) | |||
stmt = ir_pass.AdjustBufferBinding(stmt, arg_list) | |||
# perform layout transformation | |||
stmt = ir_pass.TransformLayout(stmt, arg_list) | |||
stmt = ir_pass.AdjustBufferBinding(stmt, arg_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does AdjustBufferBinding do? Why is it called multiple times after each pass?
This PR aims to enhance
.systolic()
and.to()
primitive to better support intra-kernel data placement for systolic array generation using AutoSA backend..systolic()
primitive is a push-button API that maps the compute kernel to a systolic array automatically (while the dataflow pattern is left to compiler's decision)..to()
primitive provides more flexibility for expert designers to explore the trade-offs of different systolic dataflows.I have successfully solved the dependency issues and installed AutoSA on our local server. In this PR, i will also add the CI/CD local testing for systolic array programs with AutoSA backend.