How to split one stage into two with hcl.split() #419

Open
antonysigma opened this issue Sep 21, 2021 · 2 comments

@antonysigma

Hi HeteroCL developers,

I came across a similar tutorial in the Halide-HLS project, in which the authors customize a 2D convolution algorithm by (1) splitting the large image into tiles on the host (Zynq ARM64) and then (2) sending the tiles to the accelerator (Zynq FPGA) to run the convolution steps. The processed tiles are then sent back to the host for tile stitching.

Reference:
https://github.com/jingpu/Halide-HLS/blob/905d2f2ad560246673ba3a84b8a6d8be308e481f/apps/hls_examples/gaussian_hls/pipeline.cpp#L103-L107

How can we describe such a customization with the HeteroCL scheduling syntax, without rewriting the algorithm?

In other words, how do I "split" stage B into the sub-stages tile_producer and tile_consumer, as in the following pseudo-code?
Or should I describe the sub-stages explicitly in order to use the hcl.to() syntax?

def one_stage(A):
    B = hcl.compute(A.shape, lambda x, y: A[x, y] + A[x + 1, y] 
                                         + A[x, y + 1] + A[x + 1, y + 1], "B")
    return B

s = hcl.create_schedule([A], one_stage)

# Split the image into tiles of size 2x2
s_B = one_stage.B
x_out, y_out, x_in, y_in = s[s_B].split_to_tiles(s_B.axis[0], s_B.axis[1], 2, 2)

# Define a mock-up target
target = hcl.Platform.zcu102
target.config(compiler="vitis", backend="vhls")

# Implement intermediate stage "tile producer" on the host CPU,
s.to(A, target.host)
s.to(s_B.axis[x_out, y_out], target.host)

# then push the tiles into accelerator
s.to(s_B.axis[x_in, y_in], target.xcel)

# Move the "tile consumer" output back to host CPU for tile stitching
s.to(s_B, target.host) # Not sure how to implement it with HeteroCL
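
For reference, the intended data flow (split into tiles on the host, compute per tile on the accelerator, stitch on the host) can be modeled in plain Python without any HeteroCL calls. This is only a sketch under stated assumptions: the function names tile_producer, tile_consumer and stitch are illustrative, the output is taken to be one element smaller than the input in each dimension so that the stencil stays in bounds, and each tile carries a one-element halo.

```python
# Plain-Python model of the host/accelerator split; no HeteroCL calls are used.
# tile_producer, tile_consumer and stitch are illustrative names only.

TILE = 2  # tile edge length, matching the 2x2 tiles in the question


def reference(A):
    """Untiled stencil: B[x][y] is the sum of the 2x2 window anchored at (x, y)."""
    h, w = len(A) - 1, len(A[0]) - 1  # assumed output extent (keeps reads in bounds)
    return [[A[x][y] + A[x + 1][y] + A[x][y + 1] + A[x + 1][y + 1]
             for y in range(w)] for x in range(h)]


def tile_producer(A, tile=TILE):
    """Host side: yield (x0, y0, patch); each patch includes a one-element halo."""
    h, w = len(A) - 1, len(A[0]) - 1
    for x0 in range(0, h, tile):
        for y0 in range(0, w, tile):
            x1 = min(x0 + tile, h)
            y1 = min(y0 + tile, w)
            patch = [row[y0:y1 + 1] for row in A[x0:x1 + 1]]
            yield x0, y0, patch


def tile_consumer(patch):
    """Accelerator side: run the same stencil on a single input patch."""
    return reference(patch)


def stitch(A, tiles):
    """Host side: write each processed tile back at its (x0, y0) offset."""
    h, w = len(A) - 1, len(A[0]) - 1
    B = [[0] * w for _ in range(h)]
    for x0, y0, out in tiles:
        for dx, row in enumerate(out):
            for dy, value in enumerate(row):
                B[x0 + dx][y0 + dy] = value
    return B


def tiled(A):
    """Full pipeline: produce tiles, consume them, stitch the results."""
    tiles = ((x0, y0, tile_consumer(p)) for x0, y0, p in tile_producer(A))
    return stitch(A, tiles)
```

The tiled pipeline should produce the same result as the untiled stencil, which is the property a schedule-only customization would need to preserve.
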
@hecmay
Collaborator

hecmay commented Sep 22, 2021

Hi @antonysigma. Thanks for your interest.

In a future release, we will support data movement under a specific loop axis, which you can combine with loop tiling and reordering to realize the computation you described. More concretely, please see the following code example:

def one_stage(A):
    B = hcl.compute(A.shape, lambda x, y: A[x, y] + A[x + 1, y] 
                                         + A[x, y + 1] + A[x + 1, y + 1], "B")
    return B

s = hcl.create_schedule([A], one_stage)

# Define a mock-up target
target = hcl.Platform.zcu102
target.config(compiler="vitis", backend="vhls")

# Split the image into tiles of size 2x2
s_B = one_stage.B
yo, yi, xo, xi = s[s_B].tile(axis=[0,1], factor=[2,2])
s[s_B].reorder([yo, xo, yi, xi])

# Move input from host to FPGA accelerator and
# store the input (tile) under loop axis yi inside a local on-chip buffer
s.to(A, target.xcel).to(s_B, axis=yi)

# Move the output from FPGA to host when the convolution on input tile is done
s.to(s_B, target.host, axis=yi)

In other words, the HCL compiler would automatically infer the sub-stages for producing and consuming image tiles from the information provided by the .to() primitive. Right now, the master branch of HCL provides only preliminary support for .to(), which moves an entire tensor between host and accelerator, but we will release a new version of HCL very soon to support this feature. Stay tuned!
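
To make the effect of tiling plus reordering concrete, here is a plain-Python sketch of the resulting loop nest. It is not HeteroCL code, and several things are assumptions: x indexes the first tensor axis and y the second, the loops run in the order yo, xo, yi, xi as in the reorder() call above, the output is one element smaller than the input in each dimension so that reads stay in bounds, and the comment marking where a per-tile transfer would attach is a guess at the planned behavior rather than a description of the compiler.

```python
# Plain-Python illustration of loop tiling plus reordering; not HeteroCL code.


def untiled(A):
    """Original loop nest: iterate over x, then y."""
    h, w = len(A) - 1, len(A[0]) - 1  # assumed output extent
    B = [[0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            B[x][y] = A[x][y] + A[x + 1][y] + A[x][y + 1] + A[x + 1][y + 1]
    return B


def tiled_reordered(A, factor=2):
    """Same computation, with both axes split by `factor` and outer loops hoisted."""
    h, w = len(A) - 1, len(A[0]) - 1
    B = [[0] * w for _ in range(h)]
    order = []  # records the visiting order, for inspection only
    for yo in range((w + factor - 1) // factor):
        for xo in range((h + factor - 1) // factor):
            # Assumption: a per-tile host/accelerator transfer would attach
            # somewhere around here, once per (yo, xo) tile.
            for yi in range(factor):
                for xi in range(factor):
                    x = xo * factor + xi
                    y = yo * factor + yi
                    if x < h and y < w:  # guard for partial tiles at the border
                        B[x][y] = (A[x][y] + A[x + 1][y]
                                   + A[x][y + 1] + A[x + 1][y + 1])
                        order.append((x, y))
    return B, order
```

Only the visiting order changes; every output element is still computed exactly once from the same inputs, which is why the algorithm definition itself does not need to be rewritten.
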

@antonysigma
Author

Thank you @hecmay for the prompt reply! I look forward to customizing data movement by loop axis.

It is also very helpful to see example code at this stage. When the new feature is released on GitHub, I will be curious to see how the order of the following calls influences the data transfer mechanism.

s.to(s_B, axis=yi).to(s_B, target.host, axis=yi)

s.to(s_B, target.host, axis=yi).to(s_B, axis=yi)
