I've been thinking about ways the C backend could be simplified. I think a lot of complexity comes from trying to handle so many details of the translation in a single pass. By splitting things into multiple passes (operating on an IR, probably LLVM IR), maybe it could be easier to work with.
Something that could be moved to a pass is the handling of vector operations. The LLVM Scalarizer pass can lower most vector operations to simple scalar operations for us, meaning we can remove the handling for vector addition, multiplication, etc., leaving just things like generating structs for vector types, converting GEPs, and a few other things like that.
It's a pretty simple change to use the scalariser:
The main difference in the generated C code is essentially that what would otherwise be the body of a helper function like llvm_fmul_f32x4 instead gets inlined at the call-site.
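To make the difference concrete, here is a rough hand-written sketch of the two shapes. It is not actual backend output: the `struct f32x4` layout, the body of `llvm_fmul_f32x4`, and the names `mul_with_helper`, `mul_scalarised`, and `same_f32x4` are all assumptions made up for illustration.

```c
/* Hypothetical struct standing in for <4 x float>; the name and layout
   are assumptions, not necessarily what the backend emits. */
struct f32x4 {
  float v[4];
};

/* Helper-function style: one helper per vector operation. The body here
   is a guess at what such a helper looks like. */
static struct f32x4 llvm_fmul_f32x4(struct f32x4 a, struct f32x4 b) {
  struct f32x4 r;
  r.v[0] = a.v[0] * b.v[0];
  r.v[1] = a.v[1] * b.v[1];
  r.v[2] = a.v[2] * b.v[2];
  r.v[3] = a.v[3] * b.v[3];
  return r;
}

/* Call site without scalarisation: the vector multiply is a single call. */
static struct f32x4 mul_with_helper(struct f32x4 a, struct f32x4 b) {
  return llvm_fmul_f32x4(a, b);
}

/* Call site after scalarisation: the same four multiplies appear inline
   as plain scalar operations, and no helper is needed. */
static struct f32x4 mul_scalarised(struct f32x4 a, struct f32x4 b) {
  float r0 = a.v[0] * b.v[0];
  float r1 = a.v[1] * b.v[1];
  float r2 = a.v[2] * b.v[2];
  float r3 = a.v[3] * b.v[3];
  struct f32x4 r = {{r0, r1, r2, r3}};
  return r;
}

/* Illustrative check that two vectors hold identical lanes. */
static int same_f32x4(struct f32x4 a, struct f32x4 b) {
  for (int i = 0; i < 4; i++) {
    if (a.v[i] != b.v[i]) return 0;
  }
  return 1;
}
```

Both versions compute the same lanes; the only thing that changes is whether the per-lane work sits behind a call or directly in the caller.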
I'm wondering whether there's a disadvantage to this approach. For a simple matrix multiplication test I wrote, clang seemed to produce similarly good code for the helper-function and non-helper-function versions (i.e. it successfully re-vectorises both). But it might be the case that in a more complex program, switching to scalarisation like this would produce worse code.
Any thoughts?