[SYCL] Pass foffload-fp32-prec-[div/sqrt] options to device's BE #16107
base: sycl
This patch also adds a pass that removes llvm.fpbuiltin.[sqrt/fdiv] intrinsic functions from the module to ensure compatibility with old drivers (those that don't support the SPV_INTEL_fp_max_error extension). The intrinsics are removed only if they are used with the standard OpenCL max-error (e.g. [3.0/2.5] ULP) and there are no other llvm.fpbuiltin.* intrinsic functions, fdiv instructions or @sqrt builtins/intrinsics in the module.
Signed-off-by: Sidorov, Dmitry <[email protected]>
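The removal condition in the description can be modelled in a few lines. The following is a minimal, self-contained sketch and not the pass from this patch: the `Op` and `OpKind` types and the `canDropFPBuiltins` function are hypothetical stand-ins for LLVM IR, and the constants assume that 3.0 ULP applies to sqrt and 2.5 ULP to fdiv.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for the kinds of operations the
// description distinguishes. This is not LLVM IR.
enum class OpKind {
  FPBuiltinSqrt,  // llvm.fpbuiltin.sqrt
  FPBuiltinFDiv,  // llvm.fpbuiltin.fdiv
  OtherFPBuiltin, // any other llvm.fpbuiltin.* intrinsic
  PlainFDiv,      // ordinary fdiv instruction
  PlainSqrt,      // @sqrt builtin or intrinsic
  Other           // anything unrelated
};

struct Op {
  OpKind kind;
  double maxErrorUlp; // only meaningful for the FPBuiltin* kinds
};

// Assumed mapping of the "[3.0/2.5] ULP" values from the description.
constexpr double kOpenCLSqrtUlp = 3.0;
constexpr double kOpenCLFDivUlp = 2.5;

// Returns true when every fpbuiltin sqrt/fdiv uses the standard OpenCL
// max-error and nothing else in the module blocks the removal.
bool canDropFPBuiltins(const std::vector<Op> &module) {
  bool sawCandidate = false;
  for (const Op &op : module) {
    switch (op.kind) {
    case OpKind::FPBuiltinSqrt:
      if (op.maxErrorUlp != kOpenCLSqrtUlp)
        return false; // non-standard accuracy requested
      sawCandidate = true;
      break;
    case OpKind::FPBuiltinFDiv:
      if (op.maxErrorUlp != kOpenCLFDivUlp)
        return false; // non-standard accuracy requested
      sawCandidate = true;
      break;
    case OpKind::OtherFPBuiltin:
    case OpKind::PlainFDiv:
    case OpKind::PlainSqrt:
      return false; // mixed usage: keep the intrinsics
    case OpKind::Other:
      break;
    }
  }
  return sawCandidate; // nothing to remove in a module without candidates
}
```

In this model a module containing only `llvm.fpbuiltin.sqrt` at 3.0 ULP and `llvm.fpbuiltin.fdiv` at 2.5 ULP qualifies for removal, while adding a single plain `fdiv` keeps the intrinsics in place.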
@@ -1950,9 +1951,18 @@ void SYCLToolChain::AddImpliedTargetArgs(const llvm::Triple &Triple,
if (Args.hasFlag(options::OPT_ftarget_export_symbols,
options::OPT_fno_target_export_symbols, false))
BeArgs.push_back("-library-compilation");
} else if (IsJIT)
// -foffload-fp32-prec-[sqrt/div]
@mdtoguchi please take a look at these few lines, to check if I have correctly figured out what SYCL.cpp does to pass options from FE to BE.
It looks like your intention is to pass -options -ze-fp32-correctly-rounded-device-sqrt to ocloc and, for JIT, to pass the respective -foffload-fp32-prec* option to be embedded in the compile options when the JIT binary is wrapped.
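The split described in this comment can be sketched as follows. This is only an illustration of one reading of the comment, not the code in SYCL.cpp: `PrecFlags` and `impliedBackendArgs` are hypothetical names, and the AOT backend flag is passed in by the caller rather than hard-coded, since its exact spelling is not confirmed here.

```cpp
#include <string>
#include <vector>

// Hypothetical record of which -foffload-fp32-prec-* options the user gave.
struct PrecFlags {
  bool precDiv;
  bool precSqrt;
};

// Builds the extra backend arguments for one device target.
// isJIT  : true for JIT targets, false for AOT (ocloc) targets.
// aotFlag: backend spelling of the "precise div/sqrt" option (assumption:
//          supplied by the caller, one flag covers both div and sqrt).
std::vector<std::string> impliedBackendArgs(bool isJIT, PrecFlags flags,
                                            const std::string &aotFlag) {
  std::vector<std::string> args;
  if (!flags.precDiv && !flags.precSqrt)
    return args; // nothing requested, nothing to forward

  if (isJIT) {
    // JIT: forward the front-end options so they can be embedded in the
    // compile options of the wrapped binary.
    if (flags.precDiv)
      args.push_back("-foffload-fp32-prec-div");
    if (flags.precSqrt)
      args.push_back("-foffload-fp32-prec-sqrt");
  } else {
    // AOT: hand the backend flag to ocloc through -options.
    args.push_back("-options");
    args.push_back(aotFlag);
  }
  return args;
}
```

A driver test for this behavior would then check both paths: the JIT path for the forwarded `-foffload-fp32-prec-*` options and the AOT path for the `-options` pair.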
Also - please add a driver test that checks these behaviors.
Signed-off-by: Sidorov, Dmitry <[email protected]>
b840a24 to 5490f22
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
// Remove llvm.fpbuiltin.[sqrt/fdiv] intrinsics to ensure compatibility with the
@gmlueck without a deep dive into the pass, may I ask you to check whether the logic of the pass described in the comment makes sense to you? Note that I'm not annotating the kernels with optional kernel feature metadata, which could help discard the 'precise' options from the list of BE options.
The reason is that currently we either have the intrinsics in the module or don't have them at all. And when we do have them, the non-precise option was already passed, so there is nothing to rewrite in the BE options.
SmallSet<Function *, 2> DeclToRemove;
for (auto *Sqrt : WorkListSqrt) {
DeclToRemove.insert(Sqrt->getCalledFunction());
IRBuilder Builder(Sqrt);
This should be moved outside the loop.
This sounds like a SPIR-V-specific problem, so I think the right place to run the pass is the SPIR-V generator (i.e. the SPIR-V translator or the SPIR-V backend).
This patch also adds a pass that removes llvm.fpbuiltin.[sqrt/fdiv] intrinsics
to ensure compatibility with old drivers (those that don't support the SPV_INTEL_fp_max_error extension).
The intrinsic functions are removed only if they are used with the standard
OpenCL max-error (e.g. [3.0/2.5] ULP) and there are no:
- other llvm.fpbuiltin.* intrinsic functions;
- fdiv instructions;
- @sqrt builtins/intrinsics in the module.
The PR depends on #15836 and oneapi-src/unified-runtime#2315