[SYCL] Pass foffload-fp32-prec-[div/sqrt] options to device's BE #16107

Draft
wants to merge 2 commits into
base: sycl

Contributor

@MrSidims MrSidims commented Nov 18, 2024

This patch also adds a pass that removes llvm.fpbuiltin.[sqrt/fdiv] intrinsics
to ensure compatibility with the old drivers (that don't support SPV_INTEL_fp_max_error extension).
The intrinsics are removed only if they are used with the standard OpenCL
max error (e.g. [3.0/2.5] ULP) and the module contains no:

  • other llvm.fpbuiltin.* intrinsic functions;
  • fdiv instructions;
  • sqrt builtins (both C- and C++-style) or the llvm sqrt intrinsic.
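The removal condition described above can be sketched in plain C++ as follows. This is only an illustration, not the pass from the patch: the `Op` struct, the `canRemoveFpBuiltins` name, and the flat list standing in for a module are all hypothetical, and intrinsic names are written without their type suffixes. It also assumes that 3.0 ULP pairs with sqrt and 2.5 ULP with fdiv, following the order in which the description lists them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one call or instruction in a module.
struct Op {
  std::string name;   // e.g. "llvm.fpbuiltin.sqrt", "fdiv", "sqrt"
  double maxErrorUlp; // only meaningful for llvm.fpbuiltin.* entries
};

static bool startsWith(const std::string &s, const std::string &prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Returns true when every fpbuiltin sqrt/fdiv call uses the assumed standard
// OpenCL max error and nothing else in the module blocks the removal.
bool canRemoveFpBuiltins(const std::vector<Op> &module) {
  bool sawCandidate = false;
  for (const Op &op : module) {
    if (op.name == "llvm.fpbuiltin.sqrt") {
      if (op.maxErrorUlp != 3.0) // assumption: 3.0 ULP is the sqrt default
        return false;
      sawCandidate = true;
    } else if (op.name == "llvm.fpbuiltin.fdiv") {
      if (op.maxErrorUlp != 2.5) // assumption: 2.5 ULP is the fdiv default
        return false;
      sawCandidate = true;
    } else if (startsWith(op.name, "llvm.fpbuiltin.")) {
      return false; // some other llvm.fpbuiltin.* intrinsic is present
    } else if (op.name == "fdiv" || op.name == "sqrt" ||
               op.name == "llvm.sqrt") {
      return false; // plain fdiv instruction or sqrt builtin/intrinsic
    }
  }
  return sawCandidate; // nothing to remove if no candidate was seen
}

// Small usage example for the sketch above.
void exampleUsage() {
  std::vector<Op> module = {Op{"llvm.fpbuiltin.sqrt", 3.0},
                            Op{"llvm.fpbuiltin.fdiv", 2.5}};
  assert(canRemoveFpBuiltins(module));
  module.push_back(Op{"fdiv", 0.0}); // a plain fdiv blocks the removal
  assert(!canRemoveFpBuiltins(module));
}
```

The sketch bails out on the first blocking entry, so a single plain fdiv or a non-default max error keeps all the intrinsics in place.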

The PR depends on #15836 and oneapi-src/unified-runtime#2315

This patch also adds a pass that removes llvm.fpbuiltin.[sqrt/fdiv]
intrinsic functions from the module to ensure compatibility with the old drivers
(that don't support the SPV_INTEL_fp_max_error extension), provided they are used
with the standard OpenCL max error (e.g. [3.0/2.5] ULP) and there are no other
llvm.fpbuiltin.* intrinsic functions, fdiv instructions or @sqrt
builtins/intrinsics in the module.

Signed-off-by: Sidorov, Dmitry <[email protected]>
@@ -1950,9 +1951,18 @@ void SYCLToolChain::AddImpliedTargetArgs(const llvm::Triple &Triple,
    if (Args.hasFlag(options::OPT_ftarget_export_symbols,
                     options::OPT_fno_target_export_symbols, false))
      BeArgs.push_back("-library-compilation");
  } else if (IsJIT)
    // -foffload-fp32-prec-[sqrt/div]
Contributor Author

@MrSidims MrSidims Nov 18, 2024

@mdtoguchi please take a look at these few lines, to check if I have correctly figured out what SYCL.cpp does to pass options from FE to BE.

Contributor

It looks like your intention is to pass -options -ze-fp32-correctly-rounded-device-sqrt to ocloc and, for JIT, to pass the respective -foffload-fp32-prec* option so that it is embedded in the compile options when the JIT binary is wrapped.
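A minimal sketch of that split, for the sqrt case only, might look like the code below. It is not the actual SYCLToolChain::AddImpliedTargetArgs code: `impliedSqrtArgs` and its two boolean parameters are hypothetical, and the backend option string is copied verbatim from this comment, so the real spelling may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the backend arguments implied by the
// frontend's fp32 sqrt precision option, depending on the compilation mode.
std::vector<std::string> impliedSqrtArgs(bool isJIT, bool precSqrt) {
  std::vector<std::string> beArgs;
  if (!precSqrt)
    return beArgs; // option not requested: nothing to forward
  if (isJIT) {
    // JIT: forward the frontend option so it can be embedded in the compile
    // options when the JIT binary is wrapped.
    beArgs.push_back("-foffload-fp32-prec-sqrt");
  } else {
    // AOT via ocloc: pass the backend option through -options
    // (string taken as written in the review comment, an assumption).
    beArgs.push_back("-options");
    beArgs.push_back("-ze-fp32-correctly-rounded-device-sqrt");
  }
  return beArgs;
}

// Small usage example for the sketch above.
void exampleSqrtArgs() {
  assert(impliedSqrtArgs(true, true).size() == 1);
  assert(impliedSqrtArgs(false, true).size() == 2);
  assert(impliedSqrtArgs(false, false).empty());
}
```

A driver test would then check that each mode produces the expected argument list.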

Contributor

Also - please add a driver test that checks these behaviors.

Signed-off-by: Sidorov, Dmitry <[email protected]>
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
// Remove llvm.fpbuiltin.[sqrt/fdiv] intrinsics to ensure compatibility with the
Contributor Author

@MrSidims MrSidims Nov 18, 2024

@gmlueck without a deep dive into the pass, could you check whether the logic of the pass described in the comment makes sense to you? Note that I'm not annotating the kernels with optional kernel feature metadata that could help discard the 'precise' options from the list of BE options.

Contributor Author

The reason is that currently we either have the intrinsics in the module or don't have them at all. And when we do have them, the non-precise option was already passed, so there is nothing to rewrite in the BE options.

SmallSet<Function *, 2> DeclToRemove;
for (auto *Sqrt : WorkListSqrt) {
  DeclToRemove.insert(Sqrt->getCalledFunction());
  IRBuilder Builder(Sqrt);
Contributor Author

To be moved outside the loop.

@bader
Contributor

bader commented Nov 25, 2024

This patch also adds a pass that removes llvm.fpbuiltin.[sqrt/fdiv] intrinsics
to ensure compatibility with the old drivers (that don't support SPV_INTEL_fp_max_error extension).

This sounds like a SPIR-V-specific problem, so I think the right place to run the pass is the SPIR-V generator (i.e. the SPIR-V translator or the SPIR-V backend).

3 participants