[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836

zahiraam · 2024-10-23T15:39:51Z

Add support for the options -f[no-]offload-fp32-prec-div and -f[no-]offload-fp32-prec-sqrt.
These options are added to allow users to control whether fdiv and sqrt operations in offload device code are required to return correctly rounded results. In order to communicate this to the device code, we need the front end to generate IR that reflects the choice.

When the correctly rounded setting is used, we can just generate the fdiv instruction and llvm.sqrt intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.

When the result is not required to be correctly rounded, the front end should generate a call to the llvm.fpbuiltin.fdiv or llvm.fpbuiltin.sqrt intrinsic with the fpbuiltin-max-error attribute set. For single-precision fdiv, the setting should be 2.5. For single-precision sqrt, the setting should be 3.0.
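
The choice described above can be sketched as standalone C++. This is a toy model, not Clang code: the names `FPOp`, `Lowering`, and `chooseLowering` are hypothetical, and only the intrinsic names and the 2.5 / 3.0 values are taken from the description.

```cpp
#include <string>

// Hypothetical model of the front-end decision; not Clang's real API.
enum class FPOp { Div, Sqrt };

struct Lowering {
  std::string Callee;   // IR instruction or intrinsic to emit
  std::string MaxError; // value for "fpbuiltin-max-error"; empty if unused
};

// PrecRequired mirrors -foffload-fp32-prec-div / -foffload-fp32-prec-sqrt
// (true) versus their -fno- forms (false).
inline Lowering chooseLowering(FPOp Op, bool PrecRequired) {
  if (Op == FPOp::Div)
    return PrecRequired ? Lowering{"fdiv", ""}
                        : Lowering{"llvm.fpbuiltin.fdiv", "2.5"};
  return PrecRequired ? Lowering{"llvm.sqrt", ""}
                      : Lowering{"llvm.fpbuiltin.sqrt", "3.0"};
}
```

For example, `chooseLowering(FPOp::Div, false)` yields the `llvm.fpbuiltin.fdiv` intrinsic with a maximum error of 2.5.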

If the -ffp-accuracy option is used, we should issue warnings if the settings conflict with an explicitly set -foffload-fp32-prec-div or -foffload-fp32-prec-sqrt option.

mdtoguchi · 2024-10-29T18:08:06Z

clang/lib/Driver/ToolChains/Clang.cpp

+    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
+      CmdArgs.push_back("-fno-offload-fp32-prec-div");
+    else
+      CmdArgs.push_back("-foffload-fp32-prec-div");


Suggested change

-    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
-      CmdArgs.push_back("-fno-offload-fp32-prec-div");
-    else
-      CmdArgs.push_back("-foffload-fp32-prec-div");
+    if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
+                      options::OPT_fno_offload_fp32_prec_div, true))
+      CmdArgs.push_back("-fno-offload-fp32-prec-div");

Since -foffload-fp32-prec-div is the default.
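
For readers unfamiliar with `hasFlag`: it resolves a positive/negative option pair so that the last occurrence on the command line wins, and it returns the given default when neither form appears. The sketch below imitates that behaviour on plain strings. `hasFlagLike` and `renderPrecDiv` are hypothetical stand-ins, not the LLVM option API.

```cpp
#include <string>
#include <vector>

// Toy stand-in for llvm::opt::ArgList::hasFlag: the last occurrence of
// either spelling wins, and Default applies when neither is present.
inline bool hasFlagLike(const std::vector<std::string> &Args,
                        const std::string &Pos, const std::string &Neg,
                        bool Default) {
  bool Result = Default;
  for (const std::string &A : Args) {
    if (A == Pos)
      Result = true;
    else if (A == Neg)
      Result = false;
  }
  return Result;
}

// Forward only the non-default spelling, as in the suggested change.
inline std::vector<std::string>
renderPrecDiv(const std::vector<std::string> &Args) {
  std::vector<std::string> CmdArgs;
  if (!hasFlagLike(Args, "-foffload-fp32-prec-div",
                   "-fno-offload-fp32-prec-div", /*Default=*/true))
    CmdArgs.push_back("-fno-offload-fp32-prec-div");
  return CmdArgs;
}
```

One consequence of this pattern is that a later `-foffload-fp32-prec-div` cancels an earlier `-fno-offload-fp32-prec-div`, whereas a check that only asks whether the negative form is present would still forward it.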

mdtoguchi · 2024-10-29T18:08:26Z

clang/lib/Driver/ToolChains/Clang.cpp

+    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt))
+      CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
+    else
+      CmdArgs.push_back("-foffload-fp32-prec-sqrt");


Similar comment to the one above.

elizabethandrews

@premanandrao can you review this please?

MrSidims

I have to withdraw my review, as I have two questions.

MrSidims · 2024-11-15T16:43:58Z

clang/lib/CodeGen/CGCall.cpp

+          (FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&
+           IsFloat32Type);
+      bool isFP32FdivFunction =
+          (FuncName == "fdiv" && !getLangOpts().OffloadFP32PrecDiv &&


I actually thought that the request was to replace the fdiv instruction with the intrinsic, not an fdiv function. Do we know whether users actually use such a function? I don't see any mention of it in the SYCL or OpenCL specifications.

@gmlueck could you please comment on that?

The intent of -foffload-fp32-prec-div is to affect the native divide operation (i.e. /). There is no SYCL function named fdiv. Is there a standard C / C++ function with that name?

AFAIK there is no standard function for floating-point division. There is std::div, but it works only on integers.

There is no C/C++ fdiv function.

MrSidims · 2024-11-15T16:44:52Z

clang/lib/CodeGen/CGCall.cpp

+      bool hasFPAccuracyFuncMap = hasAccuracyRequirement(FuncName);
+      bool hasFPAccuracyVal = !getLangOpts().FPAccuracyVal.empty();
+      bool isFp32SqrtFunction =
+          (FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&


Why do we compare with un-mangled sqrt?

FuncName is the output of FD->getName() which returns a simple identifier. https://github.com/intel/llvm/blob/sycl/clang/include/clang/AST/Decl.h#L280

So clang/test/CodeGenSYCL/offload-fp32-div-sqrt.cpp will pass even with extern "C" removed from the sqrt function declaration?

What if the user has a function in their own namespace that happens to be named "sqrt"?
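
The concern in this thread can be illustrated with a toy model. Everything below is hypothetical (`FuncDeclInfo`, `matchesByNameOnly`, `matchesLibrarySqrt`) and does not use the Clang AST. The stricter check is only one possible assumption about how a match could be narrowed, not what the PR implements.

```cpp
#include <string>

// Hypothetical, simplified view of a function declaration.
struct FuncDeclInfo {
  std::string Name;      // simple identifier, like the result of FD->getName()
  std::string Namespace; // enclosing namespace; empty for the global one
  bool IsExternC;        // declared with C language linkage
};

// Matching on the simple identifier alone also accepts a user's own sqrt.
inline bool matchesByNameOnly(const FuncDeclInfo &FD) {
  return FD.Name == "sqrt";
}

// One possible stricter check (an assumption for illustration only):
// accept the name only with C linkage or inside namespace std.
inline bool matchesLibrarySqrt(const FuncDeclInfo &FD) {
  return FD.Name == "sqrt" && (FD.IsExternC || FD.Namespace == "std");
}
```

With this model, a declaration `{"sqrt", "mylib", false}` passes the name-only check but is rejected by the stricter one.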

mdtoguchi · 2024-11-15T21:51:51Z

clang/lib/Driver/ToolChains/Clang.cpp

+  bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
+                                  !JA.isDeviceOffloading(Action::OFK_Cuda) &&
+                                  !JA.isOffloading(Action::OFK_HIP);


As discussed offline, something like:

Suggested change

-  bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
-                                  !JA.isDeviceOffloading(Action::OFK_Cuda) &&
-                                  !JA.isOffloading(Action::OFK_HIP);
+  bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
+                                  TC.getTriple().isSPIROrSPIRV();

zahiraam · 2024-11-18T20:53:32Z

@MrSidims and @gmlueck I have removed the restriction for CUDA/HIP. My understanding is that @MrSidims will make changes so that the precision for 3.0 is allowed for the sqrt function. Is that the case?
I have also changed the code so that / has precision set with the options instead of fdiv.
Please let me know if these are the changes you expected.

MrSidims · 2024-11-19T19:27:06Z

> that @MrSidims will make changes so that the precision for 3.0 is allowed for the sqrt function

In the email thread I replied that I'm planning to take care of propagating the precise option to the CUDA and HIP drivers. I can take a look at what should be done for the implementation of the non-precise intrinsics.

MrSidims

LGTM assuming that the code doesn't affect C stdlib's div function, see the comment above.

MrSidims · 2024-11-21T20:05:06Z

clang/test/CodeGenSYCL/offload-fp32-div-sqrt.cpp

+      // ROUNDED-SQRT-PREC-DIV: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]]
+      // ROUNDED-DIV-PREC-SQRT: call reassoc nnan ninf nsz arcp afn spir_func nofpclass(nan inf) float @sqrt(float noundef nofpclass(nan inf) {{.*}})
+      // ROUNDED-DIV-ROUNDED-SQRT-FAST: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]]
+      // LOW-PREC-DIV: call float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT_LOW:[0-9]+]]


Just want to check that I understand this correctly. In this case we pass the flags -fno-offload-fp32-prec-div -ffp-builtin-accuracy=high. The 1.0 ULP fpbuiltin-max-error attribute for llvm.fpbuiltin.sqrt.f32 was generated in response to the -ffp-builtin-accuracy=high flag, and we don't expect precise calculations here because high != precise, right?

That's correct.
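
As background on why a 1.0 ULP bound is weaker than correct rounding: a correctly rounded result is within 0.5 ULP of the exact value, so a result that only satisfies a 1.0 ULP bound may land on a neighbouring float. The sketch below counts how many representable floats separate two finite values, which is one way a test could compare a device result against a correctly rounded reference. The helper names `orderedBits` and `ulpDistance` are hypothetical and not part of the PR.

```cpp
#include <cstdint>
#include <cstring>

// Map a finite float to an integer that grows monotonically with the
// float's value, so adjacent floats map to adjacent integers.
inline std::int64_t orderedBits(float X) {
  std::uint32_t U;
  std::memcpy(&U, &X, sizeof U);
  const std::int64_t Magnitude = U & 0x7fffffffu;
  return (U & 0x80000000u) ? -Magnitude : Magnitude;
}

// Number of representable-float steps between A and B (finite inputs only).
inline std::int64_t ulpDistance(float A, float B) {
  const std::int64_t D = orderedBits(A) - orderedBits(B);
  return D < 0 ? -D : D;
}
```

A distance of 0 means the two results are bit-identical (treating +0 and -0 as equal), and a distance of 1 means they are adjacent floats.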

MrSidims · 2024-11-21T20:08:14Z

clang/lib/CodeGen/CGCall.cpp

-      StringRef FPAccuracyVal = llvm::fp::getAccuracyForFPBuiltin(
-          ID, FuncType, convertFPAccuracy(getLangOpts().FPAccuracyVal));
+      StringRef FPAccuracyVal;
+      if (!getLangOpts().OffloadFP32PrecDiv && Name == "div")


I'm not that familiar with the code, so I would like to double-check: is "div" just an alias for "llvm.fpbuiltin.fdiv", or is it some function call @div(...)? If it's an intrinsic, why was the renaming from fdiv to div required?

Basically I'm asking whether these lines may affect C stdlib's div function, which has 'long integer' operands. If you say that this code won't affect it, that would be enough explanation for me :)

Yes, div is just an alias. You can see in ScalarExprEmitter::EmitDiv, at about line #3788, that the intrinsic's ID is set to llvm::Intrinsic::fpbuiltin_fdiv. The call to CreateBuiltinCallWithAttr is made with the hard-coded name div.
To remove this confusion, I will rename it back to fdiv, as it was. I thought it would be clearer to name it div, so that we know it is the / operator, rather than an fdiv function, that responds to these options.

zahiraam · 2024-11-25T16:12:02Z

The FE work has been approved, but this is an attempt to fix the LIT test DeviceLib/cmath_test.cpp. I am not really sure the change is correct. I will revert it if the LIT failure persists.

zahiraam added 2 commits October 23, 2024 08:38

Add support for -ftarget-prec-div/sqrt options.

f8caf83

Added fast-math run lines to LIT tests.

00ffb5a

zahiraam requested a review from mdtoguchi October 23, 2024 19:11

zahiraam requested review from jcranmer-intel and gmlueck October 23, 2024 19:12

Renamed the options accordingly.

795dd38

zahiraam changed the title ~~Add support for -ftarget-prec-div/sqrt options.~~ Add support for -foffload-fp32-prec-div/sqrt options. Oct 24, 2024

Fix format.

78a9005

Changed the place where the options are added in order for the options to be applied to OpenMP too.

50e71c0

zahiraam marked this pull request as ready for review October 28, 2024 17:25

zahiraam requested review from a team as code owners October 28, 2024 17:25

Fix format.

54f2409

zahiraam changed the title ~~Add support for -foffload-fp32-prec-div/sqrt options.~~ [SYCL] Add support for -foffload-fp32-prec-div/sqrt options. Oct 29, 2024

mdtoguchi reviewed Oct 29, 2024

mdtoguchi reviewed Oct 29, 2024

Addressed review comments.

bdf78d7

elizabethandrews reviewed Oct 29, 2024

Put the code to handle the options in RenderFloatingPointOptions function instead of adding a JobAction to handle it.

8cd6d8b

Changed SplitFPAccuracyVal to be a static function instead of a lambda.

b25e5ac

mdtoguchi approved these changes Nov 13, 2024

premanandrao approved these changes Nov 14, 2024

MrSidims requested changes Nov 15, 2024

Restricting the use of the options to sycl only.

ce00296

mdtoguchi reviewed Nov 15, 2024

MrSidims mentioned this pull request Nov 18, 2024

[SYCL] Pass foffload-fp32-prec-[div/sqrt] options to device's BE #16107

Draft

Remove restriction on Cuda/Hip and changed the code so that the div instruction gets the precision set instead of the fdiv function.

bc01759

Removed unused lines in CodeGenSYCL/offload-fp32-div-sqrt.cpp.

c5fffc5

MrSidims self-requested a review November 21, 2024 17:27

MrSidims reviewed Nov 21, 2024

Renamed div to fdiv to avoid confusion.

f2fb8b2

MrSidims approved these changes Nov 24, 2024

This is an attempt to fix the DeviceLib/cmath_test.cpp issue.

83c9b31

zahiraam requested a review from a team as a code owner November 25, 2024 16:10
