Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong constant range in simde_vshll_n_XXX intrinsics #1064

Closed
M-HT opened this issue Sep 24, 2023 · 11 comments
Closed

Wrong constant range in simde_vshll_n_XXX intrinsics #1064

M-HT opened this issue Sep 24, 2023 · 11 comments

Comments

@M-HT
Copy link
Contributor

M-HT commented Sep 24, 2023

In simde_vshll_n_XXX intrinsics the constant range is defined as 1-7 for 8-bit vectors, 1-15 for 16-bit vectors and 1-31 for 32-bit vectors (For example here for simde_vshll_n_s8).

These ranges are not correct because:

  • In 32-bit arm, the instruction vshll has two encodings. One encoding for shifts 1-7 for 8-bit vectors, 1-15 for 16-bit vectors and 1-31 for 32-bit vectors. And another encoding for shift 8 for 8-bit vectors, 16 for 16-bit vectors a 32 for 32-bit vectors.
  • In 64-bit arm, there are instructions sshll and ushll which have shifts 0-7 for 8-bit vectors, 0-15 for 16-bit vectors and 0-31 for 32-bit vectors, and there's instruction shll which has shift 8 for 8-bit vectors, 16 for 16-bit vectors a 32 for 32-bit vectors.

Therefore the ranges in simde_vshll_n_XXX intrinsics should be extended at least to 1-8 for 8-bit vectors, 1-16 for 16-bit vectors a 1-32 for 32-bit vectors.

I'm not sure how zero shifts should be handled.

@rosbif
Copy link
Collaborator

rosbif commented Sep 24, 2023

I generally refer to the following document when writing NEON code:
https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vshll

From this document:

  • It seems clear that shifts of 0 should be allowed.
  • However I can see no mention of shifts of the element size in bits being allowed.
  • I don't see any difference between A32 and A64.

Maybe you could change the minima from 1 to 0 and add test cases for these zero length shifts.

Remark:
My name appears in the copyright of shll_n.h but I am pretty sure that I didn't write it.
It probably started life as a copy of shl_n.h which I did write.

@M-HT
Copy link
Contributor Author

M-HT commented Sep 24, 2023

About zero shifts on A32 - my documentation didn't mention it, but according to this, zero shift is permitted, but resulting instruction is vmovl.

The page also mentions that shifts of the element size are allowed. Similarly for A64 shll instruction allows shifts of the element size. And from my testing, compilers support shifts of the element size when using vshll_n_XXX intrinsics.

@rosbif
Copy link
Collaborator

rosbif commented Sep 25, 2023

I think that SIMDe should follow the documentation of the NEON intrinsics (link above), not the documentation of the underlying assembly instructions.

Some compilers support out of range constants or even variables where strictly a constant is required.
The problem is that other compilers don't.
IMHO SIMDe should enforce strict limits, otherwise code may fail on certain platforms.

So I maintain my position that we should not increase the maxima to the element size in bits but just change the minima from 1 to 0 and add test cases for these zero length shifts.

Remark: the document above states that vmovl generates a SSHLL (or a USHLL) of 0, so vmovl is just a sort of simpler alias for a vshll of zero, not a real instruction.

@M-HT
Copy link
Contributor Author

M-HT commented Sep 25, 2023

I found this documentation (which to me looks like official arm documentation) which describes intrinsics also for shifts of the element size.

@rosbif
Copy link
Collaborator

rosbif commented Sep 25, 2023

Oops, sorry: you seem to be right.
I knew about the right shifts which go from one to the element size, but I have never seen these left shifts of the element size in the document I quoted. I must look again.
Apparently instead of generating SSHLL or USHLL instructions, they generate SHLL instructions.

So it looks as if the range should be zero to the element size in bits, and test code must be added for both extremes.

@rosbif
Copy link
Collaborator

rosbif commented Sep 25, 2023

After a search it seems that in the document I cited, a description and pseudo-code corresponding to vshll_n_u16 with n=16 appears but under the incorrect title of vcvt_f32_bf16 (and also under vcvtq_low_f32_bf16).

(There is also the assembly code corresponding to vshll_high_n_u16 with n=16 under the title vcvtq_high_f32_bf16.)

This is not the first time that I have found intrinsics in this document with totally incorrect descriptions and pseudo-code.
It looks as if someone started to add these cases but never finished the job, leaving copy/paste errors .

@M-HT
Copy link
Contributor Author

M-HT commented Sep 25, 2023

Actually, vcvt_f32_bf16, vcvtq_low_f32_bf16, vcvtq_high_f32_bf16 are not incorrect, because bfloat16 numbers are the upper 16-bits of 32-bit floating point numbers. Converting bfloat16 to 32-bit floating point means extending from 16-bits to 32-bits and shifting left by 16, which is what shll instruction does.

@rosbif
Copy link
Collaborator

rosbif commented Sep 25, 2023

Thank you for your explanation. I must admit that I was not familiar with the bfloat16 data format.
However I think it would be clearer if the description started by saying "Convert bf16 to f32".

@mr-c
Copy link
Collaborator

mr-c commented Sep 26, 2023

Can someone summarize this for me or open a PR? I'd like to make a new SIMDe release in the next week..

@rosbif
Copy link
Collaborator

rosbif commented Sep 26, 2023

Hi Michael,

I think that we now both agree that for all the SIMDE_REQUIRE_CONSTANT_RANGE in shll_n.h the lower limit should be decreased from one to zero and the upper limit should be increased by one to the element size in bits.

So both
SIMDE_REQUIRE_CONSTANT_RANGE(n, 1, 7)
become
SIMDE_REQUIRE_CONSTANT_RANGE(n, 0, 8)

Both
SIMDE_REQUIRE_CONSTANT_RANGE(n, 1, 15)
become
SIMDE_REQUIRE_CONSTANT_RANGE(n, 0, 16)

Both
SIMDE_REQUIRE_CONSTANT_RANGE(n, 1, 31)
become
SIMDE_REQUIRE_CONSTANT_RANGE(n, 0, 32)

Test cases should be added to shll_n.c to cover the two extremes (n=0 and n=element size in bits).

I could do this but I haven't contributed to SIMDe for nearly three years.
Things have probably changed a lot and maybe M-HT is a better candidate for doing this.

@M-HT
Copy link
Contributor Author

M-HT commented Sep 28, 2023

Fixed in #1068

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants