Wrong constant range in simde_vshll_n_XXX intrinsics #1064

M-HT · 2023-09-24T18:22:23Z

In simde_vshll_n_XXX intrinsics the constant range is defined as 1-7 for 8-bit vectors, 1-15 for 16-bit vectors and 1-31 for 32-bit vectors (For example here for simde_vshll_n_s8).

These ranges are not correct because:

In 32-bit arm, the instruction vshll has two encodings. One encoding for shifts 1-7 for 8-bit vectors, 1-15 for 16-bit vectors and 1-31 for 32-bit vectors. And another encoding for shift 8 for 8-bit vectors, 16 for 16-bit vectors a 32 for 32-bit vectors.
In 64-bit arm, there are instructions sshll and ushll which have shifts 0-7 for 8-bit vectors, 0-15 for 16-bit vectors and 0-31 for 32-bit vectors, and there's instruction shll which has shift 8 for 8-bit vectors, 16 for 16-bit vectors a 32 for 32-bit vectors.

Therefore the ranges in simde_vshll_n_XXX intrinsics should be extended at least to 1-8 for 8-bit vectors, 1-16 for 16-bit vectors a 1-32 for 32-bit vectors.

I'm not sure how zero shifts should be handled.

The text was updated successfully, but these errors were encountered:

rosbif · 2023-09-24T20:10:29Z

I generally refer to the following document when writing NEON code:
https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vshll

From this document:

It seems clear that shifts of 0 should be allowed.
However I can see no mention of shifts of the element size in bits being allowed.
I don't see any difference between A32 and A64.

Maybe you could change the minima from 1 to 0 and add test cases for these zero length shifts.

Remark:
My name appears in the copyright of shll_n.h but I am pretty sure that I didn't write it.
It probably started life as a copy of shl_n.h which I did write.

M-HT · 2023-09-24T22:24:29Z

About zero shifts on A32 - my documentation didn't mention it, but according to this, zero shift is permitted, but resulting instruction is vmovl.

The page also mentions that shifts of the element size are allowed. Similarly for A64 shll instruction allows shifts of the element size. And from my testing, compilers support shifts of the element size when using vshll_n_XXX intrinsics.

rosbif · 2023-09-25T08:55:50Z

I think that SIMDe should follow the documentation of the NEON intrinsics (link above), not the documentation of the underlying assembly instructions.

Some compilers support out of range constants or even variables where strictly a constant is required.
The problem is that other compilers don't.
IMHO SIMDe should enforce strict limits, otherwise code may fail on certain platforms.

So I maintain my position that we should not increase the maxima to the element size in bits but just change the minima from 1 to 0 and add test cases for these zero length shifts.

Remark: the document above states that vmovl generates a SSHLL (or a USHLL) of 0, so vmovl is just a sort of simpler alias for a vshll of zero, not a real instruction.

M-HT · 2023-09-25T13:24:15Z

I found this documentation (which to me looks like official arm documentation) which describes intrinsics also for shifts of the element size.

rosbif · 2023-09-25T14:25:42Z

Oops, sorry: you seem to be right.
I knew about the right shifts which go from one to the element size, but I have never seen these left shifts of the element size in the document I quoted. I must look again.
Apparently instead of generating SSHLL or USHLL instructions, they generate SHLL instructions.

So it looks as if the range should be zero to the element size in bits, and test code must be added for both extremes.

rosbif · 2023-09-25T15:11:55Z

After a search it seems that in the document I cited, a description and pseudo-code corresponding to vshll_n_u16 with n=16 appears but under the incorrect title of vcvt_f32_bf16 (and also under vcvtq_low_f32_bf16).

(There is also the assembly code corresponding to vshll_high_n_u16 with n=16 under the title vcvtq_high_f32_bf16.)

This is not the first time that I have found intrinsics in this document with totally incorrect descriptions and pseudo-code.
It looks as if someone started to add these cases but never finished the job, leaving copy/paste errors .

M-HT · 2023-09-25T15:31:33Z

Actually, vcvt_f32_bf16, vcvtq_low_f32_bf16, vcvtq_high_f32_bf16 are not incorrect, because bfloat16 numbers are the upper 16-bits of 32-bit floating point numbers. Converting bfloat16 to 32-bit floating point means extending from 16-bits to 32-bits and shifting left by 16, which is what shll instruction does.

rosbif · 2023-09-25T15:59:13Z

Thank you for your explanation. I must admit that I was not familiar with the bfloat16 data format.
However I think it would be clearer if the description started by saying "Convert bf16 to f32".

mr-c · 2023-09-26T13:38:10Z

Can someone summarize this for me or open a PR? I'd like to make a new SIMDe release in the next week..

rosbif · 2023-09-26T15:13:04Z

Hi Michael,

I think that we now both agree that for all the SIMDE_REQUIRE_CONSTANT_RANGE in shll_n.h the lower limit should be decreased from one to zero and the upper limit should be increased by one to the element size in bits.

So both
SIMDE_REQUIRE_CONSTANT_RANGE(n, 1, 7)
become
SIMDE_REQUIRE_CONSTANT_RANGE(n, 0, 8)

Both
SIMDE_REQUIRE_CONSTANT_RANGE(n, 1, 15)
become
SIMDE_REQUIRE_CONSTANT_RANGE(n, 0, 16)

Both
SIMDE_REQUIRE_CONSTANT_RANGE(n, 1, 31)
become
SIMDE_REQUIRE_CONSTANT_RANGE(n, 0, 32)

Test cases should be added to shll_n.c to cover the two extremes (n=0 and n=element size in bits).

I could do this but I haven't contributed to SIMDe for nearly three years.
Things have probably changed a lot and maybe M-HT is a better candidate for doing this.

…e#1064)

M-HT · 2023-09-28T18:31:32Z

Fixed in #1068

…e#1064)

M-HT added a commit to M-HT/simde that referenced this issue Sep 26, 2023

Extend constant range in simde_vshll_n_XXX intrinsics (simd-everywher…

fd0e492

…e#1064)

M-HT mentioned this issue Sep 26, 2023

Extend constant range in simde_vshll_n_XXX intrinsics #1068

Merged

mr-c pushed a commit that referenced this issue Sep 28, 2023

Extend constant range in simde_vshll_n_XXX intrinsics (#1064)

beb1c61

M-HT closed this as completed Sep 28, 2023

yyctw pushed a commit to yyctw/simde that referenced this issue Oct 3, 2023

Extend constant range in simde_vshll_n_XXX intrinsics (simd-everywher…

7fe8959

…e#1064)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong constant range in simde_vshll_n_XXX intrinsics #1064

Wrong constant range in simde_vshll_n_XXX intrinsics #1064

M-HT commented Sep 24, 2023

rosbif commented Sep 24, 2023 •

edited

Loading

M-HT commented Sep 24, 2023

rosbif commented Sep 25, 2023

M-HT commented Sep 25, 2023

rosbif commented Sep 25, 2023

rosbif commented Sep 25, 2023

M-HT commented Sep 25, 2023

rosbif commented Sep 25, 2023

mr-c commented Sep 26, 2023

rosbif commented Sep 26, 2023 •

edited

Loading

M-HT commented Sep 28, 2023

Wrong constant range in simde_vshll_n_XXX intrinsics #1064

Wrong constant range in simde_vshll_n_XXX intrinsics #1064

Comments

M-HT commented Sep 24, 2023

rosbif commented Sep 24, 2023 • edited Loading

M-HT commented Sep 24, 2023

rosbif commented Sep 25, 2023

M-HT commented Sep 25, 2023

rosbif commented Sep 25, 2023

rosbif commented Sep 25, 2023

M-HT commented Sep 25, 2023

rosbif commented Sep 25, 2023

mr-c commented Sep 26, 2023

rosbif commented Sep 26, 2023 • edited Loading

M-HT commented Sep 28, 2023

rosbif commented Sep 24, 2023 •

edited

Loading

rosbif commented Sep 26, 2023 •

edited

Loading