Commit
NEON: implement all intrinsics supported by architecture A64-remaining part (#1093)

* [NEON] Add 5 intrinsics (vdiv{h/q}_f{16/64}).
* [NEON] Add 11 dup_lane series intrinsics.
  - 2 dup{q}_laneq_f16
  - 9 dup{b,h}_lane{q}_{s/u}{8,16}, duph_laneq_f16
* [NEON] Add 8 intrinsics (veor3q_{s/u}{8/16/32/64}).
* [NEON] Add fmlal, fmlsl, maxnmv, minnmv, pmaxnm, pminnm.
* [NEON] Add 12 fmlal series intrinsics.
* [NEON] Add 12 fmlsl series intrinsics.
* [NEON] Add 11 vmax series intrinsics.
  - 1 vmaxh_f16
  - 3 vmaxnm{/h/q}_f16
  - 5 vmaxnmv{q}_f{16/32/64}
  - 2 vmaxv{q}_f16
* [NEON] Add 11 vmin series intrinsics.
  - 1 vminh_f16
  - 3 vminnm{/h/q}_f16
  - 5 vminnmv{q}_f{16/32/64}
  - 2 vminv{q}_f16
* [NEON] Add 8 vpmax series intrinsics.
  - 1 vpmaxq_f16
  - 7 vpmaxnm{/s/q/qd}_f{16/32/64}
* [NEON] Add 9 vpmin series intrinsics.
  - 2 vpmin{q}_f16
  - 7 vpminnm{/s/q/qd}_f{16/32/64}
* [NEON] Add 8 intrinsic function families: mmlaq, mull_high_lane, mull_high_n, mulx, mulx_lane, mulx_n, qrdmlah, qmovun_high.
* [NEON] Add 3 vmmlaq series intrinsics.
* [NEON] Add 41 vmul-related intrinsics.
  - 8 mull_high_lane series intrinsics
  - 4 mull_high_n series intrinsics
  - 9 vmulx series intrinsics
  - 2 vmulx{q}_n_f16 series intrinsics
  - 18 vmulx_lane series intrinsics
* [NEON] Add 1 vpaddq_f16 intrinsic.
* [NEON] Add 3 vqmovun_high_s{16/32/64} intrinsics.
* [NEON] Add 6 vqrdmlah series intrinsics.
* [NEON] Add 11 series intrinsics: qrdmlah_lane, qrdmlsh, qrdmlsh_lane, qshrun_high_n, rnd32x, rnd32z, rnd64x, rnd64z, rnda, rndx, shll_high_n.
* [NEON] Add 30 vqrdmlah, vqrdmlsh related intrinsics.
  - 12 vqrdmlah{h/s/q}_lane{q}_s{16/32}
  - 6 vqrdmlsh{h/s/q}_s{16/32}
  - 12 vqrdmlsh{h/s/q}_lane{q}_s{16/32}
* [NEON] Add 2 vqrdmulhh_lane{q}_s16 intrinsics.
* [NEON] Add 5 vqsh related intrinsics.
  - 1 vqshluh_n_s16
  - 3 vqshrun_high_n_s{16/32/64}
  - 1 vqshrun_n_s16
* [NEON] Add 16 vrnd32x, vrnd32z, vrnd64x, vrnd64z related intrinsics.
  - 4 vrnd32x{q}_f{32/64}
  - 4 vrnd32z{q}_f{32/64}
  - 4 vrnd64x{q}_f{32/64}
  - 4 vrnd64z{q}_f{32/64}
* [NEON] Add vrnd{/a/i/m/p/x} related intrinsics.
  - 3 vrnd{/h/q}_f16
  - 3 vrndi{/h/q}_f16
  - 3 vrndm{/h/q}_f16
  - 3 vrndp{/h/q}_f16
  - 7 vrnda{q}_f{16/32/64}, vrndah_f16
  - 7 vrndx{q}_f{16/32/64}, vrndxh_f16
* [NEON] Add 6 vshll_high_n series intrinsics.
* [NEON] Add 7 intrinsic series: cadd_rot270, cadd_rot90, shrn_high_n, subhn_high, sudot_lane, usdot, usdot_lane.
* [NEON] Add 2 vcmla{q}_f16 intrinsics.
* [NEON] Add 6 vshrn_high_n series intrinsics.
* [NEON] Add 6 vsubhn_high series intrinsics.
* [NEON] Add 10 vsudot_lane, vusdot, and vusdot_lane series intrinsics.
  - 4 vsudot{q}_lane{q}_s32
  - 2 vusdot{q}_s32
  - 4 vusdot{q}_lane{q}_s32
* [NEON] Add 10 vcadd{q}_rot{90/270}_f{16/32/64} intrinsics.
* [NEON] Add 5 series intrinsics: cmla_lane, cmla_rot180_lane, cmla_rot270_lane, cmla_rot90_lane, recpx.
* [NEON] Add 38 vcmla related intrinsics.
  - 8 vcmla{q}_lane{q}_f{16/32}
  - 2 vcmla{q}_rot90_f16
  - 8 vcmla{q}_rot90_lane{q}_f{16/32}
  - 2 vcmla{q}_rot180_f16
  - 8 vcmla{q}_rot180_lane{q}_f{16/32}
  - 2 vcmla{q}_rot270_f16
  - 8 vcmla{q}_rot270_lane{q}_f{16/32}
* [NEON] Add vrecpeh_f16 and vrecpsh_f16 intrinsics.
* [NEON] Add 3 vrecpx{h,s,d}_f{16,32,64} intrinsics.
* [NEON] Add 8 series intrinsics: __crc32, rax, sha1, sha256, sha512, sm3, sm4.
* [NEON] Add 8 __crc series intrinsics.
* [NEON] Add vrax1q_u64 intrinsic.
* [NEON] Add sha1, sha256, and sha512 series intrinsics.
* [NEON] Add sm3 and sm4 series intrinsics.
* [NEON] Include <arm_acle.h> for __crc32 intrinsics.
* [NEON] Use uint to simulate the poly type and implement it.
* [NEON] Add poly type related intrinsics.
* [NEON] Add ldr and str related intrinsics.

Co-authored-by: Eric Yi-Yen Chung
Co-authored-by: Michael R. Crusoe
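
The note about using an unsigned integer to simulate the poly type can be illustrated with a small scalar sketch. This is not taken from the commit itself: the names `sketch_poly8_t` and `sketch_mul_p8` are hypothetical, and the code only assumes that an 8-bit polynomial value is stored in a `uint8_t` and multiplied without carries (XOR instead of addition), keeping the low 8 bits of the product.

```c
#include <stdint.h>

/* Hypothetical name: an unsigned integer standing in for an 8-bit
 * polynomial type. This is an illustration, not the commit's code. */
typedef uint8_t sketch_poly8_t;

/* Carry-less multiply over GF(2): each set bit i of b contributes
 * (a << i), and the partial products are combined with XOR rather
 * than addition. Only the low 8 bits of the product are returned. */
static sketch_poly8_t sketch_mul_p8(sketch_poly8_t a, sketch_poly8_t b) {
  uint16_t acc = 0;
  for (int i = 0; i < 8; i++) {
    if ((b >> i) & 1u) {
      acc ^= (uint16_t)((uint16_t)a << i);
    }
  }
  return (sketch_poly8_t)(acc & 0xFFu);
}
```

For example, `sketch_mul_p8(0x03, 0x03)` yields `0x05`, because (x + 1)(x + 1) = x^2 + 1 when coefficients are reduced modulo 2. An ordinary integer multiply would give `0x09` instead, which is why a polynomial type stored in an unsigned integer still needs its own multiply routine.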