NEON: implement all intrinsics supported by architecture A64 #1080

Closed
wants to merge 136 commits into from
136 commits
5dfbba5
[NEON] Add vabal_{s/u}{8/16/32}
yyctw Jun 21, 2023
a3df01e
[NEON] Add vabal_high_{s/u}{8/16/32}
yyctw Jun 21, 2023
ae83277
[NEON] Add all vcale* intrinsics (9)
yyctw Jun 21, 2023
b09cb4e
[NEON] Add all vcalt intrinsics (9)
yyctw Jun 21, 2023
7c6698f
[NEON] Add vcreate_f16
yyctw Jun 21, 2023
7aa70da
[NEON] Add vreinterpret_u64_f16
yyctw Jun 21, 2023
b280b32
[NEON] Add vcvth_f16_s16 and vcvth_f16_u16
yyctw Jul 7, 2023
1302ae6
[NEON] Add vduph_lane_f16, vdup_lane_f16, and vdupq_lane_f16
yyctw Jul 7, 2023
1e1bc92
[NEON] Add vext_f16
yyctw Jul 7, 2023
9991daf
[NEON] Add 16 vcvt{q}_n_* intrinsics
yyctw Jul 7, 2023
b950713
[Fix] Correct function input parameters
yyctw Jul 10, 2023
4d5bb31
[NEON] Add 6 vcvtn_{s/u}{16/32/64}_f{*} intrinsics
yyctw Jul 10, 2023
554ee94
[Fix] Correct vdup_lane_f16 and vdupq_lane_f16.
yyctw Jul 10, 2023
fb84e0b
[Fix] Correct function input parameters.
yyctw Jul 10, 2023
9c8e9dd
[NEON] Add 24 vcvt{q}_n_* intrinsics
yyctw Jul 11, 2023
bb894a8
[NEON] Add all vcvtn* intrinsics
yyctw Jul 17, 2023
42fb1d8
[NEON] Add vfmah_f16 and vfma_f16
yyctw Jul 18, 2023
af40025
[NEON] Add vfma_n_f16 and vfmaq_n_f16
yyctw Jul 18, 2023
63a353b
[NEON] Add vmulh_f16
yyctw Jul 18, 2023
dcf8dba
[NEON] Add fma_lane related intrinsics.
yyctw Jul 18, 2023
a57b8c5
[NEON] Add 5 vmul* related intrinsics
yyctw Jul 18, 2023
ef86ae3
[NEON] Add neg related intrinsics.
yyctw Jul 18, 2023
12bbcb4
[NEON] Add all fms, fms_n, and fms_lane intrinsics
yyctw Jul 19, 2023
982c243
[NEON] Add types float16x{4/8}x{2/3/4}
yyctw Jul 19, 2023
bfc28b3
[NEON] Add 9 vld1 related intrinsics
yyctw Jul 19, 2023
07755ff
[Fix] Modified wrong rounding implementation.
yyctw Jul 20, 2023
70e54a5
[Fix] Fix wrong intrinsic alias names.
yyctw Jul 20, 2023
2e2440c
[Refactor] Remove redundant functions.
yyctw Jul 25, 2023
0c8fd27
[NEON] Add 45 ld2 related intrinsics
yyctw Jul 26, 2023
1783495
[NEON] Add ld3_dup, ld3_lane, and ld4_dup
yyctw Jul 28, 2023
6301094
[NEON] Add vld3_f16 and vld4_f16.
yyctw Jul 28, 2023
8511d5d
[NEON] Add vld{3/4}_{dup/lane} series intrinsics
yyctw Jul 28, 2023
a6edb48
[NEON] Add mla_{high}_lane series intrinsics
yyctw Aug 1, 2023
3fbfe54
[NEON] Add qdmlal_{high}_{lane} series intrinsics.
yyctw Aug 1, 2023
30887f4
[NEON] Add qdmlal_lane and qdmlal_n series intrinsics
yyctw Aug 1, 2023
51d95f8
[NEON] Add mls_lane and mlsl_high_lane series intrinsics
yyctw Aug 2, 2023
b940057
[NEON] Add 22 qdmlsl series intrinsics
yyctw Aug 2, 2023
2c2b3d4
[NEON] Add 10 qdmull_* series intrinsics
yyctw Aug 2, 2023
f0822cd
[NEON] Add 3 qdmulh series intrinsics
yyctw Aug 2, 2023
e5bb44a
[Fix] Fix wrong function name.
yyctw Aug 2, 2023
3413662
[Fix] Correct the wrong alias function name.
yyctw Aug 4, 2023
6ed554e
[NEON] Add qdmullh_lane{q}_s{16/32} related intrinsics
yyctw Aug 4, 2023
2500060
[NEON] Add qdmull_n and qdmull_high_lane series intrinsics
yyctw Aug 4, 2023
ed76fb8
[Fix] Add conditions for fp16 intrinsics
yyctw Oct 11, 2023
9a225f3
[NEON] Add qshrnh_n_{s/u}16.
yyctw Aug 8, 2023
df49b6f
[NEON] Add qrshr{u}n_high_n, qshrn_high_n, and rshrn_high_n.
yyctw Aug 8, 2023
1feba62
[NEON] Add qrshrn_high_n_{s/u}{16/32/64}.
yyctw Aug 8, 2023
ba392a7
[NEON] Add qrshrun_high_n_s{16/32/64}.
yyctw Aug 8, 2023
05a8490
[NEON] Add qshrn_high_n_{s/u}{16/32/64}.
yyctw Aug 8, 2023
4c9831b
[NEON] Add rshrn_high_n_{s/u}{16/32/64}.
yyctw Aug 8, 2023
61a2d50
[NEON] Add r{add/sub}hn_{high} and reinterpret series.
yyctw Aug 9, 2023
1be7374
[NEON] Add raddhn_{high} series.
yyctw Aug 9, 2023
70afc3d
[NEON] Add rsubhn_{high} series.
yyctw Aug 9, 2023
fffba2c
[NEON] Completed reinterpret series.
yyctw Aug 9, 2023
4e676dc
[NEON] Add sli_n, st1_x2, and st1q_x2.
yyctw Aug 10, 2023
60790f6
[NEON] Add 18 sli{q}_n_{s/u}{8/16/32/64} intrinsics.
yyctw Aug 10, 2023
6ffe211
[NEON] Add 22 st1{q}_{TYPE}_x2 series intrinsic.
yyctw Aug 10, 2023
c36532d
[NEON] Add st1{q}_x3
yyctw Aug 14, 2023
bd83279
[NEON] Add st1{q}_x3 series.
yyctw Aug 14, 2023
2df0557
[NEON] Add qrshl, st1_x4, and st1q_x4 series.
yyctw Aug 14, 2023
c04bda6
[NEON] Add 22 intrinsics (st1{q}_{TYPE}_x4 series).
yyctw Aug 14, 2023
77a90a0
[NEON] Add 4 intrinsics (st{3/4}{q}_f16).
yyctw Aug 14, 2023
233f487
[NEON] Add 8 intrinsics (vst{1/2/3/4}{q}_lane_f16).
yyctw Aug 14, 2023
b838c40
[NEON] Add vld{3/4}q_f16 intrinsics
yyctw Aug 14, 2023
92d8862
[NEON] Add vtrn{/1/2}{q}_f16 intrinsics
yyctw Aug 14, 2023
978e5e2
[NEON] Add 4 intrinsics (vuzp{q}_f16 and vzup{1/2}q_f16)
yyctw Aug 14, 2023
75f7656
[NEON] Add 2 intrinsics (vrev64{q}_f16).
yyctw Aug 14, 2023
8d82934
[NEON] Add 24 intrinsics (vqrshl{q}_{TYPE}, vqrshl{b/h/s/d}_{TYPE}).
yyctw Aug 15, 2023
8fc43bc
[NEON] Add abdl_high, addhn_high, and qshl_n.
yyctw Aug 16, 2023
edd6b7f
[NEON] Add 24 intrinsics (vqshl{q}_n series).
yyctw Aug 16, 2023
b4ca90e
[NEON] Add 6 intrinsics (vabdl_high_{s/u} series).
yyctw Aug 16, 2023
ddd149f
[NEON] Add 6 intrinsics (vaddhn_high_{s/u} series).
yyctw Aug 16, 2023
f5e86ff
[NEON] Add 4 intrinsics (3 vabd{h//q}_f16 and 1 vabsh_f16).
yyctw Aug 16, 2023
fefe4e5
[NEON] Add 3 intrinsics (vcgez{/h/q}_f16).
yyctw Aug 16, 2023
1d9a2cc
[NEON] Add 3 intrinsics (vcgtz{/h/q}_f16).
yyctw Aug 16, 2023
9478ff1
[NEON] Add 3 intrinsics (vcle{/h/q}_f16).
yyctw Aug 16, 2023
ab98cf9
[NEON] Add 3 intrinsics (vcltz{/h/q}_f16).
yyctw Aug 16, 2023
ba822b4
[NEON] Add cvtm, cvtp, and copy_lane.
yyctw Aug 23, 2023
26684b9
[NEON] Add 40 intrinsics (vcopy{q}_lane{q}_{TYPE}).
yyctw Aug 23, 2023
01badc3
[NEON] Add 33 vcvt series intrinsics.
yyctw Aug 23, 2023
75f2def
[NEON] Add 20 vcvt{h/s/d}_n_{TYPE} series intrinsics.
yyctw Aug 23, 2023
2111542
[NEON] Add 22 intrinsics (vcvtm_{TYPE}}).
yyctw Aug 23, 2023
42fe3aa
[NEON] Add 22 intrinsics (vcvtp_{TYPE}}).
yyctw Aug 23, 2023
0c13fbe
[NEON] Add 5 intrinsics (vdiv{h/q}_f{16/64}).
yyctw Aug 23, 2023
b4f0136
[NEON] Add 11 dup_lane series intrinsics.
yyctw Aug 23, 2023
1f13800
[NEON] Add 8 intrinsics (veor3q{s/u}{8/16/32/64}).
yyctw Aug 23, 2023
b0d4ff2
[NEON] Add fmlal, fmlsl, maxnmv, minnmv, pmaxnm, pminnm.
yyctw Aug 23, 2023
6e41612
[NEON] Add 12 fmlal series intrinsics.
yyctw Aug 23, 2023
fbb2715
[NEON] Add 12 fmlsl series intrinsics.
yyctw Aug 23, 2023
1fea4a5
[NEON] Add 11 vmax series intrinsics.
yyctw Aug 23, 2023
c202f42
[NEON] Add 11 vmin series intrinsics.
yyctw Aug 23, 2023
4319b46
[NEON] Add 8 vpmax series intrinsics.
yyctw Aug 23, 2023
2e05a40
[NEON] Add 9 vpmin series intrinsics.
yyctw Aug 23, 2023
8c5fd5d
[NEON] Add 8 intrinsic function families.
yyctw Aug 28, 2023
4d06335
[NEON] Add 3 vmmlaq series intrinsics.
yyctw Aug 28, 2023
ca0e49d
[NEON] Add 41 vmul-related intrinsics.
yyctw Aug 28, 2023
4cbe097
[NEON] Add 1 vpaddq_f16 intrinsic.
yyctw Aug 28, 2023
f288395
[NEON] Add 3 vqmovun_high_s{16/32/64} intrinsic.
yyctw Aug 28, 2023
11153c6
[NEON] Add 6 vqrdmlah series intrinsics.
yyctw Aug 28, 2023
6a311d4
[NEON] Add 11 series intrinsics.
yyctw Aug 30, 2023
6360dd2
[Fix] Delete redundant code.
yyctw Aug 30, 2023
f855470
[NEON] Add 30 vqrdmlah, vqrdmlsh related intrinsics.
yyctw Aug 30, 2023
ca12892
[NEON] Add 2 vqrdmulhh_lane{q}_s16 intrinsics.
yyctw Aug 30, 2023
f8e069f
[Fix] Add the missing test cases.
yyctw Oct 13, 2023
3b481f1
[Fix] Fix bugs and missing conditions.
yyctw Oct 13, 2023
f142f2c
[NEON] Add 5 vqsh related intrinsics.
yyctw Aug 30, 2023
e744b06
[NEON] Add 16 vrnd32x, vrnd32z, vrnd64x, vrnd64z related intrinsics.
yyctw Aug 30, 2023
6e7b0e7
[NEON] Add vrnd{/a/i/m/p/x} related intrinsics.
yyctw Aug 30, 2023
016a87d
[NEON] Add 6 vshll_high_n series intrinsics.
yyctw Aug 30, 2023
ba278d2
[NEON] Add 7 intrinsic series.
yyctw Aug 31, 2023
b79afff
[NEON] Add 2 vcmla{q}_f16 intrinsics
yyctw Aug 31, 2023
01048c9
[NEON] Add 6 vshrn_high_n series intrinsics
yyctw Aug 31, 2023
97de482
[NEON] Add 6 vsubhn_high series intrinsics
yyctw Aug 31, 2023
df37047
[NEON] Add 10 vsudot_lane, vusdot, and vusdot_lane series intrinsics.
yyctw Aug 31, 2023
2bba27d
[NEON] Add 10 vadd{q}_rot{90/270}_f{16/32/64} intrinsics.
yyctw Aug 31, 2023
e71572e
[NEON] Add 5 series intrinsics.
yyctw Sep 4, 2023
185c174
[NEON] Add 38 vcmla related intrinsics.
yyctw Sep 4, 2023
6156dd3
[NEON] Add vrecpeh_f16 and vrecpsh_f16 intrinsics.
yyctw Sep 4, 2023
eaa71ce
[NEON] Add 3 vrecpx{h,s,d}_f{16,32,64} intrinsics.
yyctw Sep 4, 2023
9aa3ed4
[NEON] Add 8 series intrinsics.
yyctw Sep 11, 2023
b81cf82
[NEON] Add 8 __crc series intrinsics.
yyctw Sep 11, 2023
8c1dad6
[NEON] Add 4 aes series intrinsics.
yyctw Sep 11, 2023
fe70324
[NEON] Add vrax1q_u64 intrinsic
yyctw Sep 11, 2023
c23cc2b
[NEON] Add sha1, sha256, and sha512 series intrinsics
yyctw Sep 11, 2023
5d3d6c8
[NEON] Add sm3 and sm4 series intrinsics
yyctw Sep 11, 2023
9037921
[NEON] Include <arm_acle.h> for __crc32 intrinsics
yyctw Sep 11, 2023
311dc45
[NEON] Pre-add intrinsics with data type bf16.
yyctw Sep 12, 2023
33e417b
[NEON] Use uint to simulate the poly type and implement it
yyctw Sep 13, 2023
353109d
[NEON] Add poly type related intrinsics
yyctw Sep 13, 2023
32f724a
[NEON] Add poly type intrinsics in the st series
yyctw Sep 13, 2023
454d2e6
[NEON] Add poly type intrinsics in the add series
yyctw Sep 13, 2023
f12149a
[NEON] Add poly type intrinsics in the dup series
yyctw Sep 13, 2023
f61a755
[NEON] Delete redundant definition
yyctw Sep 18, 2023
d64f58e
[NEON] Add ldr and str related intrinsics.
yyctw Sep 18, 2023
4f4eff5
[NEON] Add all poly-type related intrinsics.
yyctw Sep 18, 2023
148643e
[Fix] Add copyright
yyctw Oct 17, 2023
104 changes: 103 additions & 1 deletion meson.build
@@ -9,24 +9,34 @@ cc = meson.get_compiler('c')
cxx = meson.get_compiler('cpp')

simde_neon_families = [
'__crc32',
'aba',
'abal',
'abal_high',
'abd',
'abdl',
'abdl_high',
'abs',
'add',
'addhn',
'addhn_high',
'addl',
'addlv',
'addl_high',
'addv',
'addw',
'addw_high',
'aes',
'and',
'bcax',
'bic',
'bsl',
'cadd_rot270',
'cadd_rot90',
'cage',
'cagt',
'cale',
'calt',
'ceq',
'ceqz',
'cge',
@@ -40,13 +50,21 @@ simde_neon_families = [
'cltz',
'clz',
'cmla',
'cmla_rot90',
'cmla_lane',
'cmla_rot180',
'cmla_rot180_lane',
'cmla_rot270',
'cmla_rot270_lane',
'cmla_rot90',
'cmla_rot90_lane',
'cnt',
'cvt',
'cvt_n',
'cvtm',
'cvtn',
'cvtp',
'combine',
'copy_lane',
'create',
'div',
'dot',
@@ -58,6 +76,11 @@ simde_neon_families = [
'fma',
'fma_lane',
'fma_n',
'fmlal',
'fmlsl',
'fms',
'fms_lane',
'fms_n',
'get_high',
'get_lane',
'get_low',
@@ -73,30 +96,42 @@ simde_neon_families = [
'ld1q_x4',
'ld1',
'ld2',
'ld2_dup',
'ld2_lane',
'ld3',
'ld3_dup',
'ld3_lane',
'ld4',
'ld4_dup',
'ld4_lane',
'ldr',
'max',
'maxnm',
'maxnmv',
'maxv',
'min',
'minnm',
'minnmv',
'minv',
'mla',
'mla_lane',
'mla_n',
'mlal',
'mlal_high',
'mlal_high_lane',
'mlal_high_n',
'mlal_lane',
'mlal_n',
'mls',
'mls_lane',
'mls_n',
'mlsl',
'mlsl_high',
'mlsl_high_lane',
'mlsl_high_n',
'mlsl_lane',
'mlsl_n',
'mmlaq',
'movl',
'movl_high',
'movn',
@@ -106,8 +141,13 @@ simde_neon_families = [
'mul_n',
'mull',
'mull_high',
'mull_high_lane',
'mull_high_n',
'mull_lane',
'mull_n',
'mulx',
'mulx_lane',
'mulx_n',
'mvn',
'neg',
'orn',
@@ -116,79 +156,141 @@ simde_neon_families = [
'padd',
'paddl',
'pmax',
'pmaxnm',
'pmin',
'pminnm',
'qadd',
'qabs',
'qdmlal',
'qdmlal_high',
'qdmlal_high_lane',
'qdmlal_high_n',
'qdmlal_lane',
'qdmlal_n',
'qdmlsl',
'qdmlsl_high',
'qdmlsl_high_lane',
'qdmlsl_high_n',
'qdmlsl_lane',
'qdmlsl_n',
'qdmulh',
'qdmulh_lane',
'qdmulh_n',
'qdmull',
'qdmull_high',
'qdmull_high_lane',
'qdmull_high_n',
'qdmull_lane',
'qdmull_n',
'qrdmlah',
'qrdmlah_lane',
'qrdmlsh',
'qrdmlsh_lane',
'qrdmulh',
'qrdmulh_lane',
'qrdmulh_n',
'qrshl',
'qrshrn_high_n',
'qrshrn_n',
'qrshrun_high_n',
'qrshrun_n',
'qmovn',
'qmovn_high',
'qmovun',
'qmovun_high',
'qneg',
'qshl',
'qshl_n',
'qshlu_n',
'qshrn_high_n',
'qshrn_n',
'qshrun_high_n',
'qshrun_n',
'qsub',
'qtbl',
'qtbx',
'raddhn',
'raddhn_high',
'rax',
'rbit',
'recpe',
'recps',
'recpx',
'reinterpret',
'rev16',
'rev32',
'rev64',
'rhadd',
'rnd',
'rnd32x',
'rnd32z',
'rnd64x',
'rnd64z',
'rnda',
'rndi',
'rndm',
'rndn',
'rndp',
'rndx',
'rshl',
'rshr_n',
'rshrn_high_n',
'rshrn_n',
'rsqrte',
'rsqrts',
'rsra_n',
'rsubhn',
'rsubhn_high',
'set_lane',
'sha1',
'sha256',
'sha512',
'shl',
'shl_n',
'shll_high_n',
'shll_n',
'shr_n',
'shrn_high_n',
'shrn_n',
'sli_n',
'sm3',
'sm4',
'sqadd',
'sqrt',
'sra_n',
'sri_n',
'st1',
'st1_lane',
'st1_x2',
'st1_x3',
'st1_x4',
'st1q_x2',
'st1q_x3',
'st1q_x4',
'st2',
'st2_lane',
'st3',
'st3_lane',
'st4',
'st4_lane',
'str',
'sub',
'subhn',
'subhn_high',
'subl',
'subl_high',
'subw',
'subw_high',
'sudot_lane',
'tbl',
'tbx',
'trn1',
'trn2',
'trn',
'tst',
'uqadd',
'usdot',
'usdot_lane',
'uzp1',
'uzp2',
'uzp',