NEON: implement all bf16-related intrinsics (#1110)
* [Feat] Add BF16 when the machine is supported. Finished: vld1_bf16_x4 and vld1q_bf16_x2
* [NEON] Add a C implementation of the bf16 type
* [NEON] Add all ld_*_bf16 intrinsics.
* [NEON] Add all st*_bf16 intrinsics.
* [Test] Add vbfdot_f32 test case
* [NEON] Complete converting function from float32 to bfloat16.
  - Also add bf-related functions in three series: cvt, dot, dot_lane
* [Feat] Add option '+bf16' in cross-file
* [NEON] Completed initial implementation of bf-16 related intrinsics.
* [Fix] Remove redundant comment
* [Fix] Correct native aliases
* [Fix] The test generation code has been completed.
Showing 101 changed files with 11,220 additions and 23 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,8 +26,6 @@ | |
* 2023 Yi-Yen Chung <[email protected]> (Copyright owned by Andes Technology) | ||
*/ | ||
|
||
/* Yi-Yen Chung: Added vcreate_f16 */ | ||
|
||
#if !defined(SIMDE_ARM_NEON_CREATE_H) | ||
#define SIMDE_ARM_NEON_CREATE_H | ||
|
||
|
@@ -235,6 +233,19 @@ simde_vcreate_p64(simde_poly64_t a) { | |
#define vcreate_p64(a) simde_vcreate_p64(a) | ||
#endif | ||
|
||
SIMDE_FUNCTION_ATTRIBUTES | ||
simde_bfloat16x4_t | ||
simde_vcreate_bf16(uint64_t a) { | ||
#if defined(SIMDE_ARM_NEON_A32V8_NATIVE) && defined(SIMDE_ARM_NEON_BF16) | ||
return vcreate_bf16(a); | ||
#else | ||
return simde_vreinterpret_bf16_u64(simde_vdup_n_u64(a)); | ||
#endif | ||
} | ||
#if defined(SIMDE_ARM_NEON_A32V8_ENABLE_NATIVE_ALIASES) | ||
#undef vcreate_bf16 | ||
#define vcreate_bf16(a) simde_vcreate_bf16(a) | ||
#endif | ||
|
||
SIMDE_END_DECLS_ | ||
HEDLEY_DIAGNOSTIC_POP | ||
|
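
The non-native branch of `simde_vcreate_bf16` above reinterprets a 64-bit integer as four bfloat16 lanes. The following is a minimal, standalone sketch of that idea in plain C, not SIMDe's actual implementation: `bf16x4_sketch`, `create_bf16_sketch`, and `bf16_bits_to_float` are hypothetical names introduced only for illustration. It assumes lane 0 holds the least significant 16 bits of the input, and relies on bfloat16 being the upper 16 bits of an IEEE-754 binary32 value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 4-lane bf16 vector: raw 16-bit patterns. */
typedef struct {
  uint16_t lanes[4];
} bf16x4_sketch;

/* Split a 64-bit value into four 16-bit lanes, lane 0 being the lowest
 * 16 bits. Shifting (rather than memcpy) keeps the lane order independent
 * of host endianness. */
static bf16x4_sketch create_bf16_sketch(uint64_t a) {
  bf16x4_sketch r;
  for (int i = 0; i < 4; i++) {
    r.lanes[i] = (uint16_t)(a >> (16 * i));
  }
  return r;
}

/* Widen one bf16 bit pattern to float by placing it in the upper half of
 * a 32-bit word; the lower 16 mantissa bits become zero. */
static float bf16_bits_to_float(uint16_t bits) {
  uint32_t wide = (uint32_t)bits << 16;
  float f;
  memcpy(&f, &wide, sizeof f);
  return f;
}
```

For example, passing `UINT64_C(0x40400000BF803F80)` to `create_bf16_sketch` yields lanes with bit patterns `0x3F80`, `0xBF80`, `0x0000`, and `0x4040`, which widen to 1.0, -1.0, 0.0, and 3.0 respectively.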