_mm256_storeu_pd and _mm256_loadu_pd using 128 bit lanes #1198

Merged: 1 commit, Sep 13, 2024
11 changes: 11 additions & 0 deletions simde/x86/avx.h
@@ -3784,6 +3784,12 @@ simde__m256d
 simde_mm256_loadu_pd (const double a[HEDLEY_ARRAY_PARAM(4)]) {
   #if defined(SIMDE_X86_AVX_NATIVE)
     return _mm256_loadu_pd(a);
+  #elif SIMDE_NATURAL_VECTOR_SIZE_LE(128)
+    simde__m256d_private r_;
+    for (size_t i = 0 ; i < (sizeof(r_.m128d) / sizeof(r_.m128d[0])) ; i++) {
+      r_.m128d[i] = simde_mm_loadu_pd(a + 2*i);
+    }
+    return simde__m256d_from_private(r_);
   #else
     simde__m256d r;
     simde_memcpy(&r, a, sizeof(r));
@@ -5272,6 +5278,11 @@ void
 simde_mm256_storeu_pd (simde_float64 mem_addr[4], simde__m256d a) {
   #if defined(SIMDE_X86_AVX_NATIVE)
     _mm256_storeu_pd(mem_addr, a);
+  #elif SIMDE_NATURAL_VECTOR_SIZE_LE(128)
+    simde__m256d_private a_ = simde__m256d_to_private(a);
+    for (size_t i = 0 ; i < (sizeof(a_.m128d) / sizeof(a_.m128d[0])) ; i++) {
+      simde_mm_storeu_pd(mem_addr + 2*i, a_.m128d[i]);
+    }
   #else
     simde_memcpy(mem_addr, &a, sizeof(a));
   #endif
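
The 128-bit-lane path added in both hunks can be illustrated in isolation. What follows is a minimal sketch, not SIMDe code: `lane128d`, `vec256d`, `lane_loadu`, `lane_storeu`, `vec_loadu`, `vec_storeu` and `lanes_roundtrip_check` are hypothetical stand-ins, and the 128-bit operations are modelled with `memcpy` rather than real vector instructions. It only shows the indexing used in the diff: lane `i` covers elements `2*i` and `2*i + 1` of the `double` array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector holding two doubles. */
typedef struct { double v[2]; } lane128d;

/* Hypothetical stand-in for a 256-bit vector viewed as two 128-bit lanes. */
typedef struct { lane128d m128d[2]; } vec256d;

/* Unaligned 128-bit load, modelled with memcpy. */
lane128d lane_loadu(const double *p) {
  lane128d r;
  memcpy(&r, p, sizeof(r));
  return r;
}

/* Unaligned 128-bit store, modelled with memcpy. */
void lane_storeu(double *p, lane128d a) {
  memcpy(p, &a, sizeof(a));
}

/* 256-bit load built from one 128-bit load per lane. */
vec256d vec_loadu(const double a[4]) {
  vec256d r;
  for (size_t i = 0; i < sizeof(r.m128d) / sizeof(r.m128d[0]); i++) {
    r.m128d[i] = lane_loadu(a + 2 * i);
  }
  return r;
}

/* 256-bit store built from one 128-bit store per lane. */
void vec_storeu(double mem_addr[4], vec256d a) {
  for (size_t i = 0; i < sizeof(a.m128d) / sizeof(a.m128d[0]); i++) {
    lane_storeu(mem_addr + 2 * i, a.m128d[i]);
  }
}

/* Round-trips four doubles through an offset (hence possibly unaligned)
   window and checks that neighbouring elements are left untouched. */
int lanes_roundtrip_check(void) {
  double src[6] = { -1.0, 1.5, -2.25, 3.0, 4.75, -1.0 };
  double dst[6] = { -1.0, 0.0, 0.0, 0.0, 0.0, -1.0 };
  vec256d v = vec_loadu(src + 1);
  vec_storeu(dst + 1, v);
  assert(memcmp(src, dst, sizeof(src)) == 0);
  return 1;
}
```

Calling `lanes_roundtrip_check()` from a test driver exercises both directions. In the PR itself the per-lane work is done by `simde_mm_loadu_pd` and `simde_mm_storeu_pd`, so a target with a native 128-bit vector unit can use its own load and store for each lane instead of the generic `simde_memcpy` fallback.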