I recently got reports of my software crashing with an invalid instruction exception inside tract-linalg, in the fma_mmm_f32_24x4_0_20_7 function, specifically when executing vpcmpeqd ymm15, ymm15, ymm15 here.
From my understanding, tract-linalg currently checks if the CPUID has the FMA feature enabled before using this function. However, the vpcmpeqd with ymm arguments requires the AVX2 feature. Similarly, vgatherdps is also an AVX2 feature, according to this website.
You would think that FMA implies AVX2, so all AVX2 instructions would be available whenever FMA is enabled, but that is wrong. There are some CPUs that have FMA but no AVX2 (in particular, some old AMD CPUs from 2012 to 2015 fall in this category). I believe the reports I am receiving are coming from users on those CPUs.
I think tract-linalg should either avoid those two instructions in the FMA kernels, or simply amend its test here to check for both the avx2 and fma features.
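To illustrate the second option, here is a minimal sketch of a combined check. It is not tract-linalg's actual code: the function names fma_kernel_allowed and cpu_supports_fma_kernels are hypothetical, and only the standard library's is_x86_feature_detected! macro is assumed.

```rust
/// Hypothetical policy function (not part of tract-linalg): the FMA kernels
/// are only safe to select when the CPU reports both FMA and AVX2, because
/// they use vpcmpeqd on ymm registers and vgatherdps.
pub fn fma_kernel_allowed(has_fma: bool, has_avx2: bool) -> bool {
    has_fma && has_avx2
}

/// Hypothetical runtime check built on the standard library's CPU feature
/// detection. On non-x86 targets it always returns false.
pub fn cpu_supports_fma_kernels() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        fma_kernel_allowed(
            is_x86_feature_detected!("fma"),
            is_x86_feature_detected!("avx2"),
        )
    }
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    {
        false
    }
}
```

With a check like this, a CPU that reports FMA without AVX2 would fall back to whatever non-FMA kernels are available instead of hitting the invalid instruction.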
I'm not really in a position to test this on actual machines... and I could not make qemu emulate them either in a reasonable amount of time. So I hope that will fix it.