tract-linalg: Invalid Instruction in fma_mmm_f32_24x4_0_20_7 #1267

Closed
roblabla opened this issue Nov 23, 2023 · 2 comments · Fixed by #1274

Comments

@roblabla

roblabla commented Nov 23, 2023

I recently got reports of my software crashing with an invalid instruction exception inside tract-linalg, in the fma_mmm_f32_24x4_0_20_7 function, specifically when executing vpcmpeqd ymm15, ymm15, ymm15.

From my understanding, tract-linalg currently checks that CPUID reports the FMA feature before using this function. However, vpcmpeqd with ymm operands requires AVX2. Similarly, vgatherdps is an AVX2 instruction, according to this website.

You would think that FMA is a superset of AVX2, so all AVX2 instructions would be available whenever FMA is enabled, but that is wrong. There are some CPUs that have FMA but not AVX2 (in particular, some old AMD CPUs from 2012 to 2015 fall in this category). I believe the reports I am receiving are coming from users on those CPUs.
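
To make the independence of the two flags concrete, here is a small sketch that decodes them from raw CPUID register values. The bit positions (FMA: leaf 1, ECX bit 12; AVX: leaf 1, ECX bit 28; AVX2: leaf 7 subleaf 0, EBX bit 5) are the ones given in the vendor CPUID documentation. The struct and function names are hypothetical and not part of tract-linalg, and the sketch ignores OS-level enablement of the YMM state, which real detection also has to check.

```rust
/// Feature flags decoded from raw CPUID output.
/// Hypothetical helper for illustration, not a tract-linalg API.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct SimdFeatures {
    pub avx: bool,
    pub fma: bool,
    pub avx2: bool,
}

/// Decode the flags from the registers returned by CPUID.
/// `leaf1_ecx` is ECX for EAX=1; `leaf7_ebx` is EBX for EAX=7, ECX=0.
pub fn decode_simd_features(leaf1_ecx: u32, leaf7_ebx: u32) -> SimdFeatures {
    SimdFeatures {
        // FMA and AVX are reported in leaf 1 ...
        fma: (leaf1_ecx & (1 << 12)) != 0,
        avx: (leaf1_ecx & (1 << 28)) != 0,
        // ... while AVX2 lives in a different leaf entirely,
        // so a CPU can set the FMA bit without setting the AVX2 bit.
        avx2: (leaf7_ebx & (1 << 5)) != 0,
    }
}
```

Calling `decode_simd_features((1 << 12) | (1 << 28), 0)` models the kind of CPU described above: FMA and AVX are set, AVX2 is not.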

I think tract-linalg should either avoid those two instructions in the FMA kernels, or simply amend its feature test to check for both the avx2 and fma features.
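
The second option could look roughly like the following. This is a minimal sketch, not tract-linalg's actual dispatch code: the function name `can_use_fma_kernels` is hypothetical, and only the `is_x86_feature_detected!` macro from the Rust standard library is a real API.

```rust
/// Returns true only if the CPU reports both FMA and AVX2.
/// Hypothetical helper; tract-linalg's real kernel selection may differ.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn can_use_fma_kernels() -> bool {
    // The FMA kernels in question also execute vpcmpeqd on ymm registers
    // and vgatherdps, which are AVX2 instructions, so both features
    // must be present before the kernels are selected.
    std::arch::is_x86_feature_detected!("fma")
        && std::arch::is_x86_feature_detected!("avx2")
}

/// On other architectures the x86 FMA kernels are never usable.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn can_use_fma_kernels() -> bool {
    false
}
```

With a check like this, the affected CPUs would fall back to whatever non-FMA kernels are available instead of hitting an invalid instruction.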

@kali
Collaborator

kali commented Nov 23, 2023

Wow, I was sure FMA was a superset. Good catch. I'll have a look and see if I can reasonably make them work without these two instructions.

kali linked a pull request Nov 30, 2023 that will close this issue
@kali
Collaborator

kali commented Nov 30, 2023

I'm not really in a position to test this on actual machines... and I could not get qemu to emulate them in a reasonable amount of time either. So I hope that will fix it.
