-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Precompute expensive computations #1911
Conversation
Replace hot FDIVs with FMULs through saving off the inverse values outside the loops, so we don't incur the cost on each loop iteration.
Thanks, but I don't observe these improvements running interactively on my laptop. I'm travelling, so will try again more rigorously on a PC when I get home Which of the |
Thank you for accepting the patch. Performance will vary based each microarchitecture's relative difference in div/mul latencies. I was measuring on an Ampere Altra aarch64. The input file was |
I have a formatting fix. How do I offer it here? |
Thanks, but don't worry. I will rebase your original PR into a new branch of |
Hmm... could you create a PR from heshpdx@1aad204 to https://github.com/ERGO-Code/HiGHS/tree/consider-1911 please, as I seem to be getting some merge conflicts if I try to do it |
Ok I made this. #1914 |
I ran on another ARM server today at the office. Data below is from a Huawei Taishan (Kunpeng 920 CPU) using gcc-11.4. We see +6.5% on master, both cmdlines:
branch, both cmdlines:
|
Today I got access to an AMD server in the OCI cloud, /proc/cpuinfo says this is a "AMD EPYC 7J13 64-Core Processor." I used taskset to make sure we land on the same core, since each core in a many-core machine has a different latency to memory. We see +2.1% on master, both cmdlines:
branch, both cmdlines:
|
Thanks. I've just run these two models on my desktop, and get a small improvement. I'll run through the Mittelmann test set to extend the sample |
@jajhall I could run the MIPLIB etc., if you like. |
Thanks, but I think that the LP test set is enough as a sanity check that noting is broken. That said, MIP behaviour will change, so you'll need a reference set of results for future comparison I'm not checking for performance as a means of deciding whether to make the change, as it's hard to imagine it ever being worse. I'm just intrigued to see what difference it makes. |
Analysis showed some easy opportunities to gain performance through replacing expensive FP divides with FP multiplies. We can loft the computation outside the hot loops and hold inverse values instead. This way we don't incur the cost of the divide on every loop iteration. Most CPUs are much slower on FP divides compared to FP multiplies.
This technique already exists in the codebase, for example:
https://github.com/ERGO-Code/HiGHS/blob/5ce7a27/src/util/HFactor.cpp#L726
I measured +4.5% performance with this patch, when solving
i_n13.mps
ornetlarge.mps
from https://plato.asu.edu/ftp/lptestset/network/.