-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed of division with remainder for cpp_int #537
Comments
In my case the numerator has 78000 digits and the denominator 30000. |
It might be possible to low-word the shift on the carry. But I am really not sure if that exact sequence is using full multiples of the number of bits in a limb for the shift. As a general comment, it's really hard to beat or match the raw speed of GMP when using portable code. They squeeze the assembler for their well-deserved, legendary reputation for bare-metal speed. |
I'm not surprised by this. Division is hard, so many corner cases to handle correctly. There are better algorithms (in the big-O sense), but it's a significant piece of work to implement them correctly. |
I actually have a Knuth long division in an unrelated project. I've run hundreds of millions of tests on it and squeezed all the bugs out over the course of, my gosh, years. I use an opposite limb-ordering. I feel that Knuth long division is the winner for limb-counts below 2,000 or so. But at that range of your test case, you're in the quadratic domain of classic algorithms. You'd probably be better off inverting-and-multiplying with conversion to bin-float, then Newton-Raphson-ing it. But to assemble the result, you'd be desperately driven to make sure you rorund correctly. Oh my gosh, I get exahusted just thingking about verifying and testing that code for hard-world-open use. I wouldn't take on either of these tasks at the moment. But maybe note this as a potential place for optimization in the future... |
Since I've never spoken to, really anyone about this one and it's a gem, ... Here is that Knuth long division - in it's full-on, loving, 300 lines of glory (albeit with the wrong limb-ordering) for future reference if anyone ever cares for it. |
In a geometric code I observe a slowdown of a factor 4, when I switch from the gmp to the cpp_int backend. In vtune I see that more than 35% of the time are spent here. The comment at the signature leads me to the question if I am in the scenario where this function is not optimized. Is there a trivial fix? The comment seems to indicate that it is optimized for small numbers of limbs.
The text was updated successfully, but these errors were encountered: