Faster radix conversion #499
base: develop
Yepp, should have tested. I got …
Cannot happen at that place, result carefully chosen to not exceed …
Oh my, LTM's testing environment is really killing me! But it's all green now, so: Happy Holidays, y'all!
Was able to get Helmström's N-R trick working for base 10. Benchmarked against div-only up to a string length of … The N-R trick is about as fast/slow as div-standalone, maybe a bit slower for small input, but it uses much more heap. I don't know if the large number of recursions is a problem outside of my testing: there is not only the tree of depth up to 29 for the D&C itself, but the Karatsuba and Toom-Cook multiplications and the fast division are implemented recursively, too. Can try to change the recursions in …
Added the N-R method for testing. Add …
4840c90 to 43241b0
6188f05 to 24cfac8
452cc1d to 2f9a008
Needs a bit of a clean-up but otherwise good to go.
c5ccf14 to e60867d
I took the liberty to rebase, add some more changes and force-push the branch. Could we maybe squash this list of commits and fixups from 2020 and 2021 into 1-3 commits? Or does it make sense to retain the history? Maybe those N-R based tests could be of interest?
It's always nice when you come to your bench, fresh coffee in your pot just to find out that all of the work has been done already! :-) Thanks a lot!
Yes, of course.
Oh, please don't! It was quite a painful experience: large numbers mean long waits for the tests to complete just to find out that something didn't quite work out at the end and so on. Nah, dust it with quicklime, bury the remains and let the horses run over the ground such that nobody can find it.
Yes, but it needs either a table with the corrections, for which I only have empirical methods to find the values at this point (which is a bit cumbersome, to say the least), or another N-R round. The difference in speed is already negligible and the second Newton round would ruin that slight speed advantage completely. What I vaguely remember having planned was to pluck the Barrett division out as a distinct function. But we can still do that if there is an actual use-case.
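For readers who have not met the trade-off discussed here, the following is a minimal textbook-style sketch in Python, not the code from this PR and not Helmström's variant. It assumes a reciprocal computed by integer Newton-Raphson iteration, followed by a Barrett-style division that multiplies by that reciprocal. Instead of a correction table, the sketch simply adjusts the estimate in a short loop. The names `newton_reciprocal` and `barrett_divmod` are hypothetical.

```python
def newton_reciprocal(d):
    """Return (r, shift) with r == floor(2**shift / d), found by Newton-Raphson.

    Hypothetical helper for illustration; shift is twice the bit length of d.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    k = d.bit_length()
    shift = 2 * k
    r = 1 << k  # starting guess; the true value lies in (2**k, 2**(k + 1)]
    while True:
        # Newton step for f(r) = 1/r - d/2**shift, in integer arithmetic
        new = 2 * r - ((d * r * r) >> shift)
        if new <= r:
            break
        r = new
    # The iteration may land one off, so adjust to the exact floor.
    one = 1 << shift
    while d * r > one:
        r -= 1
    while d * (r + 1) <= one:
        r += 1
    return r, shift


def barrett_divmod(n, d, r, shift):
    """Divide n by d using the precomputed reciprocal r; needs 0 <= n < 2**shift."""
    if n < 0 or n >> shift:
        raise ValueError("n out of range for this reciprocal")
    q = (n * r) >> shift  # estimate, never larger than the true quotient
    rem = n - q * d
    while rem >= d:  # the estimate is low by a small amount at most
        q += 1
        rem -= d
    return q, rem


r, shift = newton_reciprocal(10**19)
print(barrett_divmod(12345678901234567890123456789, 10**19, r, shift))
```

The reciprocal is computed once per divisor and reused, which is why the approach is attractive when the same power of the base divides many values. The final adjustment loops are the part that a correction table or an extra Newton round would replace.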
e60867d to e5ce2d9
Removed all of the little commits bragging about doing chores (e.g.: correcting typos etc). Backup of old branch is in faster_radix_conversion_full_history, just in case. I had a bit of a hiccup while merging with local, hope I have repaired it successfully.
@MasterDuke17 You are marked as one of the reviewers. Do you want to add your 2 cents, too? I mean: more eyes, more better! to steal the term from A.v.E.
e5ce2d9 to eea3a5f
The timings here are quite different from the timings on my machine. The smaller the digit, the smaller the ratio, because we use …
Signed-off-by: Steffen Jaeckel <[email protected]>
instead of all those copies Signed-off-by: Steffen Jaeckel <[email protected]>
fd23d2f to 28b373d
Implemented the Schönhage method (divide and conquer) for radix conversion. I used normal division because Lars Helmström's method was a bit unstable for larger input (or I was too stupid to implement it, but the values differ from those of the normal division).
Speed enhancement is as expected:
Tested s_mp_faster_to_radix with the loop
old: 324 sec 119 ms 629 usec 727 nsec
new: 9 sec 149 ms 193 usec 858 nsec
Same loop with base 10 only (i = 10 fixed) and j = 17477 (log(17477 * 60)/log(10) ~ 6.0206, over a million decimal places):
old: 62 sec 920 ms 351 usec 67 nsec
new: 0 sec 458 ms 471 usec 969 nsec
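To put the timings above into ratios, a small Python calculation can help. It only re-does the arithmetic on the figures quoted in this comment; the helper `seconds` is made up for the example, and the trailing 858 of the first "new" timing is assumed to be nanoseconds.

```python
import math


def seconds(sec, ms, usec, nsec):
    """Combine a split timing into seconds (hypothetical helper)."""
    return sec + ms / 1e3 + usec / 1e6 + nsec / 1e9


# Figures copied from the comment; 858 is assumed to be nanoseconds.
all_bases = seconds(324, 119, 629, 727) / seconds(9, 149, 193, 858)
base_10 = seconds(62, 920, 351, 67) / seconds(0, 458, 471, 969)

print(f"speed-up, full loop: {all_bases:.1f}x")
print(f"speed-up, base 10 only: {base_10:.1f}x")
print(f"log10(17477 * 60) = {math.log10(17477 * 60):.4f}")
```

That works out to a speed-up of roughly 35x for the full loop and roughly 137x for base 10 only.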
Results have been checked against the old function.
Looks good enough to me.
I did not benchmark the new s_mp_faster_read_radix, and testing is only done with a small round-about test in test.c.
Ah, yes: restricting the new method to base 10 only wouldn't have saved a lot. One additional small table and about half a dozen lines of code would have been saved, maybe 150 bytes or so, if at all.
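The reading direction (string to integer) can use the same divide-and-conquer idea in reverse. The following Python sketch is only an illustration under assumptions, not the implementation of s_mp_faster_read_radix: it splits the string so that the low part has a power-of-two number of digits and reuses a table of powers base^(2^k). The names `read_radix_dc` and `cutoff` are invented here.

```python
def read_radix_dc(s, base, cutoff=32):
    """Parse a digit string (no sign, no prefix) by divide and conquer."""
    if not 2 <= base <= 36:
        raise ValueError("need 2 <= base <= 36")
    cutoff = max(cutoff, 1)
    powers = [base]  # powers[k] == base ** (2 ** k), extended on demand

    def simple(t):
        # Schoolbook loop: one multiply-add per digit.
        n = 0
        for ch in t:
            n = n * base + int(ch, base)
        return n

    def rec(t):
        if len(t) <= cutoff:
            return simple(t)
        # Largest k with 2 ** k < len(t); the low part gets 2 ** k digits.
        k = (len(t) - 1).bit_length() - 1
        while len(powers) <= k:
            powers.append(powers[-1] * powers[-1])
        low_len = 1 << k
        return rec(t[:-low_len]) * powers[k] + rec(t[-low_len:])

    return rec(s)


print(read_radix_dc("123456789" * 5, 10, cutoff=4))
```

A round-trip check against a digit-by-digit parser, similar in spirit to the small test mentioned above, is an easy way to validate such a function.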
I used some ideas from Lars Helmström and some more from @MasterDuke17 to write this code. Don't blame them for my mistakes, please.