Faster radix conversion #499

czurnieden · 2020-12-24T05:14:16Z

Implemented the Schönhage method (divide and conquer) for radix conversion. I used normal division because Lars Helmström's method was a bit unstable for larger input (or I was too stupid to implement it, but the values differ from those of the normal division).
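For readers who have not seen the divide-and-conquer idea, here is a minimal sketch of it. It works on plain `uint64_t` values instead of `mp_int` so that it stays self-contained. The names `build_powers`, `dc_rec` and `dc_to_radix` are made up for this illustration and are not the functions in this PR. A base between 2 and 36 is assumed. The number is split at `base^(2^k)`, the high half is converted first, then the low half is converted padded with zeros to its full width.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LEVELS 6 /* enough for base 2: 2^(2^5) = 2^32 */

/* Fill pw[k] = base^(2^k) for as long as the next square fits in 64 bits.
   Returns the number of entries that were filled in. */
static int build_powers(uint64_t base, uint64_t pw[MAX_LEVELS])
{
    int k = 0;
    pw[0] = base;
    while (k + 1 < MAX_LEVELS && pw[k] <= UINT64_MAX / pw[k]) {
        pw[k + 1] = pw[k] * pw[k];
        k++;
    }
    return k + 1;
}

/* Convert n < base^(2^level). With pad != 0 exactly 2^level digits are
   written, otherwise leading zeros are skipped. Returns the new end. */
static char *dc_rec(uint64_t n, int level, int pad, const uint64_t *pw, char *out)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char *mid;

    if (level == 0) {
        /* leaf: n is a single digit */
        if (pad || n != 0) {
            *out++ = digits[n];
        }
        return out;
    }
    /* high half first; the low half must be padded once anything was written */
    mid = dc_rec(n / pw[level - 1], level - 1, pad, pw, out);
    return dc_rec(n % pw[level - 1], level - 1, pad || (mid != out), pw, mid);
}

/* out must have room for 65 bytes (64 binary digits plus the terminator). */
void dc_to_radix(uint64_t n, unsigned base, char *out)
{
    uint64_t pw[MAX_LEVELS];
    int top = build_powers(base, pw);
    char *end = dc_rec(n, top, 0, pw, out);

    if (end == out) {
        *end++ = '0'; /* n was zero */
    }
    *end = '\0';
}
```

Calling `dc_to_radix(255, 16, buf)` with a 65-byte `buf` leaves the string `ff` in it. With big integers the gain comes from doing few large divisions (which can use fast multiplication) instead of many divisions by a single digit; with 64-bit integers there is of course nothing to gain, the sketch only shows the structure.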

Speed enhancement is as expected:
Tested s_mp_faster_to_radix with the loop

for (i = 2; i < 64; i++) {
   for (j = 1; j < 1700; j++) {
      mp_rand(&N, j);
      /* convert N to radix i and take the time */
   }
}

old: 324 sec 119 ms 629 usec 727 nsec
new: 9 sec 149 ms 193 usec 858 nsec

Same loop with base 10 only (i = 10 fixed) and j = 17477 (17477 * 60 = 1048620 bits, log(17477 * 60)/log(10) ~ 6.0206, i.e. a bit over a million bits or roughly 315,000 decimal digits):
old: 62 sec 920 ms 351 usec 67 nsec
new: 0 sec 458 ms 471 usec 969 nsec

Results have been checked against the old function.

Looks good enough to me.

I did not benchmark the new s_mp_faster_read_radix, and testing is only done with a small round-trip test in `test.c`.
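The reading direction can be sketched the same way. This is again only an illustration on `uint64_t` with hypothetical names (`ipow`, `digit_value`, `dc_read_radix`), not the code of s_mp_faster_read_radix. It assumes that the input consists of valid digits for the given base and that the value fits into 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* base^e by binary exponentiation (wraps around silently on overflow) */
static uint64_t ipow(uint64_t base, size_t e)
{
    uint64_t result = 1;
    while (e > 0) {
        if (e & 1u) {
            result *= base;
        }
        base *= base;
        e >>= 1;
    }
    return result;
}

/* Value of one digit character; the input is assumed to be valid. */
static uint64_t digit_value(char c)
{
    if (c >= '0' && c <= '9') return (uint64_t)(c - '0');
    if (c >= 'a' && c <= 'z') return (uint64_t)(c - 'a') + 10u;
    return (uint64_t)(c - 'A') + 10u;
}

/* value(s) = value(high part) * base^(length of low part) + value(low part) */
uint64_t dc_read_radix(const char *s, size_t len, unsigned base)
{
    size_t lo, hi;

    if (len == 0) return 0;
    if (len == 1) return digit_value(s[0]);

    lo = len / 2;   /* number of digits in the low part  */
    hi = len - lo;  /* number of digits in the high part */
    return dc_read_radix(s, hi, base) * ipow(base, lo)
           + dc_read_radix(s + hi, lo, base);
}
```

A big-integer version would presumably cache the powers instead of recomputing them and fall back to the simple digit-by-digit loop below some cut-off; both are left out here to keep the sketch short.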

Ah, yes: restricting the new method to base 10 only wouldn't have saved a lot. One additional small table and about half a dozen lines of code would have been saved, maybe 150 bytes or so, if at all.

I used some ideas from Lars Helmström and some more from @MasterDuke17 to write this code. Don't blame them for my mistakes, please.

czurnieden · 2020-12-24T05:30:31Z

Yepp, should have tested read_radix better, labeled it with work in progress now.

I got `undefined reference to s_mp_radix_exponent_y` because I made it private, as it should be, but it cannot be used in `test.c` any more because of that. It is a small table, I can C&P it into `test.c`, or does somebody here have a better idea?

czurnieden · 2020-12-24T06:02:56Z

clang-5.0 -I./ -Wall -Wsign-compare -Wextra -Wshadow -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=float-divide-by-zero -Wdeclaration-after-statement -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wpointer-arith -Wsystem-headers -O3 -funroll-loops -fomit-frame-pointer -Wno-typedef-redefinition -Wno-tautological-compare -Wno-builtin-requires-header -m64    demo/test.o demo/shared.o libtommath.a -o test
Run test clang-5.0 -m64 
s_mp_faster_to_radix.c:25:12: runtime error: signed integer overflow: 65536 * 65536 cannot be represented in type 'int'
error 1 while running tests
The command "./testme.sh ${BUILDOPTIONS}" exited with 128.
after_failure.1
0.00s$ cat test_*.log
Digit size 64 Bit 
Size of mp_digit: 8
Size of mp_word: 16
MP_DIGIT_BIT: 60
MP_DEFAULT_DIGIT_COUNT: 32

Cannot happen at that place, result carefully chosen to not exceed 2^20 which fits in an int32_t. Great, now I have to convince the compiler to accept that fact, too ;-)
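For reference, the usual way to keep the sanitizer quiet about a product like the one in the log is to do the multiplication in a wider type. The two helpers below are hypothetical and only show the pattern; they are not the change that went into s_mp_faster_to_radix.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen one operand first, so the multiplication
   is done in 64-bit arithmetic and cannot overflow for int inputs. */
int64_t mul_wide(int a, int b)
{
    return (int64_t)a * b;
}

/* Hypothetical helper: stores a * b in *out and returns 1 if the
   product fits into an int32_t, returns 0 otherwise. */
int mul_checked(int32_t a, int32_t b, int32_t *out)
{
    int64_t wide = (int64_t)a * (int64_t)b;

    if (wide > INT32_MAX || wide < INT32_MIN) {
        return 0;
    }
    *out = (int32_t)wide;
    return 1;
}
```

With these, `mul_wide(65536, 65536)` is well defined, and `mul_checked` rejects the same product instead of overflowing.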

czurnieden · 2020-12-24T09:43:42Z

Oh my, LTM's testing environment is really killing me!
Did I just spend 2 hours to find out that I have to sanitize the output of mtest?

But it's all green now, so: Happy Holidays, y'all!

czurnieden · 2020-12-29T20:12:47Z

Was able to make Helmström's N-R trick work for base 10. Benchmarked it against div-only up to a string length of 2^19. Both together did not allow for more because with that amount of recursion the 8 MB stack (quite a standard size on Linux) overflowed, but both work for larger input stand-alone.

The N-R trick is about as fast/slow as div-standalone, maybe a bit slower for small input but it uses much more heap.
To make N-R work for all of the other bases it needs a table with the individual error corrections, which I can only find empirically, not mathematically (it is unknown to me whether that is actually impossible).
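To illustrate why a correction is needed at all, here is a toy version of division by a Newton-Raphson reciprocal. It uses `double` and 32-bit operands instead of big integers, the name `nr_divide` is made up, and it is not Helmström's method as used in this PR. The reciprocal is only an approximation, so the quotient estimate can be off by one and has to be adjusted afterwards. A nonzero divisor is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy sketch: floor(n / d) via a Newton-Raphson reciprocal, d != 0. */
uint32_t nr_divide(uint32_t n, uint32_t d)
{
    double x = 1.0;
    uint64_t q;
    int i;

    /* Initial guess: a power of two with 0.5 <= d * x < 1 */
    while (x * (double)d >= 1.0) {
        x *= 0.5;
    }
    /* Newton iteration for f(x) = 1/x - d, i.e. x <- x * (2 - d * x).
       The relative error is squared in every round; six rounds get
       from at most 0.5 down to the precision of a double. */
    for (i = 0; i < 6; i++) {
        x = x * (2.0 - (double)d * x);
    }
    /* Quotient estimate from the approximate reciprocal */
    q = (uint64_t)((double)n * x);

    /* Error correction: the estimate may be slightly too large or too small */
    while (q * d > n) {
        q--;
    }
    while ((q + 1) * d <= n) {
        q++;
    }
    return (uint32_t)q;
}
```

Here the correction is a cheap loop because a single multiplication is cheap. With big integers every correction step costs a full multiplication or comparison, which is why a precomputed table of corrections (or an extra Newton round) comes into play.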

I don't know if the large amount of recursion is a problem outside of my testing: there is not only the up to 29-deep tree for the D&C itself but also the recursion from the Karatsuba and Toom-Cook multiplications, and fast division is implemented recursively, too. I can try to change the recursion in read_radix to iteration, but I have to check whether it can be done without a stack-on-the-heap, otherwise it gets quite complicated and ugly.

czurnieden · 2020-12-29T20:58:11Z

Added the N-R method for testing. Add MP_TO_RADIX_USE_NEWTON_RAPHSON to LTM_CFLAGS to switch it on for base 10.

czurnieden · 2023-04-05T15:15:32Z

Needs a bit of a clean-up but otherwise good to go.

sjaeckel · 2023-04-11T14:41:56Z

I took the liberty to rebase, add some more changes and force-push the branch.

Could we maybe squash this list of commits and fixups from 2020 and 2021 into 1-3 commits? Or does it make sense to retain the history? Maybe those N-R based tests could be of interest?

czurnieden · 2023-04-11T15:43:50Z

It's always nice when you come to your bench, fresh coffee in your pot just to find out that all of the work has been done already! :-)

Thanks a lot!

> Could we maybe squash this list of commits and fixups from 2020 and 2021 into 1-3 commits?

Yes, of course.

> Or does it make sense to retain the history?

Oh, please don't! It was quite a painful experience: large numbers mean long waits for the tests to complete just to find out that something didn't quite work out at the end and so on. Nah, dust it with quicklime, bury the remains and let the horses run over the ground such that nobody can find it.

> Maybe those N-R based tests could be of interest?

Yes, but it either needs a table with the corrections for which I have only empirical methods to find the values at this point, which is a bit cumbersome to say the least, or another N-R round. The difference in speed is already negligible and the second Newton round would ruin that slight speed advantage completely.
But I am still not sure if I just made a small mistake somewhere which caused that mess, so it might be a good idea to keep that part?

What I vaguely remember to have planned was to pluck the Barrett division out as a distinct function. But we can still do that if there is an actual use-case.

czurnieden · 2023-04-11T19:21:21Z

Removed all of the little commits bragging about doing chores (e.g.: correcting typos etc).

Backup of old branch is in faster_radix_conversion_full_history, just in case.

I had a bit of a hiccup while merging with local, hope I have repaired it successfully.

czurnieden · 2023-04-11T19:23:55Z

@MasterDuke17 You are marked as one of the reviewers. Do you want to add your 2 cents, too? I mean: more eyes, more better! to steal the term from A.v.E.

czurnieden · 2023-06-21T14:33:23Z

The timings here are quite different from the timings on my machine. The smaller the digit, the smaller the ratio, because we use `used` for the cut-offs. I got a ratio of about 2:1 for MP_16BIT, around 3:1 for MP_32BIT and about 4:1 for MP_64BIT. Most of the CI results are about the same, except when they are not.
Mmh…


czurnieden requested review from sjaeckel and MasterDuke17 December 24, 2020 05:14

czurnieden added the work in progress label Dec 24, 2020

czurnieden mentioned this pull request Dec 28, 2020

Replaced "fgets" with a "get_token" function in demo/mtest_opponent.c #500

Merged

czurnieden force-pushed the faster_radix_conversion branch from 4840c90 to 43241b0 Compare January 3, 2021 20:11

czurnieden mentioned this pull request Jan 4, 2021

Initialized "cmd" (and "buf" for symmetry) to sooth clang >= 8 #501

Closed

czurnieden force-pushed the faster_radix_conversion branch from 6188f05 to 24cfac8 Compare March 11, 2023 22:05

czurnieden force-pushed the faster_radix_conversion branch from 452cc1d to 2f9a008 Compare April 5, 2023 15:02

czurnieden added this to the v2.0.0 milestone Apr 5, 2023

sjaeckel force-pushed the faster_radix_conversion branch 2 times, most recently from c5ccf14 to e60867d Compare April 11, 2023 14:31

czurnieden force-pushed the faster_radix_conversion branch from e60867d to e5ce2d9 Compare April 11, 2023 19:16

MasterDuke17 mentioned this pull request Apr 13, 2023

Switch from tommath to gmp MoarVM/MoarVM#1402

Draft

czurnieden force-pushed the faster_radix_conversion branch from e5ce2d9 to eea3a5f Compare June 21, 2023 12:47

czurnieden mentioned this pull request Jun 21, 2023

Addition of a man-page #559

Merged

czurnieden and others added 4 commits July 1, 2023 14:35

Addition of faster to_radix function

c1bcc9e

Addition of faster read_radix method

6c9fa6b

add option to use standard strlen() instead of home-baked version

11e7350

Signed-off-by: Steffen Jaeckel <[email protected]>

add s_mp_floor_ilog2()

fec4f5c

instead of all those copies Signed-off-by: Steffen Jaeckel <[email protected]>

czurnieden added 4 commits July 1, 2023 14:35

Print additional debugging information in test.c

f7d5d32

formatted

7d8209a

temporarily removed timing

816833c

clean up

28b373d

czurnieden force-pushed the faster_radix_conversion branch from fd23d2f to 28b373d Compare July 1, 2023 12:46
