forked from danaj/Math-Prime-Util
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTODO
186 lines (123 loc) · 6.92 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
- Testing requirements after changes:
* Test all functions return either native or bigints. Functions that
return raw MPU::GMP results will return strings, which isn't right.
* Valgrind, coverage
* use: -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fsigned-char
* Test on 32-bit Perl, Cygwin, Win32.
* Test on gcc70 (NetBSD), gcc119 (AIX/Power8), gcc22 (MIPS64), gcc115 (aarch)
* prove -b -I../Math-Prime-Util-GMP/blib/lib -I../Math-Prime-Util-GMP/blib/arch
- For new functions:
XS, .h, .c, PP, PPFE, export, t exports, lib/ntheory.pm, doc, test
- Move .c / .h files into separate directory.
version does it in a painful way. Something simpler to be had?
- finish test suite for bignum. Work on making it faster.
- An assembler version of mulmod for i386.
- More efficient Mertens. The current version has poor growth.
For 32-bit, consider doing XS followed by sum for remainder.
- It may be possible to have a more efficient ranged totient. We're using
the sieve up to n/2, which is better than most people seem to use, but I'm
not completely convinced we can't do better. The method at:
http://codegolf.stackexchange.com/a/26747/30069 ends up very similar. For
the monolithic results the main bottleneck seems to be the array return.
- Big features:
- QS factoring
- Figure out a way to make the internal FOR_EACH_PRIME macros use a segmented
sieve.
- Rewrite 23-primality-proofs.t for new format (keep some of the old tests?).
- Factoring in PP code is really wasteful -- we're calling _isprime7 before
we've done enough trial division, and later we're calling it on known
composites. Note how the XS code splits the factor code into the public
API (small factors, isprime, then call main code) and main code (just the
algorithm). The PP code isn't doing that, which means we're doing lots of
extra primality checks, which aren't cheap in PP.
- Consider using Test::Number::Delta for many tests
- More tweaking of LMO prime count.
- OpenMP. The step 7 inner loop is available.
- Convert to 32-bit+GMP to support large inputs, add to MPU::GMP.
- Try __int128.
- Variable sieve size
- look at sieve.c style prime walking
- Fenwick trees for prefix sums
- Iterators speedup:
1) config option for sieved next_prime. Very general, would make
next_prime run fast when called many times sequentially. Nasty
mixing with threads.
2) iterator, PrimeIterator, or PrimeArray in XS using segment sieve.
- Perhaps have main segment know the filled in range. That would allow
a sieved next_prime, and might speed up some counts and the like.
- Benchmark simple SoEs, SoA. Include Sisyphus SoE hidden in Math::GMPz.
- Try using malloc/free for win32 cache memory. #define NO_XSLOCKS
- Investigate optree constant folding in PP compilation for performance.
Use B::Deparse to check.
- Ensure a fast path for Math::GMP from MPU -> MPU:GMP -> GMP, and back.
- We don't use legendre_phi for other functions any more, but it'd be nice
to speed it up using some ideas from the Ohana 2011 SAGE branch. For example
(10**13,10**5) takes 2.5x longer, albeit with 6x less memory.
- More Pari: parforprime
- znlog:
= GMP BSGS for znlog.
= Clean up znlog (PH, BSGS, Rho).
= Experiment with Wang/Zhang 2012 Rho cycle finding
- consider using Ramanujan Li for PP code.
- xt/pari-compare: add chinese, factorial, vecmin, vecmax,
bernfrac, bernreal, LambertW.
- Proth test using LLR. Change mersenne test file to test both.
Note: what does this mean? Both LLR and Proth are in GMP now.
- harmreal and harmfrac for general $k
- For PP, do something like the fibprime examples. At load time, look for
the best library (GMPz, GMP, Pari, BigInt) and set $BICLASS. Then we
should use that class for everything. Go ahead and return that type.
Make a config variable to allow get/set.
- Support FH for print_primes. PerlIO_write is giving me fits.
- Test for print_primes. Not as easy with filenos.
- sum primes better than current method. Especially using less memory.
- divsum and divsummult as block functions.
The latter does sum = vecprod(1 + f(p_i) + f(p_i^2) + ... f(p_i^e) for all p.
- Consider Lim-Lee random prime generation, optionally with proof.
https://pdfs.semanticscholar.org/fd1d/864a95d7231eaf133b00a1757ee5d0bf0e07.pdf
libgcrypt/cipher/primegen.c
- More formal random prime generation for pedantic FIPS etc. users, with
guarantee of specific algorithm.
- surround_primes
- More Montgomery: znlog, catalan
- polymul, polyadd, polydiv, polyneg, polyeval, polyorder, polygcd, polylcm, polyroots, ...
A lot of our ops do these mod n, we could make ..mod versions of each.
- poly_is_reducible
- use word-based for-sieve for non-segment.
- remove start/end partial word tests from inner loop in for-sieve.
- sieve.h and util.h should get along better.
- compare wheel_t with primes separated and possibly cached.
- urandomm with bigints could be faster.
7.7s my $f=factorial(144); urandomm($f) for 1..5e5;
6.2s my $f=factorial(144); urandomm("$f") for 1..5e5;
4.8s my $f="".factorial(144); urandomm($f) for 1..5e5;
5.3s use Math::GMP qw/:constant/; my $f=factorial(144); urandomm($f) for 1..5e5;
1.7s my $f=Math::Prime::Util::GMP::factorial(144); Math::Prime::Util::GMP::urandomm($f) for 1..5e5;
In the first case, we're calling ""->bstr->_str once for validation in MPU
and and once for use in MPU::GMP.
The last case is all strings with no read/write bigint objects anywhere.
- Destroy csprng context on thread destruction.
- submit bug report for Perl error in 30b8ab3
- localized a/b in vecreduce, see:
https://metacpan.org/diff/file?target=REHSACK/List-MoreUtils-XS-0.428/&source=HERMES%2FList-MoreUtils-XS-0.427_001#XS.xs
perl #92264 (sort in 5.27.7)
- consider #define PERL_REENTRANT
- add back formultiperm optimization if we can get around lastfor issue.
- make a uint128_t version of montmath. Needs to handle 64-bit.
- sieve_range does trial division
- srand with no args should be calling GMP's srand with the selected seed
value. This is all a hacky artifact of having the two codebases.
- Look at using Ramanujan series for PP Li.
- _reftyped as XS call
- update prime count lower/upper from https://arxiv.org/pdf/1703.08032.pdf
- urandomr
- circular primes ... just use repdigits after 1M? https://oeis.org/A068652
- Figure out how to use multicall for forfactored and forsquarefree.
- perhaps square-free flag for factor for early stop. Use in moebius etc.
- make a NVCONST, define log, sqrt, etc. for quadmath vs. long double
- move most of our long double routines to NVCONST (see above).
- Change from Kalai to Bach's algorithm for random factored integers
https://maths-people.anu.edu.au/~brent/pd/multiplication-HK.pdf
- Adjust crossover in random_factored_integer PP code for Kalai vs. naive
- Pari/GP 2.12 beta has rewritten (much faster) Bernoulli. Check it out.
- semiprime_count PP just walk if small range.