TODO

- Testing requirements after changes:
  * Test all functions return either native or bigints.  Functions that
    return raw MPU::GMP results will return strings, which isn't right.
  * Valgrind, coverage
  * use:  -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fsigned-char
  * Test on 32-bit Perl, Cygwin, Win32.
  * Test on gcc70 (NetBSD), gcc119 (AIX/Power8), gcc22 (MIPS64), gcc115 (aarch)
  * prove -b -I../Math-Prime-Util-GMP/blib/lib -I../Math-Prime-Util-GMP/blib/arch

- For new functions:
  XS, .h, .c, PP, PPFE, export, t exports, lib/ntheory.pm, doc, test


- Move .c / .h files into separate directory.
  version does it in a painful way.  Something simpler to be had?

- finish test suite for bignum.  Work on making it faster.

- An assembler version of mulmod for i386.

- More efficient Mertens.  The current version has poor growth.
  For 32-bit, consider doing XS followed by sum for remainder.

- It may be possible to have a more efficient ranged totient.  We're using
  the sieve up to n/2, which is better than most people seem to use, but I'm
  not completely convinced we can't do better.  The method at:
  http://codegolf.stackexchange.com/a/26747/30069 ends up very similar.  For
  the monolithic results the main bottleneck seems to be the array return.

- Big features:
   - QS factoring

- Figure out a way to make the internal FOR_EACH_PRIME macros use a segmented
  sieve.

- Rewrite 23-primality-proofs.t for new format (keep some of the old tests?).

- Factoring in PP code is really wasteful -- we're calling _isprime7 before
  we've done enough trial division, and later we're calling it on known
  composites.  Note how the XS code splits the factor code into the public
  API (small factors, isprime, then call main code) and main code (just the
  algorithm).  The PP code isn't doing that, which means we're doing lots of
  extra primality checks, which aren't cheap in PP.

- Consider using Test::Number::Delta for many tests

- More tweaking of LMO prime count.
    - OpenMP.  The step 7 inner loop is available.
    - Convert to 32-bit+GMP to support large inputs, add to MPU::GMP.
    - Try __int128.
    - Variable sieve size
    - look at sieve.c style prime walking
    - Fenwick trees for prefix sums

- Iterators speedup:
  1) config option for sieved next_prime.  Very general, would make
     next_prime run fast when called many times sequentially.  Nasty
     mixing with threads.
  2) iterator, PrimeIterator, or PrimeArray in XS using segment sieve.

- Perhaps have main segment know the filled in range.  That would allow
  a sieved next_prime, and might speed up some counts and the like.

- Benchmark simple SoEs, SoA.  Include Sisyphus SoE hidden in Math::GMPz.

- Try using malloc/free for win32 cache memory.  #define NO_XSLOCKS

- Investigate optree constant folding in PP compilation for performance.
  Use B::Deparse to check.

- Ensure a fast path for Math::GMP from MPU -> MPU:GMP -> GMP, and back.

- We don't use legendre_phi for other functions any more, but it'd be nice
  to speed it up using some ideas from the Ohana 2011 SAGE branch.  For example
  (10**13,10**5) takes 2.5x longer, albeit with 6x less memory.

- More Pari:  parforprime

- znlog:
    = GMP BSGS for znlog.
    = Clean up znlog (PH, BSGS, Rho).
    = Experiment with Wang/Zhang 2012 Rho cycle finding

- consider using Ramanujan Li for PP code.

- xt/pari-compare:  add chinese, factorial, vecmin, vecmax,
                        bernfrac, bernreal, LambertW.

- Proth test using LLR.  Change mersenne test file to test both.
  Note: what does this mean?  Both LLR and Proth are in GMP now.

- harmreal and harmfrac for general $k

- For PP, do something like the fibprime examples.  At load time, look for
  the best library (GMPz, GMP, Pari, BigInt) and set $BICLASS.  Then we
  should use that class for everything.  Go ahead and return that type.
  Make a config variable to allow get/set.

- Support FH for print_primes.  PerlIO_write is giving me fits.

- Test for print_primes.  Not as easy with filenos.

- sum primes better than current method.  Especially using less memory.

- divsum and divsummult as block functions.
  The latter does sum = vecprod(1 + f(p_i) + f(p_i^2) + ... f(p_i^e) for all p.

- Consider Lim-Lee random prime generation, optionally with proof.
  https://pdfs.semanticscholar.org/fd1d/864a95d7231eaf133b00a1757ee5d0bf0e07.pdf
  libgcrypt/cipher/primegen.c

- More formal random prime generation for pedantic FIPS etc. users, with
  guarantee of specific algorithm.

- surround_primes

- More Montgomery: znlog, catalan

- polymul, polyadd, polydiv, polyneg, polyeval, polyorder, polygcd, polylcm, polyroots, ...
  A lot of our ops do these mod n, we could make ..mod versions of each.

- poly_is_reducible

- use word-based for-sieve for non-segment.

- remove start/end partial word tests from inner loop in for-sieve.

- sieve.h and util.h should get along better.

- compare wheel_t with primes separated and possibly cached.

- urandomm with bigints could be faster.
   7.7s  my $f=factorial(144); urandomm($f) for 1..5e5;
   6.2s  my $f=factorial(144); urandomm("$f") for 1..5e5;
   4.8s  my $f="".factorial(144); urandomm($f) for 1..5e5;
   5.3s  use Math::GMP qw/:constant/; my $f=factorial(144); urandomm($f) for 1..5e5;
   1.7s  my $f=Math::Prime::Util::GMP::factorial(144); Math::Prime::Util::GMP::urandomm($f) for 1..5e5;
  In the first case, we're calling ""->bstr->_str once for validation in MPU
  and and once for use in MPU::GMP.
  The last case is all strings with no read/write bigint objects anywhere.

- Destroy csprng context on thread destruction.

- submit bug report for Perl error in 30b8ab3

- localized a/b in vecreduce, see:
  https://metacpan.org/diff/file?target=REHSACK/List-MoreUtils-XS-0.428/&source=HERMES%2FList-MoreUtils-XS-0.427_001#XS.xs
  perl #92264 (sort in 5.27.7)

- consider #define PERL_REENTRANT

- add back formultiperm optimization if we can get around lastfor issue.

- make a uint128_t version of montmath.  Needs to handle 64-bit.

- sieve_range does trial division

- srand with no args should be calling GMP's srand with the selected seed
  value.  This is all a hacky artifact of having the two codebases.

- Look at using Ramanujan series for PP Li.

- _reftyped as XS call

- update prime count lower/upper from https://arxiv.org/pdf/1703.08032.pdf

- urandomr

- circular primes ... just use repdigits after 1M?  https://oeis.org/A068652

- Figure out how to use multicall for forfactored and forsquarefree.

- perhaps square-free flag for factor for early stop.  Use in moebius etc.

- make a NVCONST, define log, sqrt, etc. for quadmath vs. long double

- move most of our long double routines to NVCONST (see above).

- Change from Kalai to Bach's algorithm for random factored integers
  https://maths-people.anu.edu.au/~brent/pd/multiplication-HK.pdf

- Adjust crossover in random_factored_integer PP code for Kalai vs. naive

- Pari/GP 2.12 beta has rewritten (much faster) Bernoulli.  Check it out.

- semiprime_count PP just walk if small range.