This repository contains the code accompanying the paper (bioRxiv)
"A near-tight lower bound on the density of forward sampling schemes":
- A script (run-benchmarks.sh) to benchmark existing sampling schemes, using the repo RagnarGrootKoerkamp/minimizers.
- Code to run our ILP (integer linear program, run-ilp.py) that searches for optimal sampling schemes for small parameters.
- A python notebook (plots.ipynb) that plots all results and lower bounds.
The packages needed to reproduce the analysis are listed below, with the versions we used in parentheses.
- matplotlib (3.7.1)
- pandas (2.2.1)
- numpy (1.26.4)
- sympy (1.12)
- gurobipy (10.0.3) (requires a license, free for academics)
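To compare an existing environment against the versions listed above, something like the following sketch may help. It is not part of the repository; the helper name `check_versions` is hypothetical, and it assumes each package is installed under the distribution name shown in the list.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the list in this README.
EXPECTED = {
    "matplotlib": "3.7.1",
    "pandas": "2.2.1",
    "numpy": "1.26.4",
    "sympy": "1.12",
    "gurobipy": "10.0.3",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions().items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

A different version is not necessarily a problem; the listed versions are simply the ones the analysis was run with.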
Benchmarks can be generated via
./run-benchmarks.sh
which takes around an hour on a machine with 6 cores.
The ILP models are built with gurobipy. The run-ilp.py script can construct and optimize
either a forward or a local model; pass --local to search for a local scheme instead of a
forward one. To run multiple models with one command, supply multiple window sizes (-w),
k-mer sizes (-k), and alphabet sizes (--sigma). An ILP is constructed for each combination
of w, k, and sigma:
python run-ilp.py -w 2 3 4 -k 1 2 3 4 5 --sigma 2 3 4 --verbose
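As a rough illustration of what "each combination" means for the command above, the sketch below enumerates the Cartesian product of the three parameter lists. The function name `parameter_grid` is hypothetical, and the order in which run-ilp.py actually visits the combinations is an assumption.

```python
from itertools import product


def parameter_grid(window_sizes, kmer_sizes, alphabet_sizes):
    """List every (w, k, sigma) triple, one per ILP that would be built."""
    return list(product(window_sizes, kmer_sizes, alphabet_sizes))


# Values from the example command: -w 2 3 4 -k 1 2 3 4 5 --sigma 2 3 4
grid = parameter_grid([2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4])
print(len(grid))  # 45 models: 3 window sizes * 5 k-mer sizes * 3 alphabet sizes
```

The number of models grows multiplicatively with each list, so a time limit (--time-limit) can be useful when sweeping larger ranges.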
All options are listed with --help:
>$ python run-ilp.py --help
usage: run-ilp.py [-h] -w WINDOW_SIZE [WINDOW_SIZE ...] -k KMER [KMER ...] --sigma SIGMA [SIGMA ...] [--local] [-o OUTPUT] [--time-limit TIME_LIMIT] [-t THREADS] [-v]

options:
  -h, --help            show this help message and exit
  -w WINDOW_SIZE [WINDOW_SIZE ...], --window-size WINDOW_SIZE [WINDOW_SIZE ...]
  -k KMER [KMER ...], --kmer KMER [KMER ...]
  --sigma SIGMA [SIGMA ...]
  --local               Find minimum local scheme density
  -o OUTPUT, --output OUTPUT
                        Path to output directory
  --time-limit TIME_LIMIT
                        Time limit (in seconds)
  -t THREADS, --threads THREADS
  -v, --verbose         Log ILP to output
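
When driving run-ilp.py from another Python script, it can be convenient to assemble the command line programmatically. The sketch below uses only the options shown in the help text above; the helper `build_command` is hypothetical and not part of the repository, and it assumes `python` on the PATH is the interpreter with gurobipy installed.

```python
import shlex


def build_command(w, k, sigma, local=False, output=None,
                  time_limit=None, threads=None, verbose=False):
    """Assemble an argv list for run-ilp.py from the documented options."""
    cmd = ["python", "run-ilp.py"]
    cmd += ["-w", *map(str, w)]          # one or more window sizes
    cmd += ["-k", *map(str, k)]          # one or more k-mer sizes
    cmd += ["--sigma", *map(str, sigma)]  # one or more alphabet sizes
    if local:
        cmd.append("--local")
    if output is not None:
        cmd += ["-o", str(output)]
    if time_limit is not None:
        cmd += ["--time-limit", str(time_limit)]  # seconds
    if threads is not None:
        cmd += ["-t", str(threads)]
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_command(w=[2, 3], k=[1, 2], sigma=[4],
                    local=True, time_limit=600, threads=6)
print(shlex.join(cmd))
# python run-ilp.py -w 2 3 -k 1 2 --sigma 4 --local --time-limit 600 -t 6
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.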