GitHub - camillescott/goetia: Streaming de Bruijn and Compact de Bruijn Graph Algorithms

goetia is a C++ library and software package for streaming analysis of de Bruijn graphs, de Bruijn graph compaction, and genome sketching. The C++ library is fully available from Python through bindings generated by cppyy. The primary goals of goetia and its algorithms are:

Analyse data completely on-line with streaming methods.
Use as little of the data as possible.
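To make those two goals concrete, here is a toy sketch in plain Python. It is not goetia's API or algorithm: the `stream_dbg` function, its novelty-based stopping rule, and the `window` and `min_novelty` parameters are hypothetical, chosen only to show what consuming reads one at a time (on-line) and stopping once new reads stop adding information (using less of the data) can look like.

```python
from typing import Iterable, Iterator, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (no reverse-complement handling)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def stream_dbg(reads: Iterable[str], k: int,
               window: int = 100, min_novelty: float = 0.01) -> Tuple[Set[str], int]:
    """Hypothetical streaming de Bruijn graph builder.

    Nodes are k-mers; edges are implied by (k-1)-overlaps between k-mers in the set.
    Reads are consumed one at a time. After every `window` reads, if the fraction of
    k-mers that were new in that window is below `min_novelty`, stop reading.
    Returns the k-mer set and the number of reads actually consumed.
    """
    graph: Set[str] = set()
    seen = novel = 0  # counters for the current window
    n_reads = 0
    for read in reads:
        n_reads += 1
        for km in kmers(read, k):
            seen += 1
            if km not in graph:
                graph.add(km)
                novel += 1
        if n_reads % window == 0:
            if seen and novel / seen < min_novelty:
                break  # saturated: later reads add almost nothing new
            seen = novel = 0
    return graph, n_reads


def successors(graph: Set[str], km: str) -> list:
    """k-mers in the graph that overlap km by k-1 bases on the right."""
    return [km[1:] + b for b in "ACGT" if km[1:] + b in graph]
```

With a generator as input, the reads are never all held in memory at once, and reading stops as soon as the saturation rule fires:

```python
reads = ("ACGTACGTGG" for _ in range(1000))
graph, used = stream_dbg(reads, k=4, window=10)
```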

This library is a work in progress and under rapid development. Some current usage examples can be found in the examples/ directory and can be launched with Binder from the badge on the repository page.

Installation

Conda

conda is the supported installation environment. Within a conda environment, install with:

conda install goetia

This will install the goetia python package, the libgoetia shared library, and its headers into $CONDA_PREFIX. With the environment activated, you can import goetia in Python or link against the C++ library with -lgoetia.

Development

Building from Source

To build and install from source, first clone the repo:

git clone https://github.com/camillescott/goetia && cd goetia

Create the conda environment. There is a Makefile target to generate the environment; it uses mamba, but this can be overridden by setting CONDA_FRONTEND to conda. The resulting environment is called goetia-dev and is defined in environment_dev.yml.

make create-dev-env
conda activate goetia-dev

Then build and install:

make install

The install target builds the C++ library and cppyy bindings, installs the shared library into $CONDA_PREFIX/lib and the headers into $CONDA_PREFIX/include, and installs the associated Python modules into the conda environment.

To install in-place, run:

make dev-install

This runs python -m pip install -e . so that the Python sources can be edited in place. Changes to the C++ sources are not picked up this way, however, because the shared library has to be rebuilt: run make install again to recompile and reinstall the headers and shared library.

Testing

Tests are written in pytest; the full suite can be run with:

pytest tests/

The test suite uses pytest-benchmark to gather performance information on some functions, which adds significant time to a number of tests. To skip benchmarking, run make test or, explicitly:

pytest --benchmark-disable tests/

Much of the de Bruijn graph test data is randomly generated, i.e., we fuzz the library. This helps find edge cases, but it means a failing test may not fail again on a plain rerun, because the random input changes. To allow reproducibility, we use the pytest-randomly plugin, which manages random seed state and test ordering. When pytest is run, the random seed is reported near the beginning of the output, in the form:

Using --randomly-seed=2507050705

To rerun with a specific seed, run pytest with the appropriate flag:

pytest --randomly-seed=[DESIRED_SEED]
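Why the reported seed is enough to replay a failure can be seen in a small sketch. pytest-randomly reseeds Python's global `random` module before each test, so any fuzz data drawn from it is a pure function of the seed. The helpers `random_reads` and `run_fuzz_case` below are hypothetical stand-ins, not part of goetia's test suite; `run_fuzz_case` imitates the plugin's reseeding by calling `random.seed` itself.

```python
import random
from typing import List


def random_reads(n: int, length: int) -> List[str]:
    """Draw n random DNA reads from the *global* RNG, as a fuzz test might."""
    return ["".join(random.choice("ACGT") for _ in range(length)) for _ in range(n)]


def run_fuzz_case(seed: int) -> List[str]:
    """Hypothetical test body: reseed the global RNG, then generate input data."""
    random.seed(seed)  # imitates what pytest-randomly does before each test
    return random_reads(n=3, length=12)


first = run_fuzz_case(2507050705)
again = run_fuzz_case(2507050705)
assert first == again  # same seed, same generated reads
```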
