This project is a Python application that validates BLAST database configurations. It uses JSON configuration files to manage data sources for nucleotide and protein sequences. A YAML file can also be used to manage the database creation process.
In the bin folder, there is a Python script that validates the JSON configuration files; it can be used on either manually or automatically generated JSON files. The validator also reads the global YAML file to find the individual JSON files and their respective environments.
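To show the general shape of this kind of check, here is a minimal sketch. It is not the code in bin/validate_blast_db_config.py. It assumes two things: the required top-level keys (`metadata` and `data` here are hypothetical placeholders, since the real requirements come from the schema in the docs folder), and a global YAML that, once parsed (e.g. with PyYAML's `yaml.safe_load`), becomes a mapping from environment name to a list of JSON file paths. That mapping structure is also an assumption.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical required top-level keys; the real requirements are defined
# by the schema documented in the docs folder.
REQUIRED_KEYS = ("metadata", "data")


def validate_json_text(text: str) -> list[str]:
    """Parse JSON text and return a list of problems (empty if valid)."""
    try:
        config: Any = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    return [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config]


def validate_environments(global_config: dict[str, list[str]], root: Path) -> dict[str, list[str]]:
    """Validate every JSON file listed per environment in an already-parsed
    global config (assumed structure: {environment: [json paths]})."""
    problems: dict[str, list[str]] = {}
    for env, json_files in global_config.items():
        for name in json_files:
            try:
                errors = validate_json_text((root / name).read_text())
            except OSError as exc:  # missing or unreadable file
                errors = [f"cannot read file: {exc}"]
            if errors:
                problems[f"{env}:{name}"] = errors
    return problems
```

Returning a list of problems instead of raising on the first error lets a single run report every broken file across all environments.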
The docs folder contains documentation on how to create these configuration files, with required fields and structures.
This repo uses the pre-commit tool to perform some simple checks and validation before a git commit is executed.
The pre-commit hooks are configured in .pre-commit-config.yaml, and the pre-commit tool is installed as a dev dependency in pyproject.toml.
Install the git hook scripts. This only needs to be done once in a local repo:
poetry run pre-commit install
Run pre-commit hooks on staged files:
poetry run pre-commit run
Run pre-commit hooks on all files:
poetry run pre-commit run --all-files
Follow the guide in the docs folder on what is required in a JSON configuration file, then validate the file with:
poetry run python bin/validate_blast_db_config.py
Quicktype was used to autogenerate Python code from the Metadata JSON schema. This code can be used to programmatically build a BLAST DB configuration that conforms to the schema. Whenever the schema is updated, regenerate the code by running generate_code_from_schemas.sh:
./bin/generate_code_from_schemas.sh
Quicktype supports other languages, but only Python is currently in use.
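The sketch below shows what building a configuration programmatically can look like. It mimics the dataclass-plus-`to_dict` style that Quicktype typically emits for Python. It does not use the actual generated module. Every class, field, and function name here (`DataSource`, `BlastDBConfig`, `seq_type`, `build_config`) is hypothetical, and the real names come from the Metadata JSON schema. The `"nucl"`/`"prot"` values are borrowed from makeblastdb's database types for illustration only.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataSource:
    # Hypothetical fields; the real ones are defined by the Metadata JSON schema.
    name: str
    seq_type: str  # assumed to be "nucl" or "prot"
    uri: str

    def to_dict(self) -> dict[str, Any]:
        return {"name": self.name, "seq_type": self.seq_type, "uri": self.uri}


@dataclass
class BlastDBConfig:
    environment: str
    data: list[DataSource] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return {"environment": self.environment, "data": [d.to_dict() for d in self.data]}


def build_config(environment: str, sources: list[tuple[str, str, str]]) -> str:
    """Build a config from (name, seq_type, uri) tuples and serialize it to JSON."""
    config = BlastDBConfig(environment=environment)
    for name, seq_type, uri in sources:
        if seq_type not in ("nucl", "prot"):
            raise ValueError(f"unsupported sequence type: {seq_type}")
        config.data.append(DataSource(name, seq_type, uri))
    return json.dumps(config.to_dict(), indent=2)
```

Generating typed classes from the schema means that a field renamed in the schema surfaces as an attribute error in the builder code once the classes are regenerated, rather than as a silently malformed JSON file.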
Adam Wright and Paulo Nuin with support from the Alliance of Genome Resources and WormBase
- 0.1
- Initial Release