Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add actions #3

Merged
merged 8 commits into from
Apr 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions .github/workflows/release.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
name: Build and release
on:
push:
branches: main
workflow_dispatch:

jobs:
bump-version:
name: Release version
runs-on: ubuntu-latest

steps:
- uses: GoogleCloudPlatform/release-please-action@v3
id: release
with:
release-type: python
package-name: assembly_snptyper
27 changes: 27 additions & 0 deletions .github/workflows/test.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
name: assembly_snptyper test

on: [push, pull_request]

jobs:
build:
runs-on: ${{ matrix.config.os }}
strategy:
fail-fast: false
matrix:
config:
- {os: ubuntu-latest}
name: Testing assembly_snptyper ${{ matrix.config.os }}

steps:
- uses: actions/checkout@v4
- name: Install Conda environment with Micromamba
uses: mamba-org/setup-micromamba@v1
with:
cache-environment: true
environment-file: env.yaml
- name: Conda list
shell: bash -l {0}
run: conda list
- name: Test M1UK typing end-to-end
shell: bash -l {0}
run: bash tests/test_e2e.sh
12 changes: 12 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,18 @@ assembly_snptyper --version
Note that:
- the reference VCF must be based on the reference genome (i.e. identical chromosome names).
- the list of input files must be a file list of paths to assemblies. This is to accomodate running the script on large numbers of files
- the reference fasta must be unzipped

### Example

Running `assembly_snptyper` for *S. pyogenes* M1UK, with paths to genome assemblies listed in `fastas.txt`:

```
assembly_snptyper --vcf data/M1UK.vcf --reference data/MGAS5005.fa --list_input fastas.txt -p 4 --verbose > output.txt
```


## Help message

```
assembly_snptyper --help
Expand Down
10 changes: 8 additions & 2 deletions assembly_snptyper/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ def check_if_ref_is_ascii(reference):
"Reference genome cannot be read as flat text: only unzipped FASTA reference genomes are supported."
)


def convert_vcf_to_bed(vcf, bed_path):
"""
Convert reference VCF to BED file
Expand Down Expand Up @@ -264,7 +265,10 @@ def wrapper(args_dict):
"""
sample = Path(args_dict["input_asm"]).stem
result = run_oneliner(
args_dict["bed_path"], args_dict["reference"], args_dict["input_asm"], args_dict["minimap_preset"]
args_dict["bed_path"],
args_dict["reference"],
args_dict["input_asm"],
args_dict["minimap_preset"],
)
output = parse_mpileup_output(result, args_dict["vcf"], sample)
logging.info(f"Processed {sample}")
Expand Down Expand Up @@ -404,7 +408,9 @@ def main():
logging.info(f"Created temporary bed file: {bed_path}")

logging.info("Starting typing workflow")
run_parallel(bed_path, args.reference, list_input, args.vcf, minimap_preset, args.processes)
run_parallel(
bed_path, args.reference, list_input, args.vcf, minimap_preset, args.processes
)


if __name__ == "__main__":
Expand Down
34 changes: 34 additions & 0 deletions tests/test_e2e.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
#!/bin/bash

set -euxo pipefail

TEST_DATA_DIR=tests/tmp

# Download
mkdir -p "$TEST_DATA_DIR"
## M1UK strain from Lynskey et al.
curl --output "$TEST_DATA_DIR"/GCA_034818825.1.zip "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_034818825.1/download?include_annotation_type=GENOME_FASTA&hydrated=FULLY_HYDRATED"
## M1global strain from Lynskey et al.
curl --output "$TEST_DATA_DIR"/GCA_034822885.1.zip "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_034822885.1/download?include_annotation_type=GENOME_FASTA&hydrated=FULLY_HYDRATED"

# Unzip
unzip -n "$TEST_DATA_DIR"/GCA_034818825.1.zip -d "$TEST_DATA_DIR"
unzip -n "$TEST_DATA_DIR"/GCA_034822885.1.zip -d "$TEST_DATA_DIR"

# List genomes
find "$TEST_DATA_DIR" -iname "*.fna" -print > "$TEST_DATA_DIR"/test_genomes.txt

# Install the script
python -m pip install .

# Run the script
assembly_snptyper \
--list_input "$TEST_DATA_DIR"/test_genomes.txt \
--vcf data/M1UK.vcf \
--reference data/MGAS5005.fa \
-p 2 > "$TEST_DATA_DIR"/test_output.txt

# Check the output
cmp <(sort "$TEST_DATA_DIR"/test_output.txt) <(sort tests/test_expected_output.txt)

rm -rf "$TEST_DATA_DIR"
3 changes: 3 additions & 0 deletions tests/test_expected_output.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
sample matching_variants wt_variants variants_in_scheme variants_missing variants_multiple_cov
GCA_034818825.1_PDT001921168.1_genomic 27 0 27 0 0
GCA_034822885.1_PDT001920952.1_genomic 0 27 27 0 0