mmolari/ecoliST131-structural-evo
Structural genome evolution in E. coli ST131

This repository contains a snakemake pipeline for the analysis of structural genomic evolution of E. coli ST131 presented in our paper.

The dataset consists of complete E. coli ST131 genomes available on RefSeq. Accession numbers and metadata for the considered strains can be found in the datasets folder.

In short, the pipeline uses pangraph to build a pangenome graph representation of the chromosomes of all considered strains. It then extracts all regions of structural variation, assigns MGEs and defense systems to each of these regions, and detects events that can be parsimoniously interpreted as a simple gain or loss of sequence. See this note for an overview of the pipeline.

The pipeline produces two output folders: a results folder, containing processed data such as the pangenome graph and the junction graphs, and a figs folder, containing, among other things, the main figures of the paper.

setup

  • Execution requires a valid installation of conda, mamba and snakemake (v7.32.4).
  • For pangenome graph creation, the pangraph command must be available in your PATH; see the pangraph documentation for installation instructions.
  • Optionally, to facilitate the download of GenBank records from NCBI, you can save your personal API key in config/ncbi_api_key.txt. It will then be used automatically when downloading the data.
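As a quick sanity check before running anything, the requirements listed above can be verified from a shell. The following is a minimal sketch, not part of the repository: the helper name check_tools is hypothetical, and it only tests whether each command is found in PATH, not whether the installed version matches (for snakemake, v7.32.4).

```shell
# Hypothetical helper (not part of the pipeline): report which commands are in PATH.
# Prints one "found: <name>" or "missing: <name>" line per tool and
# returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in the setup list above.
check_tools conda mamba snakemake pangraph \
    || echo "install the missing tools before running the pipeline"
```

If snakemake is found, running snakemake --version is a simple way to compare the installed version with the one stated above.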

execution

To execute the pipeline locally, it is sufficient to run:

snakemake --use-conda --cores 1 all

You can replace 1 with the desired number of cores.
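Because the pipeline spawns a large number of jobs, it can be useful to preview them before starting a real run. The sketch below assembles the local command shown above for a chosen number of cores, optionally adding snakemake's --dry-run flag, which lists the scheduled jobs without executing them. The helper name pipeline_cmd and its "preview" argument are illustrative assumptions, not part of the repository.

```shell
# Hypothetical helper: print the local invocation for a given core count.
# First argument: number of cores (defaults to 1, as in the command above).
# Second argument: "preview" adds --dry-run so that jobs are listed but not run.
pipeline_cmd() {
    cores="${1:-1}"
    mode="${2:-run}"
    cmd="snakemake --use-conda --cores $cores"
    if [ "$mode" = "preview" ]; then
        cmd="$cmd --dry-run"
    fi
    echo "$cmd all"
}

pipeline_cmd 4 preview   # prints: snakemake --use-conda --cores 4 --dry-run all
pipeline_cmd 4           # prints: snakemake --use-conda --cores 4 all
```

The printed command can be copied into the terminal from the root of the repository, or executed directly with eval "$(pipeline_cmd 4 preview)".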

Given the high number of jobs and their memory and time requirements, we advise executing the pipeline on a cluster. Execution using the SLURM workload manager is already set up, and the pipeline can be executed with:

snakemake --profile cluster all

citation

Evolutionary dynamics of genome structure and content among closely related bacteria
Marco Molari, Liam P. Shaw and Richard A. Neher, bioRxiv (2024)
doi: https://doi.org/10.1101/2024.07.08.602537
