pipeline overview

Below is a rough overview of what the main part of the pipeline does:

```mermaid
graph TD
    Dw("Download genomes")

    Genomad --> MGEs
    ISEScan --> MGEs
    DefenseFinder --> MGEs
    IntegronFinder --> MGEs

    Dw --> Genomad
    Dw --> ISEScan
    Dw --> DefenseFinder
    Dw --> IntegronFinder

    P --> J("extract junctions")

    J --> JG("build junction graphs")

    JG --> AM("assign MGEs")

    AM --> CS("select binary junctions")

    Dw --> P("build pangraph")

    P --> CGA("extract core genome alignment")

    CGA --> FCGA("filtered core-genome alignment")

    FCGA --> T("tree")

    MGEs --> AM

    CS --> C("infer gain/loss")
    T --> C
```


- Genomes are downloaded from NCBI.
- Mobile genetic elements (MGEs) and defense systems are annotated using Genomad, ISEScan, DefenseFinder and IntegronFinder.
- Using pangraph, we build a pangenome graph that includes all of the chromosomes.
- From the graph we extract the core-genome alignment, filter out highly mutated regions (putative recombination), and build a core-genome tree.
- From the graph we also extract all junctions and build a graph for each junction.
- Junction graphs are characterized in terms of number of distinct paths, total pangenome content, and MGE presence.
- We then consider binary junctions (junctions with exactly two distinct paths) and combine their path patterns with the core-genome tree to infer gain/loss events.
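The recombination filter is described only at a high level. One common way to flag putative recombination is to mask alignment windows with an unusually high density of polymorphic sites. The sketch below does that, assuming the core alignment is a dict of equal-length sequences and using hypothetical `window` and `max_snps` thresholds; it illustrates the idea and is not necessarily the criterion the pipeline applies.

```python
def filter_alignment(
    aln: dict[str, str], window: int = 1000, max_snps: int = 10
) -> dict[str, str]:
    """Drop non-overlapping windows with more than `max_snps` polymorphic columns.

    Hypothetical sketch: `window` and `max_snps` are illustrative defaults,
    not the pipeline's actual parameters.
    """
    names = list(aln)
    length = len(aln[names[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("all sequences must have the same length")

    # A column is polymorphic if it holds more than one distinct base,
    # ignoring gaps ("-") and ambiguous bases ("N").
    poly = [
        len({aln[n][i] for n in names} - {"-", "N"}) > 1 for i in range(length)
    ]

    keep = [True] * length
    for start in range(0, length, window):
        end = min(start + window, length)
        if sum(poly[start:end]) > max_snps:
            # Treat the whole window as putatively recombined and mask it.
            keep[start:end] = [False] * (end - start)

    return {n: "".join(c for c, k in zip(seq, keep) if k) for n, seq in aln.items()}
```

For example, with `window=10` and `max_snps=3`, a 10-column window in which two genomes differ at every position is removed, while an identical window is kept. Non-overlapping windows keep the sketch simple; a sliding window would localize dense regions more precisely.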
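Junction characterization reduces to a few summaries over the paths that genomes take through a junction. The sketch below assumes a hypothetical representation: each genome's path is a tuple of pangraph block IDs between the flanking core blocks, block lengths are given in a dict, and MGE annotations have already been mapped to a set of block IDs. None of these names come from the pipeline itself.

```python
from dataclasses import dataclass

# Hypothetical representation: the ordered block IDs a genome traverses
# between the two flanking core blocks of a junction.
Path = tuple[str, ...]


@dataclass
class JunctionStats:
    n_paths: int        # number of distinct paths across genomes
    pangenome_len: int  # total length (bp) of all distinct blocks in the junction
    has_mge: bool       # True if any block carries an MGE annotation
    is_binary: bool     # True if there are exactly two distinct paths


def characterize_junction(
    paths: dict[str, Path],
    block_len: dict[str, int],
    mge_blocks: set[str],
) -> JunctionStats:
    distinct = set(paths.values())
    # Each block is counted once, however many paths or genomes contain it.
    blocks = {b for p in distinct for b in p}
    return JunctionStats(
        n_paths=len(distinct),
        pangenome_len=sum(block_len[b] for b in blocks),
        has_mge=bool(blocks & mge_blocks),
        is_binary=len(distinct) == 2,
    )
```

For instance, if one genome has an empty path and two others carry a single 5 kb MGE block `"X"`, the junction has two distinct paths, 5000 bp of pangenome content, and is binary.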
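For a binary junction, each genome falls on one of two paths, which can be treated as a binary character on the core-genome tree. One standard way to place changes on the tree is Fitch parsimony; the sketch below uses it as a stand-in, and the pipeline's actual inference method may differ. Assumptions: the tree is a nested tuple of leaf names, state `1` marks the path carrying the extra segment (for example an MGE insertion) and `0` the path without it, so `0 -> 1` is read as a gain and `1 -> 0` as a loss; ambiguous states are resolved toward `0`.

```python
from typing import Union

# Hypothetical tree encoding: a leaf is a genome name, an internal node is (left, right).
Tree = Union[str, tuple]


def _up(tree: Tree, states: dict[str, int]):
    """Bottom-up Fitch pass; returns (leaves below, Fitch state set, child nodes)."""
    if isinstance(tree, str):
        return ((tree,), {states[tree]}, [])
    kids = [_up(child, states) for child in tree]
    a, b = kids[0][1], kids[1][1]
    # Intersection if non-empty, otherwise union (one change needed here).
    s = (a & b) or (a | b)
    return (kids[0][0] + kids[1][0], s, kids)


def _down(node, parent_state: int, events: list) -> None:
    """Top-down pass: keep the parent's state when allowed, else switch."""
    leaves, s, kids = node
    state = parent_state if parent_state in s else min(s)
    if state != parent_state:
        events.append(("gain" if state == 1 else "loss", leaves))
    for kid in kids:
        _down(kid, state, events)


def infer_gain_loss(tree: Tree, states: dict[str, int]):
    """Return the root state and the list of (event, leaves below branch)."""
    root = _up(tree, states)
    root_state = min(root[1])  # tie broken toward 0 (absence), an assumption
    events: list = []
    for kid in root[2]:
        _down(kid, root_state, events)
    return root_state, events
```

On the tree `((("a", "b"), "c"), "d")` with `a` and `b` on the path with the insertion, this places a single gain on the branch leading to `a` and `b`. Parsimony ignores branch lengths, which is why likelihood-based reconstructions are a common alternative.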

This information is then collected and presented in the main figures produced by the pipeline.

Here is a full view of the Snakemake rule graph.
