Merge branch 'dev' into context

labgem · Oct 10, 2023 · cbb38c8
2 parents e6fb61f + 93a8468
commit cbb38c8

Showing 111 changed files with 2,442 additions and 1,359 deletions.
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -32,8 +32,7 @@ jobs:
       shell: bash -l {0}
       run: |
         conda install -y --file requirements.txt 
-        conda install -y pytest
-        pip install .
+        pip install .[test]
     # Check that it is installed and displays help without error
     - name: Check that PPanGGOLiN is installed
       shell: bash -l {0}

diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -0,0 +1,35 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+python:
+  install:
+    - requirements: docs/requirements.txt
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.8"
+    # You can also specify other tool versions:
+    # nodejs: "19"
+    # rust: "1.64"
+    # golang: "1.19"
+
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+   configuration: docs/conf.py
+
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#    - pdf
+#    - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+# python:
+#    install:
+#    - requirements: docs/requirements.txt
diff --git a/README.md b/README.md
@@ -0,0 +1,148 @@
+# PPanGGOLiN: Depicting microbial species diversity via a Partitioned PanGenome Graph Of Linked Neighbors
+
+[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/pyrodigal/test.yml?branch=main&logo=github&style=flat-square&maxAge=300)](https://github.com/labgem/ppanggolin/actions)
+[![License](https://anaconda.org/bioconda/ppanggolin/badges/license.svg)](http://www.cecill.info/licences.fr.html)
+[![Bioconda](https://img.shields.io/conda/vn/bioconda/ppanggolin?style=flat-square&maxAge=3600&logo=anaconda)](https://anaconda.org/bioconda/ppanggolin)
+[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square)](https://github.com/labgem/ppanggolin/)
+[![GitHub issues](https://img.shields.io/github/issues/labgem/ppanggolin.svg?style=flat-square&maxAge=600)](https://github.com/labgem/ppanggolin/issues)
+[![Docs](https://img.shields.io/readthedocs/ppanggolin/latest?style=flat-square&maxAge=600)](https://ppanggolin.readthedocs.io)
+[![Downloads](https://anaconda.org/bioconda/ppanggolin/badges/downloads.svg)](https://bioconda.github.io/recipes/ppanggolin/README.html#download-stats)
+
+**PPanGGOLiN**
+([Gautreau et al. 2020](https://doi.org/10.1371/journal.pcbi.1007732)) is a software suite used to create and manipulate prokaryotic pangenomes from a set of either genomic DNA sequences or provided genome annotations.
+It is designed to scale up to tens of thousands of genomes.
+Unlike methods based on fixed thresholds, it partitions the pangenome using a statistical approach, which lets it work with low-quality data such as *Metagenomic Assembled Genomes (MAGs)* or *Single-cell Amplified Genomes (SAGs)*. It can thus take advantage of large-scale environmental studies and lets users study the pangenome of uncultivable species.
+
+**PPanGGOLiN** builds pangenomes through a graphical model and a statistical method to partition gene families into persistent, shell and cloud genomes.
+It integrates information on both protein-coding genes and their genomic neighborhood to build a graph of gene families where each node is a gene family, and each edge is a relation of genetic contiguity.
+The partitioning method favors placing two gene families that are consistent neighbors in the graph in the same partition.
+It results in a Partitioned Pangenome Graph (PPG) made of persistent, shell and cloud nodes drawing genomes on rails like a subway map to help biologists navigate the great diversity of microbial life.
+
+
+Moreover, the panRGP method ([Bazin et al. 2020](https://doi.org/10.1093/bioinformatics/btaa792)) included in **PPanGGOLiN** predicts, for each genome, Regions of Genome Plasticity (RGPs) that are clusters of genes made of shell and cloud genomes in the pangenome graph.
+Most of them arise from Horizontal gene transfer (HGT) and correspond to Genomic Islands (GIs). 
+RGPs from different genomes are then grouped into spots of insertion based on their conserved flanking persistent genes.
+
+
+Those RGPs can be further divided into conserved modules by panModule ([Bazin et al. 2021](https://doi.org/10.1101/2021.12.06.471380)). Those conserved modules correspond to groups of co-occurring and colocalized genes that are gained or lost together in the variable regions of the pangenome.
+
+
+<!-- ![PPanGGOLiN logo](docs/_static/logo.png) -->
+
+<!-- center the image with html syntax -->
+<img src="docs/_static/logo.png" 
+        alt="logo" 
+        style="display: block; margin: 0 auto" />
+
+
+# Installation
+
+**PPanGGOLiN** is easily installed via conda. 
+You will need the following conda channels if you don't have them already:
+
+```bash
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+```
+
+Then, you can just run:
+
+```bash
+conda install -c bioconda ppanggolin
+```
+
+# Quick usage
+
+**PPanGGOLiN** provides workflows to build and analyse a pangenome quickly and easily.
+These commands can be tuned with some parameters but are mostly automatic.
+All workflow parameters are described [here](https://ppanggolin.readthedocs.io/en/updateenv/user/Basic-usage-and-practical-information.html#the-workflow-subcommand).
+
+## Pangenome graph construction and partition
+
+To build and partition a pangenome, you can use the following command:
+```bash
+ppanggolin workflow --fasta ORGANISMS_FASTA_LIST
+```
+
+It uses parameters that we found to be generally the best when working with species pangenomes.
+
+The file ORGANISMS_FASTA_LIST is a tab-separated file organized as follows:
+1. The first column contains a unique organism name **(without space)**
+2. The second column contains the path to the associated FASTA file
+3. Circular contig identifiers are indicated in the following columns
+4. Each line represents an organism
+
+An [example](https://github.com/labgem/PPanGGOLiN/blob/master/testingDataset/organisms.fasta.list) with 50 *Chlamydia trachomatis* genomes can be found in the testingDataset/ directory.
+
+
+You can also give **PPanGGOLiN** your own annotations, such as the ones provided by prokka, using *.gff* or *.gbff/.gbk* files instead of *.fasta* files,
+with the following command:
+
+```bash
+ppanggolin workflow --anno ORGANISMS_ANNOTATION_LIST
+```
+
+Another [example](https://github.com/labgem/PPanGGOLiN/blob/master/testingDataset/organisms.gbff.list) of such a file can be found in the testingDataset/ directory.
+
+Both of those commands write several output files and graphics (more information [here](https://ppanggolin.readthedocs.io/en/updateenv/user/Outputs.html#ppanggolin-outputs)). Most notably, an HDF-5 (pangenome.h5) file is written.
+It can be used as input for any of the subcommands to rerun parts of the analysis with different parameters,
+write and draw different representations of the pangenome or run additional analysis with **PPanGGOLiN**.
+
+A minimum of 5 genomes is generally required to perform a pangenomics analysis using the traditional *core genome*/*accessory genome* paradigm.
+It is advised to use at least 15 genomes having genomic variations (and not only SNPs) to obtain robust results with the **PPanGGOLiN** statistical approach.
+
+If you want to use personalized parameters for each subcommand, most options should be self-descriptive.
+If you want to know more about what each output file is, or briefly how each subcommand works,
+you can check the [step-by-step documentation](https://github.com/labgem/PPanGGOLiN/wiki).
+
+
+## Region of plasticity detection
+
+Furthermore, you can also predict genomic islands and cluster them into spots of insertion using the **panRGP** pipeline.
+The usage is identical to the previous 'workflow' command:
+
+```bash
+ppanggolin panrgp --fasta ORGANISMS_FASTA_LIST
+```
+
+It will run more analyses after the pangenome has been partitioned. Further details are available [here](https://ppanggolin.readthedocs.io/en/updateenv/user/Basic-usage-and-practical-information.html#the-panrgp-subcommand) and in the [panRGP publication](https://doi.org/10.1093/bioinformatics/btaa792).
+
+## Conserved module prediction
+To detect the conserved modules in your pangenome, you can use the panModule workflow, as such:
+
+```bash
+ppanggolin panmodule --fasta ORGANISMS_FASTA_LIST
+```
+
+Further details can be found [here](https://ppanggolin.readthedocs.io/en/updateenv/user/Basic-usage-and-practical-information.html#the-panmodule-subcommand) and in the [panModule publication](https://doi.org/10.1101/2021.12.06.471380).
+
+
+Alternatively, to run all the analyses that **PPanGGOLiN** offers, you can use:
+
+```bash
+ppanggolin all --fasta ORGANISMS_FASTA_LIST
+```
+
+Overall, ppanggolin has a lot of subcommands and possibilities.
+Don't hesitate to check the command line help and the [GitHub wiki](https://github.com/labgem/PPanGGOLiN/wiki) to see all the possible analyses, or if you are missing a file you're looking for or do not understand an output.
+You can also raise an `issue` if you wish!
+
+# Issues, Questions, Remarks
+If you have any question or issue with installing,
+using or understanding **PPanGGOLiN**, please do not hesitate to post an issue!
+We cannot correct bugs if we do not know about them, and will try to help you the best we can.
+
+# Citation
+If you use this tool for your research, please cite:
+
+Gautreau G et al. (2020) **PPanGGOLiN**: Depicting microbial diversity via a partitioned pangenome graph.
+PLOS Computational Biology 16(3): e1007732. <https://doi.org/10.1371/journal.pcbi.1007732>
+
+If you use this tool to study genomic islands, please cite:
+
+Bazin et al., panRGP: a pangenome-based method to predict genomic islands and explore their diversity, Bioinformatics, Volume 36, Issue Supplement_2, December 2020, Pages i651–i658, <https://doi.org/10.1093/bioinformatics/btaa792>
+
+If you use this tool to study modules, please cite:
+
+Bazin et al., panModule: detecting conserved modules in the variable regions of a pangenome graph. biorxiv. <https://doi.org/10.1101/2021.12.06.471380>
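
The `ORGANISMS_FASTA_LIST` layout described in the README hunk above (a unique organism name without spaces, a tab, the path to the FASTA file, one organism per line) can be generated with a short script. The following is a minimal sketch and not part of this commit: the function name `build_organism_list`, the accepted file extensions, and the choice to derive organism names from file names are assumptions made for illustration, and the optional circular-contig columns are left out.

```python
from pathlib import Path

# Assumption: these are the FASTA extensions present in the input directory.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna")


def build_organism_list(fasta_dir, out_path):
    """Write a tab-separated list with one line per FASTA file.

    Column 1: organism name derived from the file name, with any
              whitespace replaced by underscores (names must not contain spaces).
    Column 2: absolute path to the FASTA file.
    Returns the (name, path) rows that were written.
    """
    rows = []
    for fasta in sorted(Path(fasta_dir).iterdir()):
        if not fasta.is_file() or fasta.suffix.lower() not in FASTA_SUFFIXES:
            continue
        name = "_".join(fasta.stem.split())
        rows.append((name, str(fasta.resolve())))

    # Organism names must be unique across the whole list.
    names = [name for name, _ in rows]
    if len(set(names)) != len(names):
        raise ValueError("organism names derived from file names are not unique")

    with open(out_path, "w") as handle:
        for name, path in rows:
            handle.write(f"{name}\t{path}\n")
    return rows
```

The resulting file could then be passed to `ppanggolin workflow --fasta`, as shown in the README.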
diff --git a/README.rst b/README.rst
diff --git a/docs/dev/Makefile b/docs/Makefile
diff --git a/images/drawspot_example.png b/docs/_static/drawspot_example.png
diff --git a/docs/_static/evolution.png b/docs/_static/evolution.png
diff --git a/images/gephi.gif b/docs/_static/gephi.gif
diff --git a/images/logo.png b/docs/_static/logo.png
diff --git a/images/projection.png b/docs/_static/projection.png
diff --git a/images/resampling.png b/docs/_static/resampling.png
diff --git a/images/runtimes.png b/docs/_static/runtimes.png
diff --git a/docs/_static/tile_plot.png b/docs/_static/tile_plot.png
diff --git a/docs/_static/u_plot.png b/docs/_static/u_plot.png
diff --git a/images/workflow.png b/docs/_static/workflow.png
diff --git a/docs/api/ppanggolin.RGP.md b/docs/api/ppanggolin.RGP.md
@@ -0,0 +1,39 @@
+# ppanggolin.RGP package
+
+## Submodules
+
+## ppanggolin.RGP.genomicIsland module
+
+```{eval-rst}
+.. automodule:: ppanggolin.RGP.genomicIsland
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
+
+## ppanggolin.RGP.rgp_cluster module
+
+```{eval-rst}
+.. automodule:: ppanggolin.RGP.rgp_cluster
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
+
+## ppanggolin.RGP.spot module
+
+```{eval-rst}
+.. automodule:: ppanggolin.RGP.spot
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
+
+## Module contents
+
+```{eval-rst}
+.. automodule:: ppanggolin.RGP
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
diff --git a/docs/api/ppanggolin.align.md b/docs/api/ppanggolin.align.md
@@ -0,0 +1,21 @@
+# ppanggolin.align package
+
+## Submodules
+
+## ppanggolin.align.alignOnPang module
+
+```{eval-rst}
+.. automodule:: ppanggolin.align.alignOnPang
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
+
+## Module contents
+
+```{eval-rst}
+.. automodule:: ppanggolin.align
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```