Skip to content

Commit

Permalink
Merge branch 'UpdateEnv' into AddPyrodigal
Browse files Browse the repository at this point in the history
  • Loading branch information
jpjarnoux committed Oct 4, 2023
2 parents 9d252ae + 4592c0a commit 39e383c
Show file tree
Hide file tree
Showing 108 changed files with 2,444 additions and 1,364 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -32,8 +32,7 @@ jobs:
shell: bash -l {0}
run: |
conda install -y --file requirements.txt
conda install -y pytest
pip install .
pip install .[test]
# Check that it is installed and displays help without error
- name: Check that PPanGGOLiN is installed
shell: bash -l {0}
Expand Down
35 changes: 35 additions & 0 deletions .readthedocs.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2
python:
install:
- requirements: docs/requirements.txt

# Set the OS, Python version and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.8"
# You can also specify other tool versions:
# nodejs: "19"
# rust: "1.64"
# golang: "1.19"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
configuration: docs/conf.py

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
# - pdf
# - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
# python:
# install:
# - requirements: docs/requirements.txt
146 changes: 146 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
# PPanGGOLiN: Depicting microbial species diversity via a Partitioned PanGenome Graph Of Linked Neighbors

[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/pyrodigal/test.yml?branch=main&logo=github&style=flat-square&maxAge=300)](https://github.com/labgem/ppanggolin/actions)
[![License](https://anaconda.org/bioconda/ppanggolin/badges/license.svg)](http://www.cecill.info/licences.fr.html)
[![Bioconda](https://img.shields.io/conda/vn/bioconda/ppanggolin?style=flat-square&maxAge=3600&logo=anaconda)](https://anaconda.org/bioconda/ppanggolin)
[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square)](https://github.com/labgem/ppanggolin/)
[![GitHub issues](https://img.shields.io/github/issues/labgem/ppanggolin.svg?style=flat-square&maxAge=600)](https://github.com/labgem/ppanggolin/issues)
[![Docs](https://img.shields.io/readthedocs/ppanggolin/latest?style=flat-square&maxAge=600)](https://ppanggolin.readthedocs.io)
[![Downloads](https://anaconda.org/bioconda/ppanggolin/badges/downloads.svg)](https://bioconda.github.io/recipes/ppanggolin/README.html#download-stats)

**PPanGGOLiN**
([Gautreau et al. 2020](https://doi.org/10.1371/journal.pcbi.1007732)) is a software suite used to create and manipulate prokaryotic pangenomes from a set of either genomic DNA sequences or provided genome annotations.
It is designed to scale up to tens of thousands of genomes.
It has the specificity to partition the pangenome using a statistical approach rather than using fixed thresholds which gives it the ability to work with low-quality data such as *Metagenomic Assembled Genomes (MAGs)* or *Single-cell Amplified Genomes (SAGs)* thus taking advantage of large scale environmental studies and letting users study the pangenome of uncultivable species.

**PPanGGOLiN** builds pangenomes through a graphical model and a statistical method to partition gene families in persistent, shell and cloud genomes.
It integrates both information on protein-coding genes and their genomic neighborhood to build a graph of gene families where each node is a gene family, and each edge is a relation of genetic contiguity.
The partitioning method promotes that two gene families that are consistent neighbors in the graph are more likely to belong to the same partition.
It results in a Partitioned Pangenome Graph (PPG) made of persistent, shell and cloud nodes drawing genomes on rails like a subway map to help biologists navigate the great diversity of microbial life.


Moreover, the panRGP method ([Bazin et al. 2020](https://doi.org/10.1093/bioinformatics/btaa792)) included in **PPanGGOLiN** predicts, for each genome, Regions of Genome Plasticity (RGPs) that are clusters of genes made of shell and cloud genomes in the pangenome graph.
Most of them arise from Horizontal gene transfer (HGT) and correspond to Genomic Islands (GIs).
RGPs from different genomes are next grouped in spots of insertion based on their conserved flanking persistent genes.


Those RGPs can be further divided in conserved modules by panModule ([Bazin et al. 2021](https://doi.org/10.1101/2021.12.06.471380)). Those conserved modules correspond to groups of cooccurring and colocalized genes that are gained or lost together in the variable regions of the pangenome.

```{image} _static/logo.png
:alt: ppangolin logo
:align: center
:heigth: 300
:width: 300
```

# Installation

**PPanGGOLiN** is easily installed via conda.
You will need the following conda channels if you don't have them already:

```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```

Then, you can just run :

```bash
conda install -c bioconda ppanggolin
```

# Quick usage

**PPanGGOLiN** integrates some workflows to build and analyse easily and rapidly a pangenome.
These commands can be tuned with some parameters but are mostly automatic.
All workflow parameters are described [here](https://ppanggolin.readthedocs.io/en/updateenv/user/Basic-usage-and-practical-information.html#the-workflow-subcommand).

## Pangenome graph construction and partition

To build and partition a pangenome, you can use the following command:
```bash
ppanggolin workflow --fasta ORGANISMS_FASTA_LIST
```

It uses parameters that we found to be generally the best when working with species pangenomes.

The file ORGANISMS_FASTA_LIST is a tsv-separated file with the following organization :
1. The first column contains a unique organism name **(without space)**
2. The second column the path to the associated FASTA file
3. Circular contig identifiers are indicated in the following columns
4. Each line represents an organism

An [example](https://github.com/labgem/PPanGGOLiN/blob/master/testingDataset/organisms.fasta.list) with 50 *Chlamydia trachomatis* genomes can be found in the testingDataset/ directory.


You can also give **PPanGGOLiN** your own annotations using *.gff* or *.gbff/.gbk* files instead of *.fasta* files,
such as the ones provided by prokka using the following command :

```bash
ppanggolin workflow --anno ORGANISMS_ANNOTATION_LIST
```

Another [example](https://github.com/labgem/PPanGGOLiN/blob/master/testingDataset/organisms.gbff.list) of such a file can be found in the testingDataset/ directory.

Both of those commands write several output files and graphics (more information [here](https://ppanggolin.readthedocs.io/en/updateenv/user/Outputs.html#ppanggolin-outputs)). Most notably, an HDF-5 (pangenome.h5) file is written.
It can be used as input for any of the subcommands to rerun parts of the analysis with different parameters,
write and draw different representations of the pangenome or run additional analysis with **PPanGGOLiN**.

A minimum of 5 genomes is generally required to perform a pangenomics analysis using the traditional *core genome*/*accessory genome* paradigm.
It is advised to use at least 15 genomes having genomic variations (and not only SNPs) to obtain robust results with the **PPanGGOLiN** statistical approach.

If you want to use personalized parameters for each subcommand, most options should be self-descriptive.
If you want to know more about what each output file is, or briefly how each subcommand works,
you can check the [steb by step documentation](https://github.com/labgem/PPanGGOLiN/wiki)


## Region of plasticity detection

Furthermore, you can also predict genomic islands and cluster them into spots of insertion using the **panRGP** pipeline.
The usage is identical to the previous 'workflow' command:

```bash
ppanggolin panrgp --fasta ORGANISMS_FASTA_LIST
```

It will run more analyses after the pangenome has been partitioned. Further details are available [here](https://ppanggolin.readthedocs.io/en/updateenv/user/Basic-usage-and-practical-information.html#the-panrgp-subcommand) and in the [panRPG publication](https://doi.org/10.1093/bioinformatics/btaa792)

## Conserved module prediction
To detect the conserved modules in your pangenome, you can use the panModule workflow, as such:

```bash
ppanggolin panmodule --fasta ORGANISMS_FASTA_LIST
```

Further details can be found [here](https://ppanggolin.readthedocs.io/en/updateenv/user/Basic-usage-and-practical-information.html#the-panmodule-subcommand) and in the [panModule publication](https://doi.org/10.1101/2021.12.06.471380)


Alternatively, to run all the possible analysis that **PPanGGOLiN** can run, you can use:

```bash
ppanggolin all --fasta ORGANISMS_FASTA_LIST
```

Overall, ppanggolin has a lot of subcommands and possibilities.
Don't hesitate to check the command line help, and the [GitHub wiki](https://github.com/labgem/PPanGGOLiN/wiki) to see all the possible analysis, if you are missing a file you're looking for, or do not understand an output.
You can also raise an `issue` if you wish!

# Issues, Questions, Remarks
If you have any question or issue with installing,
using or understanding **PPanGGOLiN**, please do not hesitate to post an issue!
We cannot correct bugs if we do not know about them, and will try to help you the best we can.

# Citation
If you use this tool for your research, please cite:

Gautreau G et al. (2020) **PPanGGOLiN**: Depicting microbial diversity via a partitioned pangenome graph.
PLOS Computational Biology 16(3): e1007732. <https://doi.org/10.1371/journal.pcbi.1007732>

If you use this tool to study genomic islands, please cite:

Bazin et al., panRGP: a pangenome-based method to predict genomic islands and explore their diversity, Bioinformatics, Volume 36, Issue Supplement_2, December 2020, Pages i651–i658, <https://doi.org/10.1093/bioinformatics/btaa792>

If you use this tool to study modules, please cite:

Bazin et al., panModule: detecting conserved modules in the variable regions of a pangenome graph. biorxiv. <https://doi.org/10.1101/2021.12.06.471380>
126 changes: 0 additions & 126 deletions README.rst

This file was deleted.

File renamed without changes.
Binary file added docs/_static/drawspot_example.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/evolution.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/gephi.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/logo.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/projection.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/resampling.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/runtimes.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/tile_plot.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/u_plot.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/workflow.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
39 changes: 39 additions & 0 deletions docs/api/ppanggolin.RGP.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# ppanggolin.RGP package

## Submodules

## ppanggolin.RGP.genomicIsland module

```{eval-rst}
.. automodule:: ppanggolin.RGP.genomicIsland
:members:
:undoc-members:
:show-inheritance:
```

## ppanggolin.RGP.rgp_cluster module

```{eval-rst}
.. automodule:: ppanggolin.RGP.rgp_cluster
:members:
:undoc-members:
:show-inheritance:
```

## ppanggolin.RGP.spot module

```{eval-rst}
.. automodule:: ppanggolin.RGP.spot
:members:
:undoc-members:
:show-inheritance:
```

## Module contents

```{eval-rst}
.. automodule:: ppanggolin.RGP
:members:
:undoc-members:
:show-inheritance:
```
21 changes: 21 additions & 0 deletions docs/api/ppanggolin.align.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# ppanggolin.align package

## Submodules

## ppanggolin.align.alignOnPang module

```{eval-rst}
.. automodule:: ppanggolin.align.alignOnPang
:members:
:undoc-members:
:show-inheritance:
```

## Module contents

```{eval-rst}
.. automodule:: ppanggolin.align
:members:
:undoc-members:
:show-inheritance:
```
Loading

0 comments on commit 39e383c

Please sign in to comment.