Add first tutorial pages
jpjarnoux committed Nov 14, 2023
1 parent 4d00b96 commit 8349dd8
Showing 3 changed files with 58 additions and 1 deletion.
11 changes: 10 additions & 1 deletion docs/index.md
@@ -45,6 +45,15 @@ RGPs from different genomes are next grouped in spots of insertion based on thei
Those RGPs can be further divided in conserved modules by panModule ([Bazin et al. 2021](https://doi.org/10.1101/2021.12.06.471380)). Those conserved modules correspond to groups of cooccurring and colocalized genes that are gained or lost together in the variable regions of the pangenome.



```{toctree}
:caption: 'Tutorial:'
:maxdepth: 1
tutorial/inputData
tutorial/workflows
```

```{toctree}
:caption: 'User Guide:'
:maxdepth: 1
@@ -66,7 +75,7 @@ user/issues.md

```{toctree}
:caption: 'Developer Guide:'
:maxdepth: 2
:maxdepth: 1
dev/devRules
dev/git
48 changes: 48 additions & 0 deletions docs/tutorial/inputData.md
@@ -0,0 +1,48 @@
# How to prepare your data for PPanGGOLiN

To build and partition a pangenome, PPanGGOLiN needs a set of either DNA sequences or genome annotations. To help you get started with PPanGGOLiN, you can follow the steps below to download some genomes of _Bradyrhizobium japonicum_. These genomes will serve as our baseline throughout the tutorial. If you already have your own genomes, you can go directly to [the input file creation](#create-your-list-of-genomes-file).

## Get _B. japonicum_ genomic data

```{tip}
To download our genomes, we are going to use [genome_updater](https://github.com/pirovc/genome_updater).
Other solutions exist, such as the [ncbi-genome-download scripts](https://github.com/kblin/ncbi-genome-download). Feel free to use whichever is best and easiest for you.
```

### GTDB genomes

To obtain the genomes of _B. japonicum_ from the [GTDB database](https://gtdb.ecogenomic.org/), you must use the name of the species as it appears in GTDB.

```
genome_updater.sh -d "refseq,genbank" -f "genomic.gbff.gz" -o "B_japonicum_genomes" -M "gtdb" -T "s__Bradyrhizobium japonicum"
```

### NCBI/GenBank genomes

To obtain the genomes of _B. japonicum_ from [NCBI](https://www.ncbi.nlm.nih.gov/), you must use its taxonomic ID.

```
genome_updater.sh -d "refseq,genbank" -f "genomic.gbff.gz" -o "B_japonicum_genomes" -M "ncbi" -T "375"
```

## Create your list of genomes file

PPanGGOLiN uses the list of genomes as input for some commands, such as the workflow.
The file is a tab-separated (TSV) file organised as follows:

1. The first column contains a unique organism name
2. The second column contains the path to the associated annotation file
3. Each line represents an organism
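
The layout described above can be checked before launching PPanGGOLiN. The following is only a minimal sketch and is not part of PPanGGOLiN: `check_genome_list` is a hypothetical helper name, and it only verifies that every line has exactly two tab-separated columns and that no organism name is repeated. It does not check that the listed files exist.

```shell
# Hypothetical helper (an assumption, not a PPanGGOLiN command):
# report lines that break the two-column, unique-name layout.
check_genome_list() {
    awk -F'\t' '
        # Each line must hold exactly: organism name <TAB> file path
        NF != 2    { printf "line %d: expected 2 tab-separated columns, found %d\n", NR, NF; bad = 1 }
        # The organism name in column 1 must appear only once
        seen[$1]++ { printf "line %d: duplicate organism name %s\n", NR, $1; bad = 1 }
        # Exit status 0 when the file is well formed, 1 otherwise
        END        { exit bad + 0 }
    ' "$1"
}
```

Once a list file exists, calling the function on it (for example `check_genome_list organism_gbff.list`) prints one message per offending line and returns a non-zero status if any problem was found.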

```{note}
It is also possible to use FASTA files as input.
See the documentation for details.
```

If you are using annotated genomes (*GBFF*, *GFF*, *GBK*), you can generate your file with the following command:

```
# Column 1: first three '_'-separated fields of the file name; column 2: file path
for file in B_japonicum_genomes/*/files/*.gz; do
    genome=$(basename "$file" | cut -d'_' -f1-3)
    printf '%s\t%s\n' "$genome" "$file"
done > organism_gbff.list
```

**You're now ready to build the pangenome!**
