Add first tutorial pages

labgem · Nov 14, 2023 · 8349dd8
1 parent 4d00b96
commit 8349dd8

Showing 3 changed files with 58 additions and 1 deletion.
diff --git a/docs/index.md b/docs/index.md
@@ -45,6 +45,15 @@ RGPs from different genomes are next grouped in spots of insertion based on thei
 Those RGPs can be further divided in conserved modules by panModule ([Bazin et al. 2021](https://doi.org/10.1101/2021.12.06.471380)). Those conserved modules correspond to groups of cooccurring and colocalized genes that are gained or lost together in the variable regions of the pangenome.
 
 
+
+```{toctree}
+:caption: 'Tutorial:'
+:maxdepth: 1
+
+tutorial/inputData
+tutorial/workflows
+```
+
 ```{toctree}
 :caption: 'User Guide:'
 :maxdepth: 1
@@ -66,7 +75,7 @@ user/issues.md
 
 ```{toctree}
 :caption: 'Developper Guide:'
-:maxdepth: 2
+:maxdepth: 1
 
 dev/devRules
 dev/git

diff --git a/.../Basic-usage-and-practical-information.md → .../Basic-usage-and-practical-information.md b/.../Basic-usage-and-practical-information.md → .../Basic-usage-and-practical-information.md
diff --git a/docs/tutorial/inputData.md b/docs/tutorial/inputData.md
@@ -0,0 +1,48 @@
+# How to prepare your data for PPanGGOLiN
+
+To build and partition a pangenome, PPanGGOLiN needs a set of either DNA sequences or genome annotations. To help you get started with PPanGGOLiN, you can follow the steps below to download some genomes of _Bradyrhizobium japonicum_. These genomes will serve as our baseline throughout the tutorial. If you already have your own genomes, you can go directly to [the input file creation](#create-your-list-of-genomes-file).
+
+## Get _B. japonicum_ genomic data
+
+```{tip}
+To download our genomes, we are going to use [genome_updater](https://github.com/pirovc/genome_updater).
+Other solutions exist, such as the [NCBI genome downloading scripts](https://github.com/kblin/ncbi-genome-download). Feel free to use whichever is best and easiest for you.
+```
+
+### GTDB genomes
+
+To obtain the genomes of _B. japonicum_ from the [GTDB database](https://gtdb.ecogenomic.org/), you must use the species name as it appears in GTDB.
+
+```
+genome_updater.sh -d "refseq,genbank" -f "genomic.gbff.gz" -o "B_japonicum_genomes" -M "gtdb" -T "s__Bradyrhizobium japonicum"
+```
+
+### NCBI/GenBank genomes
+
+To obtain the genomes of _B. japonicum_ from the [NCBI](https://www.ncbi.nlm.nih.gov/), you must use its NCBI taxonomic ID.
+
+```
+genome_updater.sh -d "refseq,genbank" -f "genomic.gbff.gz" -o "B_japonicum_genomes" -M "ncbi" -T "375"
+```
+
+## Create your list of genomes file
+
+PPanGGOLiN uses a list of genomes as input for some commands, such as the workflow.
+This list is a tab-separated (TSV) file organised as follows:
+
+1. The first column contains a unique organism name
+2. The second column contains the path to the associated annotation file
+3. Each line represents an organism
+
+```{note}
+It is also possible to use FASTA files as input.
+See the documentation for details.
+```
+
+If you are using annotated genomes (*GBFF*, *GFF*, *GBK*), you can generate this file with the following command:
+
+```
+for file in B_japonicum_genomes/*/files/*.gz; do genome=$(basename "$file" | cut -d'_' -f1-3); printf '%s\t%s\n' "$genome" "$file"; done > organism_gbff.list
+```
+
+**You're now ready to build the pangenome!**
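
Before running a workflow, it can help to check that the generated list matches the layout described in `inputData.md`: two tab-separated columns, a unique name in the first, and an existing file path in the second. The function below is a minimal sketch of such a check. `check_genome_list` is a hypothetical helper written for this illustration; it is not part of PPanGGOLiN or genome_updater, and it only assumes a POSIX shell and `awk`.

```shell
# Hypothetical helper, not part of PPanGGOLiN: sanity-check a genome list file.
# The body runs in a subshell "( ... )", so "exit" only ends the function.
check_genome_list() (
    list=$1
    tab=$(printf '\t')

    # Column check: exactly two non-empty tab-separated fields per line,
    # and no genome name used twice in column 1.
    awk -F'\t' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected 2 tab-separated fields\n", NR; bad = 1
        }
        seen[$1]++ {
            printf "line %d: duplicate genome name %s\n", NR, $1; bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$list" || exit 1

    # Path check: every file listed in column 2 must exist.
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS=$tab read -r name path || [ -n "$name" ]; do
        [ -f "$path" ] || { echo "missing file: $path"; exit 1; }
    done < "$list"
)
```

With the file produced above, the check could be run as `check_genome_list organism_gbff.list && echo "list looks fine"`. It stops at the first missing file but reports every malformed or duplicated line, since those are the errors most likely to come from a naming pattern that does not fit all downloaded files.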