add specific doc for gff, tables and proksee output

labgem · Nov 9, 2023 · de63e72 · de63e72
1 parent d365c2a
commit de63e72

Showing 3 changed files with 100 additions and 0 deletions.
diff --git a/docs/user/Flat/gff.md b/docs/user/Flat/gff.md
@@ -0,0 +1,48 @@
+
+The `--gff` argument generates one GFF file per genome of the pangenome, each containing the pangenome annotations of that genome. GFF is a widely used standard format in bioinformatics, so these files can be loaded into many downstream analysis tools.
+
+To generate GFF files from a pangenome HDF5 file, you can use the following command:
+
+```bash
+ppanggolin write_genomes -p pangenome.h5 --gff -o output
+```
+
+This command creates a `gff` directory within the output directory, containing one GFF file per genome.
+
+Pangenome annotations are recorded in the attribute column (the ninth column) of the GFF file.
+
+For CDS features, these annotations describe the gene family of the gene and the pangenome elements to which it belongs.
+
+CDS features have the following attributes:
+
+- **family:** ID of the gene family to which the gene belongs.
+- **partition:** The partition of the gene family, categorized as persistent, shell, or cloud.
+- **module:** If the gene family belongs to a module, the module ID is specified with the key 'module'.
+- **rgp:** If the gene is part of a Region of Genomic Plasticity (RGP), the RGP name is specified with the key 'rgp'.
+
+Regions of Genomic Plasticity (RGPs) are recorded as features of type 'region'.
+
+RGPs have the following attributes:
+
+- **spot:** ID of the spot where the RGP is inserted. When the RGP has no spot, the value 'No_spot' is used.
+- **Note:** Specifies that the feature is an RGP.
+
+
+Here is an example showing the first lines of the GFF file for the Acinetobacter baumannii AYE genome:
+
+```gff
+##gff-version 3
+##sequence-region NC_010401.1 1 5644
+##sequence-region NC_010402.1 1 9661
+##sequence-region NC_010403.1 1 2726
+##sequence-region NC_010404.1 1 94413
+##sequence-region NC_010410.1 1 3936291
+NC_010401.1	.	region	1	5644	.	+	.	ID=NC_010401.1;Is_circular=true
+NC_010401.1	ppanggolin	region	629	5591	.	.	.	Name=NC_010401.1_RGP_0;spot=No_spot;Note=Region of Genomic Plasticity (RGP)
+NC_010401.1	external	gene	629	1579	.	+	.	ID=gene-ABAYE_RS00005
+NC_010401.1	external	CDS	629	1579	.	+	0	ID=ABAYE_RS00005;Parent=gene-ABAYE_RS00005;product=replication initiation protein;family=ABAYE_RS00005;partition=cloud;rgp=NC_010401.1_RGP_0
+NC_010401.1	external	gene	1576	1863	.	+	.	ID=gene-ABAYE_RS00010
+NC_010401.1	external	CDS	1576	1863	.	+	0	ID=ABAYE_RS00010;Parent=gene-ABAYE_RS00010;product=hypothetical protein;family=ABAYE_RS00010;partition=cloud;rgp=NC_010401.1_RGP_0
+NC_010401.1	external	gene	2054	2572	.	-	.	ID=gene-ABAYE_RS00015
+NC_010401.1	external	CDS	2054	2572	.	-	0	ID=ABAYE_RS00015;Parent=gene-ABAYE_RS00015;product=tetratricopeptide repeat protein;family=HTZ92_RS18670;partition=shell;rgp=NC_010401.1_RGP_0
+```
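As a minimal sketch of how these attributes can be read back, the Python snippet below splits the attribute column of each CDS line into key/value pairs and extracts the `family`, `partition` and `rgp` values. The function names `parse_gff_attributes` and `cds_annotations` are hypothetical helpers written for this illustration, not part of PPanGGOLiN, and the two sample lines are copied from the example above.

```python
from urllib.parse import unquote


def parse_gff_attributes(column9):
    """Split a GFF3 attribute column ('key=value' pairs separated by ';') into a dict."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters inside attribute values.
        attributes[key] = unquote(value)
    return attributes


def cds_annotations(gff_lines):
    """Yield (gene ID, family, partition, rgp) for every CDS line; rgp is None outside RGPs."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        attrs = parse_gff_attributes(fields[8])
        yield attrs.get("ID"), attrs.get("family"), attrs.get("partition"), attrs.get("rgp")


# Two lines taken from the example GFF shown above.
example = [
    "NC_010401.1\tppanggolin\tregion\t629\t5591\t.\t.\t.\t"
    "Name=NC_010401.1_RGP_0;spot=No_spot;Note=Region of Genomic Plasticity (RGP)\n",
    "NC_010401.1\texternal\tCDS\t2054\t2572\t.\t-\t0\t"
    "ID=ABAYE_RS00015;Parent=gene-ABAYE_RS00015;product=tetratricopeptide repeat protein;"
    "family=HTZ92_RS18670;partition=shell;rgp=NC_010401.1_RGP_0\n",
]
rows = list(cds_annotations(example))
print(rows)
```

The same generator can be fed an open file handle instead of a list, since it only iterates over lines.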
diff --git a/docs/user/Flat/proksee.md b/docs/user/Flat/proksee.md
@@ -0,0 +1,31 @@
+The `--proksee` argument generates JSON map files containing pangenome annotations, which can be visualized using Proksee at [https://proksee.ca/](https://proksee.ca/).
+
+To generate JSON map files, you can use the following command:
+
+```bash
+ppanggolin write_genomes -p pangenome.h5 --proksee -o output
+```
+
+This command creates a `proksee` directory within the output directory, containing one JSON file per genome.
+
+
+To load a JSON map file on Proksee, follow these steps:
+1. Navigate to the "Map JSON" tab.
+2. Upload your file using the browse button.
+3. Click the "Create Map" button to generate the visualization.
+
+A genome visualized by Proksee with PPanGGOLiN annotation appears as depicted below:
+
+
+```{image} ../_static/proksee_exemple_A_baumannii_AYE.png
+:align: center
+```
+
+*Image: Genome visualized by Proksee with PPanGGOLiN annotation.*
+
+
+The visualization consists of three tracks:
+- **Genes:** Color-coded by their gene family partition.
+- **RGP (Region of Genomic Plasticity):** The spot associated with each RGP is specified in the annotation of the object.
+- **Module:** Modules present in the genome. The completion of each module is specified in the annotation of the object.
+
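Before uploading, it can be useful to check that each generated file parses as JSON. The sketch below is only a sanity check under stated assumptions: it assumes the map files carry a `.json` extension, it makes no assumption about the schema Proksee expects, and `summarize_json_maps` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_json_maps(directory):
    """Return {file name: sorted top-level keys} for every .json file in a directory.

    json.load raises json.JSONDecodeError if a file is not valid JSON.
    """
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Only JSON objects have keys; other top-level types get an empty list.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary
```

With the command shown earlier, the directory to pass would be `output/proksee`, for example `summarize_json_maps("output/proksee")`.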
diff --git a/docs/user/Flat/tables.md b/docs/user/Flat/tables.md
@@ -0,0 +1,21 @@
+The `--tables` argument writes one TSV file per genome of the pangenome into a 'tables' directory.
+The columns of these files are described in the following table:
+
+| Column               | Description                                                                                                     |
+|----------------------|-----------------------------------------------------------------------------------------------------------------|
+| gene                 | Unique identifier of the gene                                                                                   |
+| contig               | Contig on which the gene is located                                                                             |
+| start                | Start position of the gene                                                                                      |
+| stop                 | Stop position of the gene                                                                                       |
+| strand               | Strand on which the gene is located                                                                             |
+| ori                  | T if the gene name is dnaA                                                                                      |
+| family               | Identifier of the gene family to which the gene belongs                                                         |
+| nb_copy_in_org       | Number of copies of the family in the organism (if 1, the gene has no closely related paralog in that organism) |
+| partition            | Partition to which the gene family of the gene belongs                                                          |
+| persistent_neighbors | Number of neighbors classified as 'persistent' in the pangenome graph                                           |
+| shell_neighbors      | Number of neighbors classified as 'shell' in the pangenome graph                                                |
+| cloud_neighbors      | Number of neighbors classified as 'cloud' in the pangenome graph                                                |
+
+These files can be generated with the following command:
+
+`ppanggolin write_genomes -p pangenome.h5 --tables`
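As a rough illustration of how such a table might be consumed, the Python sketch below counts genes per partition and lists the genes whose family is present in several copies. It assumes the first line of the file is a header row using exactly the column names listed in the table; the sample rows are invented for the example, and `summarize_table` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def summarize_table(tsv_handle):
    """Return (gene count per partition, genes whose family has several copies)."""
    partitions = Counter()
    multicopy_genes = []
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        partitions[row["partition"]] += 1
        if int(row["nb_copy_in_org"]) > 1:
            multicopy_genes.append(row["gene"])
    return partitions, multicopy_genes


# Made-up rows restricted to a few of the documented columns, for illustration only.
example = io.StringIO(
    "gene\tfamily\tnb_copy_in_org\tpartition\n"
    "gene_1\tfam_A\t1\tpersistent\n"
    "gene_2\tfam_B\t2\tshell\n"
    "gene_3\tfam_B\t2\tshell\n"
    "gene_4\tfam_C\t1\tcloud\n"
)
partitions, multicopy_genes = summarize_table(example)
print(dict(partitions))
print(multicopy_genes)
```

For a real file, the same function can be given a handle opened with `open(path, newline="")`, as the `csv` module recommends.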