add a first documentation for the write_pangenome and write_genome cmds
JeanMainguy committed Nov 9, 2023
1 parent 7020706 commit d365c2a
Showing 11 changed files with 45 additions and 50 deletions.
Binary file added docs/_static/proksee_exemple_A_baumannii_AYE.png
8 changes: 4 additions & 4 deletions docs/user/Flat/RGP.md
Expand Up @@ -2,7 +2,7 @@
This file is a tsv file that lists all of the detected Regions of Genome Plasticity. It requires that the RGP detection analysis has been run, using either the `panrgp` command or the `rgp` command.

It can be written with the following command:
`ppanggolin write -p pangenome.h5 --regions`
`ppanggolin write_pangenome -p pangenome.h5 --regions`

The file has the following format :

Expand All @@ -21,7 +21,7 @@ The file has the following format :
This is a tsv file with two columns. It links the spots of 'summarize_spots' with the RGPs of 'plastic_regions'.

It is written with the following command:
`ppanggolin write -p pangenome.h5 --spots`
`ppanggolin write_pangenome -p pangenome.h5 --spots`

|column|description|
|------|------------|
Expand All @@ -33,7 +33,7 @@ It is written with the following command:
This is a tsv file that associates each spot with multiple metrics that can indicate the dynamics of the spot.

It is written with the following command:
`ppanggolin write -p pangenome.h5 --spots`
`ppanggolin write_pangenome -p pangenome.h5 --spots`

|column| description|
|-------|------------|
Expand All @@ -49,7 +49,7 @@ It is written with the following command:
#### Borders

Each spot has at least one set of gene families bordering it. To write the list of gene families bordering a spot, you need to use the following option:
`ppanggolin write -p pangenome.h5 --borders`
`ppanggolin write_pangenome -p pangenome.h5 --borders`

It will write a .tsv file with 4 columns:

2 changes: 1 addition & 1 deletion docs/user/Flat/dupplication.md
Expand Up @@ -3,6 +3,6 @@ This file lists the gene families, their duplication ratio, their mean presence

It can be generated using the 'write' subcommand as such :

`ppanggolin write -p pangenome.h5 --stats`
`ppanggolin write_pangenome -p pangenome.h5 --stats`

This command will also generate the 'organisms_statistics.tsv' file.
2 changes: 1 addition & 1 deletion docs/user/Flat/fam2gen.md
Expand Up @@ -4,4 +4,4 @@ It is basically a three-column file listing the gene family name in the first co

You can obtain it as such :

`ppanggolin write -p pangenome.h5 --families_tsv`
`ppanggolin write_pangenome -p pangenome.h5 --families_tsv`
6 changes: 4 additions & 2 deletions docs/user/Flat/metrics.md
Expand Up @@ -23,9 +23,11 @@ It could be necessary to get more information about the modules.
Here we provide information about families, and we separate modules
according to their partition. You can get this supplementary information
as follows:
```
```bash
ppanggolin metrics -p pangenome.h5 --info_modules
...
```

```
Modules : 3
Families in Modules : 22 (min : 5, max : 9, sd : 2.08, mean : 7.33)
Sheel specific : 36.36 (sd : 4.62, mean : 2.67)
10 changes: 5 additions & 5 deletions docs/user/Flat/module.md
@@ -1,7 +1,7 @@
#### Functional modules
This .tsv file lists the modules and the gene families that belong to them. It lists one family per line, and there are multiple lines for each module.
It is written along with other files with the following command:
`ppanggolin write -p pangenome.h5 --modules`
`ppanggolin write_pangenome -p pangenome.h5 --modules`

It follows the following format:
|column|description|
Expand All @@ -12,7 +12,7 @@ It follows the following format:
#### Modules in organisms
This .tsv file lists for each organism the modules that are present and how complete they are. Since some variability is allowed in the module predictions, occasionally some modules can be incomplete in some of the organisms where they are found.
This file is written along with other files with the following command:
`ppanggolin write -p pangenome.h5 --modules`
`ppanggolin write_pangenome -p pangenome.h5 --modules`

And it follows the following format:
|column|description|
Expand All @@ -24,7 +24,7 @@ And it follows the following format:
#### modules summary
This .tsv file lists a few characteristics for each detected module. There is one line for each module.
The file is written along with other files with the following command:
`ppanggolin write -p pangenome.h5 --modules`
`ppanggolin write_pangenome -p pangenome.h5 --modules`

And it follows the following format:
|column|description|
Expand All @@ -39,7 +39,7 @@ And it follows the following format:
This command is available only if both modules and spots have been computed for your pangenome (see the command `all`, or the commands `spot` and `module` for that).
It indicates which modules are present in which spot and in which RGP.
The files are written with the following command:
```ppanggolin write -p pangenome.h5 --spot_modules```
```ppanggolin write_pangenome -p pangenome.h5 --spot_modules```
The format of the 'modules_spots.tsv' file is the following:

|column|description|
Expand All @@ -56,4 +56,4 @@ The file 'modules_RGP_lists.tsv' lists RGPs that have the same modules. Those RG
|mod_list| a list of the modules that are in the indicated RGPs|
|RGP_list| a list of RGP that include exactly the modules listed previously|

This information can also be visualized through figures that can be drawn with `ppanggolin draw --spots` (see [Spot plots](https://github.com/labgem/PPanGGOLiN/wiki/Outputs#spot-plots), and which can display modules.
This information can also be visualized through figures that can be drawn with `ppanggolin draw --spots` (see [Spot plots](https://github.com/labgem/PPanGGOLiN/wiki/Outputs#spot-plots)), and which can display modules.
2 changes: 1 addition & 1 deletion docs/user/Flat/orgStat.md
Expand Up @@ -24,6 +24,6 @@ This file is made of 15 columns described in the following table

It can be generated using the 'write' subcommand as such :

`ppanggolin write -p pangenome.h5 --stats`
`ppanggolin write_pangenome -p pangenome.h5 --stats`

This command will also generate the 'mean_persistent_duplication.tsv' file.
2 changes: 1 addition & 1 deletion docs/user/Flat/partition.md
Expand Up @@ -2,4 +2,4 @@ Those files will be stored in the 'partitions' directory and will be named after

You can generate those files as such :

` ppanggolin write -p pangenome.h5 --partitions`
` ppanggolin write_pangenome -p pangenome.h5 --partitions`
4 changes: 2 additions & 2 deletions docs/user/Flat/presAbs.md
Expand Up @@ -5,12 +5,12 @@ This file is basically a presence absence matrix. The columns are the genomes us

It can be generated using the 'write' subcommand as such :

`ppanggolin write -p pangenome.h5 --Rtab`
`ppanggolin write_pangenome -p pangenome.h5 --Rtab`
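
As a rough illustration of how such a presence/absence matrix might be inspected, the sketch below counts, for each gene family, the number of genomes in which it is present. It assumes a tab-separated layout with one header row, the family name in the first column and 0/1 values in the remaining columns. The toy file and the `count_presence` helper are made up for this example and are not part of PPanGGOLiN.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Toy matrix standing in for the real .Rtab output (assumed layout:
# header row, family name in column 1, then one 0/1 column per genome).
rtab="$(mktemp)"
printf 'Gene\tgenomeA\tgenomeB\tgenomeC\n' >  "$rtab"
printf 'fam1\t1\t1\t1\n'                   >> "$rtab"
printf 'fam2\t1\t0\t0\n'                   >> "$rtab"

# Hypothetical helper: print "<family><TAB><number of genomes carrying it>".
count_presence() {
    awk -F '\t' 'NR > 1 { n = 0; for (i = 2; i <= NF; i++) n += $i; print $1 "\t" n }' "$1"
}

count_presence "$rtab"
```

For the toy matrix this reports 3 genomes for `fam1` and 1 genome for `fam2`. To try it on real output, point `count_presence` at the written .Rtab file after checking that its layout matches the assumptions above.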

### matrix

This file is a .csv file in a format similar to the gene_presence_absence.csv file generated by [roary](https://sanger-pathogens.github.io/Roary/), and works with [scoary](https://github.com/AdmiralenOla/Scoary) if you want to do pangenome-wide association studies.

It can be generated using the 'write' subcommand as such :

`ppanggolin write -p pangenome.h5 --csv`
`ppanggolin write_pangenome -p pangenome.h5 --csv`
21 changes: 0 additions & 21 deletions docs/user/Flat/projection.md

This file was deleted.

38 changes: 26 additions & 12 deletions docs/user/Outputs.md
Expand Up @@ -5,9 +5,9 @@ PPanGGOLiN provides multiple outputs to describe a pangenome. In this section th

In most cases it will provide an HDF5 file named "pangenome.h5". This file stores all the information about your pangenome and the analyses that were run. If given to ppanggolin through most of the subcommands, it will read information from it. This is practical as you can regenerate figures or output files, or rerun parts of the analysis without redoing everything.

In this section, each parts will describe a possible output of PPanGGOLiN, and will be commented with the command line that generates it using the HDF5 file, which is assumed to be called 'pangenome.h5'.
In this section, each part will describe a possible output of PPanGGOLiN, and will be commented with the command line that generates it using the HDF5 file, which is assumed to be called 'pangenome.h5'.

When using the same subcommand (like 'write' or 'draw' that can help you generate multiple file each), you can provide multiple options to write all of the file formats that you desire at once.
When using the same subcommand (like 'write_pangenome' or 'draw' that can help you generate multiple file each), you can provide multiple options to write all of the file formats that you desire at once.
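
A minimal sketch of what combining options could look like, using only flags that appear in this documentation (`--regions`, `--spots`, `--borders`, `--modules`). The `build_write_cmd` helper is hypothetical and not part of PPanGGOLiN; it only prints the command so that it can be checked before being run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of PPanGGOLiN): assemble a single
# write_pangenome call from a pangenome file and a list of output options.
build_write_cmd() {
    local pangenome="$1"; shift
    local cmd=(ppanggolin write_pangenome -p "$pangenome")
    local flag
    for flag in "$@"; do
        cmd+=("--$flag")    # each name becomes one long option
    done
    printf '%s\n' "${cmd[*]}"
}

# Dry run: print the combined command instead of executing it.
build_write_cmd pangenome.h5 regions spots borders modules
```

Running the printed line against an existing 'pangenome.h5' should then write the requested files in one call, provided the corresponding analyses (RGPs, spots, modules) have already been computed.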

## PPanGGOLiN figures outputs

Expand All @@ -27,7 +27,10 @@ When using the same subcommand (like 'write' or 'draw' that can help you generat
```{include} Figures/rarefaction.md
```

## Write
## `write_pangenome`: Write flat output describing the pangenome

Writes 'flat' files that describe the pangenome and its elements.

### Organisms statistics
```{include} Flat/orgStat.md
```
Expand All @@ -39,7 +42,6 @@ The pangenome's graph can be given through multiple data formats, in order to ma
```{include} graphOut/GEXF.md
```


#### json
```{include} graphOut/JSON.md
```
Expand All @@ -51,14 +53,6 @@ The pangenome's graph can be given through multiple data formats, in order to ma
```{include} Flat/dupplication.md
```

### partitions
```{include} Flat/partition.md
```

### projection
```{include} Flat/projection.md
```

### Gene families and genes
```{include} Flat/fam2gen.md
```
Expand All @@ -71,6 +65,26 @@ The pangenome's graph can be given through multiple data formats, in order to ma
```{include} Flat/module.md
```

### partitions
```{include} Flat/partition.md
```

## `write_genomes`: Write genomes with pangenome annotations

Writes 'flat' files that represent the genomes along with their associated pangenome elements.



### tables
```{include} Flat/tables.md
```
### gff
```{include} Flat/gff.md
```
### proksee
```{include} Flat/proksee.md
```

## Fasta
```{include} sequence/fasta.md
```
