Annotate transcripts (#29)
* Add missing configuration parameters

* Edger also outputs design

* New filter for DE significant results

* Configuration is copied to the report directory after successful run

* Git versions of source codes are stored in the report directory

* Version of Snakelines is stored in the report directory

* Removed obsolete source code - git version no longer needed since we have snakelines version

* Update snakelines.snake

* Renamed path to match refactored report structures

* Store design even when batch is not defined

* Fixed minor bug - the first sample was not visualised in the PCA plot

* Refactoring, name FCexp-1 in output files changed to fold_change

* HTML table with summarized results of differential expression analysis

* Example files for annotation

* DE transcripts can be annotated by external attributes

* Example report has links to gene ontology terms

* Export GO terms of DE genes to Revigo format

* Documented Revigo, also it is stored as .txt file in the report directory for simpler copy-paste to the Revigo website
jbudis authored and wernerkrampl committed Jul 10, 2019
1 parent be4d21b commit 3422dbe
Showing 9 changed files with 324 additions and 14 deletions.
@@ -13,6 +13,7 @@ Take transcriptomic counts from several samples and merge them together into a s
- *table:* TSV table with statistical evaluation of change in expression
- *desc:* Description of reference sequences
- *template:* HTML template with basic report outline
- *annotations:* TSV files with attributes for annotated transcripts

**Output(s):**

16 changes: 16 additions & 0 deletions docs/rules/classification/report/transcripts/revigo/summary.rst
@@ -0,0 +1,16 @@
Custom - Export De Genes For Revigo
---------------------------------------

Export differentially expressed genes and their GO annotations in a format suitable for visualization
on the Revigo website (http://revigo.irb.hr/)

**Location**

- *Filepath:* <SnakeLines_dir>/rules/classification/report/transcripts/revigo/custom.snake
- *Rule name:* custom__export_de_genes_for_revigo

**Input(s):**

- *des:* TSV table with DE genes
- *gos:* Annotation of genes - TSV file with columns id and external_id

5 changes: 5 additions & 0 deletions docs/rules/outline.rst
@@ -203,6 +203,11 @@ Rules
.. toctree::
classification/report/transcripts/pca/summary.rst

- Revigo

.. toctree::
classification/report/transcripts/revigo/summary.rst

- Count Table

.. toctree::
26 changes: 17 additions & 9 deletions example/rnaseq/config_transcriptomics.yaml
@@ -42,17 +42,25 @@ classification: # Identify genomic source of sequenced reads
min_fold_change: 1.5 # Minimal value of fold change for transcript to be reported
reproducible_expression: True # At least one read must be mapped to transcript in all samples from over-expressed group to be reported

filter_significant: # Filter transcripts with significant change in expression
method: custom # Supported values: custom
max_fdr: 0.05 # Maximal value of false discovery rate for transcript to be reported
min_fold_change: 1.5 # Minimal value of fold change for transcript to be reported

report:
transcripts:
count_table: # Summary table with number of reads per transcript
method: custom # Supported values: custom
html_table: # Summary HTML table with results of differential expressions
method: custom # Supported values: custom

count_table: # Summary table with number of reads per transcript
method: custom # Supported values: custom

html_table: # Summary HTML table with results of differential expressions
method: custom # Supported values: custom
annotation: # Annotate with attributes from external sources
- source: pombase # Annotate with attributes from reference/{reference}/annotation.transcripts/pombase/attributes.tsv
attributes: # Annotate with listed attributes only
- link # Annotate with attribute link
- source: go # Annotate with attributes from reference/{reference}/annotation.transcripts/go/attributes.tsv
# Annotate with all attributes since explicit attributes are not defined

revigo: # GO annotation terms in format suitable for visualisation on the ReviGO website (http://revigo.irb.hr/)
method: custom # Supported values: custom


pca:
method: sklearn # Supported values: sklearn
formats: # Output format of the resulting images
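The `max_fdr` and `min_fold_change` thresholds in the `filter_significant` block of this hunk describe a simple row filter. A minimal sketch of such a filter follows, assuming a table with `FDR` and `fold_change` columns (names borrowed from the HTML table rule in this commit) and assuming `fold_change` is a magnitude with direction stored separately. The function name `filter_significant` and the toy data are hypothetical and are not taken from the SnakeLines source.

```python
import pandas as pd


def filter_significant(table: pd.DataFrame,
                       max_fdr: float = 0.05,
                       min_fold_change: float = 1.5) -> pd.DataFrame:
    """Keep transcripts with FDR <= max_fdr and fold_change >= min_fold_change.

    Assumption: fold_change is a magnitude (direction is stored elsewhere,
    e.g. in an up_down column), so a single lower bound is enough.
    """
    keep = (table['FDR'] <= max_fdr) & (table['fold_change'] >= min_fold_change)
    return table[keep]


# Toy data, invented for illustration only
example = pd.DataFrame({
    'id': ['t1', 't2', 't3'],
    'fold_change': [2.0, 1.2, 3.1],
    'FDR': [0.01, 0.001, 0.2],
})
significant = filter_significant(example)
```

Only `t1` passes both thresholds here: `t2` changes too little and `t3` has too high an FDR.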

Large diffs are not rendered by default.

@@ -0,0 +1,23 @@
id gene_id link
SPCC553.09c.1 SPCC553.09c <a href="https://www.pombase.org/gene/SPCC553.09c">SPCC553.09c</a>
SPCC553.08c.1 SPCC553.08c <a href="https://www.pombase.org/gene/SPCC553.08c">SPCC553.08c</a>
SPCC553.07c.1 SPCC553.07c <a href="https://www.pombase.org/gene/SPCC553.07c">SPCC553.07c</a>
SPCC553.06.1 SPCC553.06 <a href="https://www.pombase.org/gene/SPCC553.06">SPCC553.06</a>
SPCC553.05c.1 SPCC553.05c <a href="https://www.pombase.org/gene/SPCC553.05c">SPCC553.05c</a>
SPCC553.04.1 SPCC553.04 <a href="https://www.pombase.org/gene/SPCC553.04">SPCC553.04</a>
SPCC553.03.1 SPCC553.03 <a href="https://www.pombase.org/gene/SPCC553.03">SPCC553.03</a>
SPCC553.02.1 SPCC553.02 <a href="https://www.pombase.org/gene/SPCC553.02">SPCC553.02</a>
SPCC553.01c.1 SPCC553.01c <a href="https://www.pombase.org/gene/SPCC553.01c">SPCC553.01c</a>
SPCC736.02.1 SPCC736.02 <a href="https://www.pombase.org/gene/SPCC736.02">SPCC736.02</a>
SPCC736.03c.1 SPCC736.03c <a href="https://www.pombase.org/gene/SPCC736.03c">SPCC736.03c</a>
SPCC736.04c.1 SPCC736.04c <a href="https://www.pombase.org/gene/SPCC736.04c">SPCC736.04c</a>
SPCC736.05.1 SPCC736.05 <a href="https://www.pombase.org/gene/SPCC736.05">SPCC736.05</a>
SPCC736.06.1 SPCC736.06 <a href="https://www.pombase.org/gene/SPCC736.06">SPCC736.06</a>
SPCC736.07c.1 SPCC736.07c <a href="https://www.pombase.org/gene/SPCC736.07c">SPCC736.07c</a>
SPCC736.08.1 SPCC736.08 <a href="https://www.pombase.org/gene/SPCC736.08">SPCC736.08</a>
SPCC736.09c.1 SPCC736.09c <a href="https://www.pombase.org/gene/SPCC736.09c">SPCC736.09c</a>
SPCC736.10c.1 SPCC736.10c <a href="https://www.pombase.org/gene/SPCC736.10c">SPCC736.10c</a>
SPCC736.11.1 SPCC736.11 <a href="https://www.pombase.org/gene/SPCC736.11">SPCC736.11</a>
SPCC736.12c.1 SPCC736.12c <a href="https://www.pombase.org/gene/SPCC736.12c">SPCC736.12c</a>
SPCC736.13.1 SPCC736.13 <a href="https://www.pombase.org/gene/SPCC736.13">SPCC736.13</a>
SPCC736.14.1 SPCC736.14 <a href="https://www.pombase.org/gene/SPCC736.14">SPCC736.14</a>
28 changes: 24 additions & 4 deletions rules/classification/report/transcripts/html_table/custom.snake
@@ -4,20 +4,24 @@ import pandas as pd
 pd.set_option('display.max_colwidth', 1000000)
 pd.options.display.float_format = '{:e}'.format

+annotations = {annotation['source']: annotation.get('attributes', []) for annotation in method_config.get('annotation', [])}
+
 rule custom__visualise_transcriptomic_counts_in_html_table:
     """
     Take transcriptomic counts from several samples and merge them together into a single table.
     :input table: TSV table with statistical evaluation of change in expression
     :input desc: Description of reference sequences
     :input template: HTML template with basic report outline
+    :input annotations: TSV files with attributes for annotated transcripts
     :output html: HTML page with sortable, filterable table of transcriptomic results
     """
     input:
-        table = 'classification/{reference}/report/comparison/differential_analysis.tsv',
-        descs = 'reference/{reference}/{reference}.transcripts.desc',
-        template = srcdir('templates/expressions.html'),
+        table = 'classification/{reference}/report/comparison/differential_analysis.tsv',
+        descs = 'reference/{reference}/{reference}.transcripts.desc',
+        template = srcdir('templates/expressions.html'),
+        annotations = expand('reference/{{reference}}/annotation.transcripts/{source}/attributes.tsv', source=annotations.keys())
     output:
-        html = 'classification/{reference}/report/comparison/differential_analysis.html'
+        html = 'classification/{reference}/report/comparison/differential_analysis.html'
     run:
         def ncbi_link(ncbi_id):
             return '<a href="https://www.ncbi.nlm.nih.gov/nuccore/{ncbi_id}">{ncbi_id}</a>'.format(ncbi_id=ncbi_id)
@@ -31,6 +35,22 @@ rule custom__visualise_transcriptomic_counts_in_html_table:
         reported = counts[['id', 'fold_change', 'up_down', 'FDR', 'description']]
         reported['fold_change'] = reported.fold_change.apply(lambda x: '{:.2f}'.format(x))

+        for annotation in input.annotations:
+            source_name = annotation.split('/')[3]
+            attributes = pd.read_csv(annotation, sep='\t', index_col=None)
+
+            if annotations[source_name]:
+                attributes = attributes[['id'] + annotations[source_name]]
+
+            for attribute_name in attributes.columns:
+                if attribute_name == 'id':
+                    continue
+
+                stored_attribute = '{}.{}'.format(source_name, attribute_name)
+                print(attributes.columns)
+                attribute = attributes.groupby('id')[attribute_name].apply(lambda x: '<br />'.join(x))
+                reported[stored_attribute] = attribute
+
         TEMPLATE = open(input.template).read()
         with open(output.html, 'w') as out:
             html_table = reported \
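The annotation step in this rule can be tried in isolation with plain pandas. The sketch below mirrors the `annotations` dictionary built from the `annotation` list in the example configuration and the per-attribute `groupby` join. It is an illustration under assumptions, not the rule itself: the `annotate` helper, the toy tables, and the GO identifiers are invented. It also differs from the rule in two ways: values are cast to strings before joining, and the new column is matched through the `id` column with `Series.map` rather than by index alignment.

```python
import pandas as pd

# Hypothetical config fragment shaped like the `annotation` list in the example YAML
method_config = {
    'annotation': [
        {'source': 'pombase', 'attributes': ['link']},
        {'source': 'go'},  # no explicit attributes: use every column
    ]
}

# Source name -> requested attributes (an empty list means "all attributes")
annotations = {a['source']: a.get('attributes', [])
               for a in method_config.get('annotation', [])}


def annotate(reported, source_name, attributes, selected):
    """Return a copy of `reported` with one '<source>.<attribute>' column per attribute.

    Several values for the same transcript id are joined with '<br />'.
    """
    if selected:
        attributes = attributes[['id'] + selected]

    annotated = reported.copy()
    for name in attributes.columns:
        if name == 'id':
            continue
        joined = attributes.groupby('id')[name].apply(
            lambda values: '<br />'.join(values.astype(str)))
        # Match on the id column instead of relying on index alignment
        annotated['{}.{}'.format(source_name, name)] = annotated['id'].map(joined)
    return annotated


# Toy data, invented for illustration only
reported = pd.DataFrame({'id': ['t1', 't2'], 'fold_change': ['2.00', '1.75']})
go = pd.DataFrame({'id': ['t1', 't1', 't2'],
                   'external_id': ['GO:0000001', 'GO:0000002', 'GO:0000003']})
annotated = annotate(reported, 'go', go, annotations['go'])
```

Transcripts without any annotation end up with a missing value in the new column, which a report template may want to replace with an empty string.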
22 changes: 22 additions & 0 deletions rules/classification/report/transcripts/revigo/custom.snake
@@ -0,0 +1,22 @@
+import pandas as pd
+pd.set_option('display.max_colwidth', 1000000)
+pd.options.display.float_format = '{:e}'.format
+
+rule custom__export_de_genes_for_revigo:
+    """
+    Export differentially expressed genes and their GO annotations in format suitable for visualization
+    in the Revigo webpage (http://revigo.irb.hr/)
+    :input des: TSV table with DE genes
+    :input gos: Annotation of genes - TSV file with columns id and external_id
+    """
+    input:
+        des = 'classification/{reference}/report/comparison/significant.tsv',
+        gos = 'reference/{reference}/annotation.transcripts/go/attributes.tsv'
+    output:
+        revigo = 'classification/{reference}/report/comparison/significant.revigo.tsv'
+    run:
+        des = pd.read_csv(input.des, sep='\t', index_col='Row.names')
+        gos = pd.read_csv(input.gos, sep='\t', index_col=0)
+        merged = des.merge(gos, left_index=True, right_index=True, how='inner')
+        merged[['external_id', 'PValue']].to_csv(output.revigo, sep='\t', index=None, header=None)
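The merge performed by this rule can be reproduced on in-memory tables to see what ends up in the Revigo file. The sketch below is only an illustration: the `Row.names` index, the `PValue` column, and the `id`/`external_id` columns follow the rule and its docstring, while the transcript ids, GO identifiers, and p-values are invented. It writes to a string buffer instead of a file and passes `index=False, header=False`, the boolean form of the arguments used in the rule.

```python
import io

import pandas as pd

# Toy DE table indexed by transcript id (invented values)
des = pd.DataFrame({'PValue': [1e-5, 3e-3]},
                   index=pd.Index(['t1', 't2'], name='Row.names'))

# Toy GO annotation: one row per (transcript, GO term) pair (invented values)
gos = pd.DataFrame({'external_id': ['GO:0000001', 'GO:0000002', 'GO:0000003']},
                   index=pd.Index(['t1', 't1', 't3'], name='id'))

# Inner join on the index keeps only DE transcripts that have a GO annotation;
# a transcript with several GO terms yields one output row per term
merged = des.merge(gos, left_index=True, right_index=True, how='inner')

# Two tab-separated columns without header: GO term and p-value
buffer = io.StringIO()
merged[['external_id', 'PValue']].to_csv(buffer, sep='\t', index=False, header=False)
revigo_text = buffer.getvalue()
```

In this toy input `t2` has no GO annotation and `t3` is not differentially expressed, so both are dropped and only the two GO terms of `t1` are exported.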

13 changes: 12 additions & 1 deletion src/dependency.yaml
@@ -360,7 +360,18 @@ classification:
from: expand('classification/{reference}/report/comparison/differential_analysis.html',
reference=pipeline.references)

to: expand('{report_dir}/_summary/differential_analysis/summary.html',
to: expand('{report_dir}/_summary/differential_analysis/summary.html',
report_dir=config['report_dir'])
depends:
- classification/differential_analysis

revigo:
output:
revigo_format:
from: expand('classification/{reference}/report/comparison/significant.revigo.tsv',
reference=pipeline.references)

to: expand('{report_dir}/_summary/differential_analysis/revigo.txt',
report_dir=config['report_dir'])
depends:
- classification/differential_analysis