birneylab/varexplore: Output

Introduction

This document describes the output produced by the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  • Group reads - Group reads in specific regions according to the genotypes at the selected markers
  • Call variants - Jointly call variants in the meta-samples for each variant of interest
  • Predict variant effects - Predict variant effects using ENSEMBL VEP

Group reads

Group sample reads around a region of interest according to user-defined grouping criteria and the genotypes at a selected marker.

Output files
  • reads/group_<GROUP_ID>_variant_<VARIANT_ID>_gt_<GT>/
    • *.cram: sequencing reads in CRAM format
    • *.crai: CRAM file index
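
The directory naming scheme above can be parsed back into its components when collecting results downstream. The following is a minimal sketch, assuming that directory names follow the `group_<GROUP_ID>_variant_<VARIANT_ID>_gt_<GT>` pattern exactly and that the genotype label contains no underscore. The `ReadGroup` type and both function names are hypothetical helpers, not part of the pipeline.

```python
import re
from pathlib import Path
from typing import List, NamedTuple, Optional


class ReadGroup(NamedTuple):
    """Components encoded in a reads/ sub-directory name (hypothetical helper)."""
    group_id: str
    variant_id: str
    genotype: str


# Assumption: names look like group_<GROUP_ID>_variant_<VARIANT_ID>_gt_<GT>,
# and <GT> itself contains no underscore.
_DIR_PATTERN = re.compile(
    r"^group_(?P<group>.+?)_variant_(?P<variant>.+)_gt_(?P<gt>[^_]+)$"
)


def parse_read_group_dir(name: str) -> Optional[ReadGroup]:
    """Return the parsed components, or None if the name does not match."""
    match = _DIR_PATTERN.match(name)
    if match is None:
        return None
    return ReadGroup(match["group"], match["variant"], match["gt"])


def list_read_groups(reads_dir: Path) -> List[ReadGroup]:
    """Parse every matching sub-directory of the given reads/ directory."""
    groups = []
    for entry in sorted(reads_dir.iterdir()):
        parsed = parse_read_group_dir(entry.name) if entry.is_dir() else None
        if parsed is not None:
            groups.append(parsed)
    return groups
```

For example, `list_read_groups(Path("results/reads"))` would return one `ReadGroup` per grouped read set, where `results` stands in for your own results directory.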

Call variants

Use GATK4 joint germline variant calling to detect variants in the grouped sequencing reads, by group and genotype. Grouping by genotype makes it possible to detect variants in linkage disequilibrium with the marker of interest.

Output files
  • variants/variant_<VARIANT_ID>/
    • *.vcf.gz: variant calls in all the meta-samples in VCF format
    • *.tbi: VCF file index
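
A quick sanity check on a joint-called VCF is to list its sample columns and count its records. The sketch below uses only the Python standard library and relies on bgzip-compressed files being readable with `gzip`. It assumes that each meta-sample appears as one sample column in the `#CHROM` header line; `summarise_vcf` is a hypothetical helper name.

```python
import gzip
from pathlib import Path
from typing import List, Tuple


def summarise_vcf(path: Path) -> Tuple[List[str], int]:
    """Return (sample names, number of variant records) for a .vcf.gz file."""
    samples: List[str] = []
    n_records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                # Meta-information lines carry no records.
                continue
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed by the VCF format;
                # sample names start at the 10th column.
                samples = line.rstrip("\n").split("\t")[9:]
            elif line.strip():
                n_records += 1
    return samples, n_records
```

Calling `summarise_vcf` on a file under `variants/variant_<VARIANT_ID>/` should return the sample columns present in that file together with the number of called sites.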

Predict variant effects

Run ENSEMBL VEP on the variant calls obtained in the previous step to predict variant consequences.

Output files
  • variants/variant_<VARIANT_ID>/
    • *.vep.tsv.gz: variant consequence predictions
    • *.mut.gz: variant consequences formatted so that they can be loaded directly into IGV
    • *.vep.summary.html: HTML report from VEP
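
To get a quick overview of the predictions, the consequence terms in the `*.vep.tsv.gz` file can be tallied. This is a minimal sketch, assuming the file uses VEP's tab-delimited layout: `##` metadata lines, followed by a single header line, followed by rows with a `Consequence` column that may hold several comma-separated terms. The function name and the default column name are assumptions to check against your own file.

```python
import csv
import gzip
from collections import Counter
from pathlib import Path


def count_consequences(path: Path, column: str = "Consequence") -> Counter:
    """Count consequence terms in a gzipped, tab-delimited VEP output file."""
    counts: Counter = Counter()
    with gzip.open(path, "rt", newline="") as handle:
        # Skip '##' metadata lines; the remaining first line is the column header.
        rows = (line for line in handle if not line.startswith("##"))
        reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # A single row may list several comma-separated terms.
            for term in row[column].split(","):
                counts[term] += 1
    return counts
```

`count_consequences(path).most_common(5)` then lists the five most frequent consequence terms in a given file.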

Pipeline information

Output files
  • pipeline_info/
    • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
    • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
    • Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.

Nextflow can generate several reports on the execution of the pipeline. These help you troubleshoot errors and also provide other information such as launch commands, run times and resource usage.
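
When troubleshooting, the trace file is often the fastest place to find tasks that did not finish. The sketch below assumes that `execution_trace.txt` is tab-separated with a header row containing `name`, `status` and `exit` columns, and that successful tasks are marked `COMPLETED` or `CACHED`. Verify both against your own trace file, since the set of columns is configurable. `unfinished_tasks` is a hypothetical helper name.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def unfinished_tasks(trace_path: Path) -> List[Tuple[str, str, str]]:
    """Return (name, status, exit code) for tasks that did not succeed."""
    # Assumption: these two status values mark tasks that need no attention.
    ok_statuses = {"COMPLETED", "CACHED"}
    with open(trace_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            (row["name"], row["status"], row["exit"])
            for row in reader
            if row["status"] not in ok_statuses
        ]
```

An empty list from `unfinished_tasks(Path("results/pipeline_info/execution_trace.txt"))` would indicate that no task in that run failed or was aborted, where `results` stands in for your own results directory.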