nanoplot (#95)
* nanoplot

* test_data

* reinitiate

* gitignore

* namespace

* Testing NanoPlot in CLI

* NanoPlot complete

* Updated docker engine

* Docker

* Delete target directory

* Deleted

* Input file

* fastq with more reads

* Delete config.vsh.yaml

* Pull request changes

* Delete var directory

* Config arguments complete

* Update help.txt

* Update config file

* Test files

* runners script

* gitignore default

* Move output

* Delete output directory

* Runners script complete

* Test script

* default output

* test data

* params passed correctly

* outdir

* test script

* input files

* all test files

* test data < 100 KB

* test script update

* Update CHANGELOG.md

* Update CHANGELOG.md

* Test cases in directories

* rm .gz .pickle .feather files

* reduce test input size

* Multiple separator ";" and check there is only one input file

---------

Co-authored-by: jakubmajercik <[email protected]>
Co-authored-by: Emma Rousseau <[email protected]>
3 people authored Oct 26, 2024
1 parent 7fb67a9 commit 6e6b139
Showing 13 changed files with 1,317 additions and 2 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -9,6 +9,8 @@

* `rsem/rsem_calculate_expression`: Calculate expression levels (PR #93).

* `nanoplot`: Plotting tool for long read sequencing data and alignments (PR #95).

## BREAKING CHANGES

* `falco`: Fix a typo in the `--reverse_complement` argument (PR #157).
@@ -189,8 +191,6 @@
- `bbmap_bbsplit`: Split sequencing reads by mapping them to multiple references simultaneously (PR #138).




## MINOR CHANGES

* Uniformize component metadata (PR #23).
230 changes: 230 additions & 0 deletions src/nanoplot/config.vsh.yaml
@@ -0,0 +1,230 @@
name: nanoplot
description: |
  Run NanoPlot on nanopore-sequenced reads.
  NanoPlot is a plotting tool for long read sequencing data and alignments.
keywords: ["fastq", "sequencing summary", "nanopore"]
links:
  repository: https://github.com/wdecoster/NanoPlot
  homepage: http://nanoplot.bioinf.be/
  documentation: https://github.com/wdecoster/NanoPlot
references:
  doi: 10.1093/bioinformatics/btad311
license: MIT
argument_groups:
  - name: Inputs
    arguments:
      - name: --fastq
        type: file
        description: Input fastq file(s), separated by ";".
        example: read.fq
        direction: input
        multiple: true
      - name: --fasta
        type: file
        description: Input fasta file(s), separated by ";".
        example: read.fa
        direction: input
        multiple: true
      - name: --fastq_rich
        type: file
        description: |
          Input fastq file(s) generated by albacore or
          MinKNOW with additional information concerning channel and time, separated by ";".
        example: read.fq
        direction: input
        multiple: true
      - name: --fastq_minimal
        type: file
        description: |
          Input fastq file(s) generated by albacore or MinKNOW with
          additional information concerning channel and time. Minimal data is extracted
          swiftly without elaborate checks. Separated by ";".
        example: read.fq
        direction: input
        multiple: true
      - name: --summary
        type: file
        description: |
          Input summary file(s) generated by albacore or guppy, separated by ";".
        example: read.txt
        direction: input
        multiple: true
      - name: --bam
        type: file
        description: Input sorted bam file(s), separated by ";".
        example: read.bam
        direction: input
        multiple: true
      - name: --ubam
        type: file
        description: Input unmapped bam file(s), separated by ";".
        example: read.ubam
        direction: input
        multiple: true
      - name: --cram
        type: file
        description: Input sorted cram file(s), separated by ";".
        example: read.cram
        direction: input
        multiple: true
      - name: --pickle
        type: file
        description: Input pickle file(s) stored earlier, separated by ";".
        example: read.pkl
        direction: input
        multiple: true
      - name: --feather
        alternatives: [--arrow]
        type: file
        description: Input feather file(s), separated by ";".
        example: read.arrow
        direction: input
        multiple: true
  - name: Outputs
    arguments:
      - name: --outdir
        alternatives: [-o]
        type: file
        direction: output
        description: Specify directory in which output has to be created.
        required: true
  - name: Options
    arguments:
      - name: --verbose
        type: boolean_true
        description: Write log messages also to terminal.
      - name: --store
        type: boolean_true
        description: Store the extracted data in a pickle file for future plotting.
      - name: --raw
        type: boolean_true
        description: Store the extracted data in tab separated file.
      - name: --huge
        type: boolean_true
        description: Input data is one very large file.
      - name: --no_static
        type: boolean_false
        description: Do not make static (png) plots.
      - name: --prefix
        alternatives: [-p]
        type: string
        description: Specify an optional prefix to be used for the output files.
      - name: --tsv_stats
        type: boolean_true
        description: Output the stats file as a properly formatted TSV.
      - name: --only_report
        type: boolean_true
        description: Output only the report.
      - name: --info_in_report
        type: boolean_true
        description: Add NanoPlot run info in the report.
  - name: Filtering or transforming input
    arguments:
      - name: --maxlength
        type: integer
        description: Drop reads longer than length specified.
      - name: --minlength
        type: integer
        description: Drop reads shorter than length specified.
      - name: --drop_outliers
        type: boolean_false
        description: Drop outlier reads with extremely long length.
      - name: --downsample
        type: integer
        description: Reduce dataset to N reads by random sampling.
      - name: --loglength
        type: boolean_true
        description: Logarithmic scaling of lengths in plots.
      - name: --percentqual
        type: boolean_true
        description: Use qualities as theoretical percent identities.
      - name: --alength
        type: boolean_true
        description: Use aligned read lengths rather than sequenced length (bam mode).
      - name: --minqual
        type: integer
        description: Drop reads with an average quality lower than specified.
      - name: --runtime_until
        type: integer
        description: Only take the N first hours of a run.
      - name: --readtype
        type: string
        description: |
          Which read type to extract information about from summary.
          Options are 1D, 2D, 1D2.
      - name: --barcoded
        type: boolean_true
        description: Use if you want to split the summary file by barcode.
      - name: --no_supplementary
        type: boolean_false
        description: Use if you want to remove supplementary alignments.
  - name: Customizing plots
    arguments:
      - name: --color
        alternatives: [-c]
        type: string
        description: Specify a color for the plots, must be a valid matplotlib color.
      - name: --colormap
        alternatives: [-cm]
        type: string
        description: Specify a valid matplotlib colormap for the heatmap.
      - name: --format
        alternatives: [-f]
        type: string
        default: png
        description: |
          Specify the output format of the plots.
          {eps,jpeg,jpg,pdf,pgf,png,ps,raw,rgba,svg,svgz,tif,tiff}
      - name: --plots
        type: string
        description: |
          Specify which bivariate plots have to be made.
          [{kde,hex,dot} ...]
      - name: --legacy
        type: string
        description: |
          Specify which bivariate plots have to be made (legacy mode).
          [{kde,dot,hex} ...]
      - name: --listcolors
        type: boolean_true
        description: List the colors which are available for plotting and exit.
      - name: --listcolormaps
        type: boolean_true
        description: List the colormaps which are available for plotting and exit.
      - name: --no_N50
        type: boolean_false
        description: Hide the N50 mark in the read length histogram.
      - name: --N50
        type: boolean_true
        description: Show the N50 mark in the read length histogram.
      - name: --title
        type: string
        description: Add a title to all plots, requires quoting if using spaces.
      - name: --font_scale
        type: double
        description: Scale the font of the plots by a factor.
      - name: --dpi
        type: integer
        description: Set the dpi for saving images.
      - name: --hide_stats
        type: boolean_false
        description: Do not add Pearson R stats to some bivariate plots.
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/nanoplot:1.43.0--pyhdfd78af_1
    setup:
      - type: docker
        run: |
          version=$(NanoPlot --version) && \
          echo "$version" > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
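
Note on the `multiple: true` inputs: the config describes these values as separated by ";", while the NanoPlot help lists them as space-separated files. The component's `script.sh` is not shown here, so the following is only a minimal sketch of how such a value could be converted. The `par_fastq` variable and the helpers `split_multiple` and `build_command` are hypothetical names used for illustration, and the sketch assumes file names without whitespace.

```shell
# Hypothetical stand-in for the value the wrapper would receive for --fastq.
par_fastq="reads1.fq;reads2.fq"

# Print each ";"-separated entry on its own line.
split_multiple() {
  printf '%s\n' "$1" | tr ';' '\n'
}

# Build the NanoPlot command line as text instead of running it.
# Assumption: file names contain no whitespace.
build_command() {
  files=$(split_multiple "$1" | tr '\n' ' ')
  printf 'NanoPlot --fastq %s--outdir %s\n' "$files" "$2"
}

build_command "$par_fastq" out
```

Printing the command rather than executing it keeps the sketch runnable without NanoPlot installed.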
96 changes: 96 additions & 0 deletions src/nanoplot/help.txt
@@ -0,0 +1,96 @@
usage: NanoPlot [-h] [-v] [-t THREADS] [--verbose] [--store] [--raw] [--huge]
[-o OUTDIR] [--no_static] [-p PREFIX] [--tsv_stats]
[--only-report] [--info_in_report] [--maxlength N]
[--minlength N] [--drop_outliers] [--downsample N]
[--loglength] [--percentqual] [--alength] [--minqual N]
[--runtime_until N] [--readtype {1D,2D,1D2}] [--barcoded]
[--no_supplementary] [-c COLOR] [-cm COLORMAP]
[-f [{png,jpg,jpeg,webp,svg,pdf,eps,json} ...]]
[--plots [{kde,hex,dot} ...]] [--legacy [{kde,dot,hex} ...]]
[--listcolors] [--listcolormaps] [--no-N50] [--N50]
[--title TITLE] [--font_scale FONT_SCALE] [--dpi DPI]
[--hide_stats]
(--fastq file [file ...] | --fasta file [file ...] | --fastq_rich file [file ...] | --fastq_minimal file [file ...] | --summary file [file ...] | --bam file [file ...] | --ubam file [file ...] | --cram file [file ...] | --pickle pickle | --feather file [file ...])

CREATES VARIOUS PLOTS FOR LONG READ SEQUENCING DATA.

General options:
-h, --help show the help and exit
-v, --version Print version and exit.
-t, --threads THREADS
Set the allowed number of threads to be used by the script
--verbose Write log messages also to terminal.
--store Store the extracted data in a pickle file for future plotting.
--raw Store the extracted data in tab separated file.
--huge Input data is one very large file.
-o, --outdir OUTDIR Specify directory in which output has to be created.
--no_static Do not make static (png) plots.
-p, --prefix PREFIX Specify an optional prefix to be used for the output files.
--tsv_stats Output the stats file as a properly formatted TSV.
--only-report Output only the report
--info_in_report Add NanoPlot run info in the report.

Options for filtering or transforming input prior to plotting:
--maxlength N Hide reads longer than length specified.
--minlength N Hide reads shorter than length specified.
--drop_outliers Drop outlier reads with extreme long length.
--downsample N Reduce dataset to N reads by random sampling.
--loglength Additionally show logarithmic scaling of lengths in plots.
--percentqual Use qualities as theoretical percent identities.
--alength Use aligned read lengths rather than sequenced length (bam mode)
--minqual N Drop reads with an average quality lower than specified.
--runtime_until N Only take the N first hours of a run
--readtype {1D,2D,1D2}
Which read type to extract information about from summary. Options are 1D, 2D,
1D2
--barcoded Use if you want to split the summary file by barcode
--no_supplementary Use if you want to remove supplementary alignments

Options for customizing the plots created:
-c, --color COLOR Specify a valid matplotlib color for the plots
-cm, --colormap COLORMAP
Specify a valid matplotlib colormap for the heatmap
-f, --format [{png,jpg,jpeg,webp,svg,pdf,eps,json} ...]
Specify the output format of the plots, which are in addition to the html files
--plots [{kde,hex,dot} ...]
Specify which bivariate plots have to be made.
--legacy [{kde,dot,hex} ...]
Specify which bivariate plots have to be made (legacy mode).
--listcolors List the colors which are available for plotting and exit.
--listcolormaps List the colors which are available for plotting and exit.
--no-N50 Hide the N50 mark in the read length histogram
--N50 Show the N50 mark in the read length histogram
--title TITLE Add a title to all plots, requires quoting if using spaces
--font_scale FONT_SCALE
Scale the font of the plots by a factor
--dpi DPI Set the dpi for saving images
--hide_stats Not adding Pearson R stats in some bivariate plots

Input data sources, one of these is required.:
--fastq file [file ...]
Data is in one or more default fastq file(s).
--fasta file [file ...]
Data is in one or more fasta file(s).
--fastq_rich file [file ...]
Data is in one or more fastq file(s) generated by albacore, MinKNOW or guppy
with additional information concerning channel and time.
--fastq_minimal file [file ...]
Data is in one or more fastq file(s) generated by albacore, MinKNOW or guppy
with additional information concerning channel and time. Is extracted swiftly
without elaborate checks.
--summary file [file ...]
Data is in one or more summary file(s) generated by albacore or guppy.
--bam file [file ...]
Data is in one or more sorted bam file(s).
--ubam file [file ...]
Data is in one or more unmapped bam file(s).
--cram file [file ...]
Data is in one or more sorted cram file(s).
--pickle pickle Data is a pickle file stored earlier.
--feather, --arrow file [file ...]
Data is in one or more feather file(s).

EXAMPLES:
NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed
NanoPlot -t 2 --fastq reads1.fastq.gz reads2.fastq.gz --maxlength 40000 --plots hex dot
NanoPlot --color yellow --bam alignment1.bam alignment2.bam alignment3.bam --downsample 10000
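
Note on input sources: the usage line above lists the input options as mutually exclusive, and exactly one of them is required. A wrapper could check this before calling NanoPlot. The sketch below is an illustration under assumptions, not the component's actual script: the `par_*` variables and the helpers `count_set` and `check_single_source` are hypothetical.

```shell
# Count how many of the given values are non-empty.
count_set() {
  n=0
  for value in "$@"; do
    if [ -n "$value" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Succeed only if exactly one input source is set.
check_single_source() {
  [ "$(count_set "$@")" -eq 1 ]
}

# Hypothetical values for three of the input options.
par_fastq="reads.fq"
par_bam=""
par_summary=""

if check_single_source "$par_fastq" "$par_bam" "$par_summary"; then
  echo "ok: exactly one input source"
else
  echo "error: specify exactly one input source" >&2
fi
```

A real wrapper would pass all ten input options to the check, not just the three shown.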