-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cutadapt #7
cutadapt #7
Conversation
Some food for thought: Prefixes for FASTA files
But what if the input file is not a FASTA file? Then the
Infrastructure argumentCutadapt has a What is the output?Cutadapt allows to specify the output filenames of R1 and R2 separately. This is done using a form of variable substitution: Of course, downstream this will have to be managed. |
Alternatively, you could define two arguments: arguments:
- name: --adapter_file
type: file
- name: --adapter_string
type: string If both are defined, throw an error.
Makes sense!
will respond later ^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR! It's always funny to see GitHub not wanting to show our config.vsh.yaml
because it's too large :)
I requested some changes! 🙇
src/cutadapt/config.vsh.yaml
Outdated
- name: --output | ||
type: file | ||
description: | | ||
Write trimmed reads to this directory and name the files using {name}. | ||
FASTQ or FASTA format is chosen depending on input. | ||
Summary report is sent to standard output. | ||
default: output | ||
direction: output | ||
required: true | ||
must_exist: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split up the output to separate output files?
Step 8: Add arguments for output files
...
Note:
Preferably, these outputs should not be directores but files. For example, if a tool outputs a directory foo/ containing files foo/bar.txt and foo/baz.txt, there should be two output arguments --bar and --baz (as opposed to one output argument which outputs the whole foo/ directory).
Especially since cutadapt
also doesn't rely in an output directory,
${par_json:+--json "${par_output}/report.json"} \
--output "$par_output/{name}_R1_001.fastq" \
--paired-output "$par_output/{name}_R2_001.fastq" \
Would need to be changed to
${par_json:+--json "${par_json}"} \
--output "$par_output" \
${par_output_r2:+--paired-output "$par_output_r2"} \
or something to that extend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cutadapt implicitly creates a directory because it generates a number of output fastq files depending on the input data and the arguments provided. Hence the {name}
variable. I would like to avoid that users have to set those 'variable' output paths explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it would be cumbersome if the users would have to set the variable output paths explicitly. I now understand that cutadapt returns a variable number of outputs depending on how many adapters the user provides.
I'm still a bit on the fence because creating one big directory with all of the outputs makes it harder for downstream components to process the files.
Can we set something up with type: file, multiple: true
such that, by default, the user can use --output
to get all of the outputs in one directory, but if they provide paths with wildcards the fasta's / fastq's will be written to those paths instead?
- --output_dir: contains all of the files below
- --report_txt: report.txt
- --report_json: report.json
- --output: output_R1.fasta/q if paired, else output.fasta/q
- --output_r2: output_*_R2.fasta/q
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered that type: file, multiple: true, direction: output
is currently broken and will be fixed in viash-io/viash#639. Will try to finish this PR ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really blocking I think because the bug appears in the VDSL3 module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to your suggestion, @rcannood:
I have a version in development that does what you suggest: take both an --output_dir
as well as an --output
argument. The first is just one directory, the second is multiple: true
. This allows me to, for instance, run this in the tests:
"$meta_executable" \
--output_dir out_test1 \
--output out_test1/1_001.fasta \
--output out_test1/unknown_001.fasta \
...
Or, alternatively --output out_test1/1_001.fasta;out_test1/unknown_001.fasta
.
I agree it's nice to be able to capture the output explicitly rather than a directory, but it requires one to know the result of the cutadapt
step prior to running it which will depend on the case at hand.
Ideally though, I would prefer to do something like:
"$meta_executable" \
--output_dir out_test1 \
--output out_test1/*.fast? \
...
This, however, does not work because of the way the executable is built (set -f
disallows globbing):
# check whether required files exist
if [ ! -z "$VIASH_PAR_OUTPUT" ]; then
IFS=';'
set -f
for file in $VIASH_PAR_OUTPUT; do
unset IFS
if [ ! -e "$file" ]; then
ViashError "Output file '$file' does not exist."
exit 1
fi
done
set +f
fi
BTW: The way we deal with this in a Nextflow pipeline is by adding a map
and glob inside the file()
function of Nextflow.
Now, how to proceed?
- Look into adapting the behaviour of Viash? (
set -f
) - Stick to what works with the individual output as illustrated above?
- Just output the directory name and omit the individual output files?
- (1) and (2)?
Co-authored-by: Robrecht Cannoodt <[email protected]>
Co-authored-by: Robrecht Cannoodt <[email protected]>
Co-authored-by: Robrecht Cannoodt <[email protected]>
…unpaired output arguments
Propose changes to cutadapt
Currently waiting for the next version of Viash to be released so we can get |
Viash was updated to 0.9.0-rc3 in #51, so we can finally update this PR 😁 |
Update: The We could introduce WDYT @rcannood ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor change requested, otherwise LGTM
Co-authored-by: Robrecht Cannoodt <[email protected]>
It turns out |
* First commit, clone of cutadapt in htrnaseq + help.txt * Add config * Don't allow multiple: true when providing a FASTA file with adapters * First version of script * Updates and fixes - se/pe * Add tests and fix --json argument * Add software version * Better consistency in using snake_case * Update src/cutadapt/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Update src/cutadapt/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Update src/cutadapt/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Specify --input and --input_r2 as separate arguments * Avoid specifying default arg values * Add more information to `--minimum_length` and `maximum_length` * Add --cpus by means of $meta_cpus and set proper default * Allow multiple for adapters/fasta and add test * change multiple_sep to ';' * add example * simplify code with a helper function * create directories in test * use a different output extension if --fasta is provided * decrease code duplication by separating optional outputs from paired/unpaired output arguments * write custom tests for cutadapt * fix _r2 arguments * add debug flag as not to always print the cli command * remove comment * Update to Viash 0.9.0-RC4 * Ability to specify output globbing patterns * Avoid the need for both output_dir and output * Move fields from `info` to `links` Co-authored-by: Robrecht Cannoodt <[email protected]> * Move references back to the info field * apologies, I proposed a wrong syntax --------- Co-authored-by: Robrecht Cannoodt <[email protected]>
* first version * complete script for qualimap * add escaping character before leading hashtag (#50) * add escaping character before leading hashtag * update changelog * Update CHANGELOG.md Co-authored-by: Robrecht Cannoodt <[email protected]> * replace escaping \ by \\ --------- Co-authored-by: Robrecht Cannoodt <[email protected]> * Samtools collate (#49) * initial commit dedup * Revert "initial commit dedup" This reverts commit 38f586b. * Initial commit, whole component is functional * Update viash (#51) * update viash * update readme * update changelog * update changelog * fix incorrect heading detection * update again * clean up readme * Samtools view (#48) * initial commit dedup * Revert "initial commit dedup" This reverts commit 38f586b. * initial version with a few tests, script, and config file * update changelog, add one test * add a 4th test, fix option names in the script * Fix name of component in config * remove option named with a number * add must_exist to input file argument * removed "default: null" from one of the arguments in config * remove utf8 characters from config * Update CHANGELOG.md --------- Co-authored-by: Robrecht Cannoodt <[email protected]> * Samtools fastq (#52) * initial commit dedup * Revert "initial commit dedup" This reverts commit 38f586b. * Initial commit, config, script, help and test_data * Update changelog, add tests, fix argument naming errors, add test data * update changelog, remove gffread namespace field --------- Co-authored-by: Robrecht Cannoodt <[email protected]> * format URL in the description (#55) * format URL in the description * update changelog * Change name in _viash.yaml (#60) * Update operational code (#63) * update readme * switch ci to toolbox * update to viash 0.9.0-RC6 * edit keywords * fix version * update biobox * cutadapt (#7) * First commit, clone of cutadapt in htrnaseq + help.txt * Add config * Don't allow multiple: true when providing a FASTA file with adapters * First version of script * Updates and fixes - se/pe * Add tests and fix --json argument * Add software version * Better consistency in using snake_case * Update src/cutadapt/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Update src/cutadapt/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Update src/cutadapt/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Specify --input and --input_r2 as separate arguments * Avoid specifying default arg values * Add more information to `--minimum_length` and `maximum_length` * Add --cpus by means of $meta_cpus and set proper default * Allow multiple for adapters/fasta and add test * change multiple_sep to ';' * add example * simplify code with a helper function * create directories in test * use a different output extension if --fasta is provided * decrease code duplication by separating optional outputs from paired/unpaired output arguments * write custom tests for cutadapt * fix _r2 arguments * add debug flag as not to always print the cli command * remove comment * Update to Viash 0.9.0-RC4 * Ability to specify output globbing patterns * Avoid the need for both output_dir and output * Move fields from `info` to `links` Co-authored-by: Robrecht Cannoodt <[email protected]> * Move references back to the info field * apologies, I proposed a wrong syntax --------- Co-authored-by: Robrecht Cannoodt <[email protected]> * update changelog * update readme * Update salmon quant arguments (#57) * Make index an optional argument * FIx argument type and add optional argument * FEAT: add bedtools getfasta. (#59) * FEAT: add bedtools getfasta. * Add PR number to CHANGELOG * Add star genomegenerate component (#58) * Add star genomegenerate component * Update changelog * Rename component * Update test * Update CHANGELOG.md --------- Co-authored-by: Robrecht Cannoodt <[email protected]> * fix package config (#65) * Delete src/bgzip directory (#64) It was moved to toolbox * Output alignments to the transcriptome (#56) * Output alignments to the transcriptome * Change argument name * BUG: pear component failure is ignored (#70) * FEAT + BUG: cutadapt; allowing disabling demultiplexing and fix par_quality_cutoff_r2 (#69) * FEAT: Disable cutadapt demultiplexing by default * Cutadapt: fix --par_quality_cutoff_r2 * FEAT: update busco to 5.7.1 (#72) * FEAT: update busco to 5.7.1 * Typo * Samtools fasta (#53) * initial commit dedup * Revert "initial commit dedup" This reverts commit 38f586b. * Fasta component * change script resource to samtools_fastq script, with dummy argument to specify the command * add dummy argument to samtools_fastq to share the script with samtools_fasta * fix path to script in config * Update src/samtools/samtools_fastq/script.sh Co-authored-by: Robrecht Cannoodt <[email protected]> * Change default fields to examples * Two more default fields changed to examples * Minor formatting changes * Markdown formatting changes in configs --------- Co-authored-by: Robrecht Cannoodt <[email protected]> * Umi tools dedup (#54) * initial commit dedup * Revert "initial commit dedup" This reverts commit 38f586b. * inital commit dedup * Working component with one test * Update test 1 and test data, fix some arg types in config and script * test data files and changes to script * Add third test and test data * Fix typo in script * remove utf8 characters in config * Add choices fields and change default fields to exampels * Minor formatting changes * md formatting changes in config * Fix typo (#79) * add vscode to gitignore * update multiple separator (#81) * update multiple separator * update changelog * Update src/multiqc/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Update src/multiqc/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Update src/multiqc/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * Update src/multiqc/config.vsh.yaml Co-authored-by: Robrecht Cannoodt <[email protected]> * update ifs --------- Co-authored-by: Robrecht Cannoodt <[email protected]> * add test data * add tests * update changelog * remove unrequired test data * update descriptions * update changelog * update help text * Update src/qualimap/qualimap_rnaseq/script.sh Co-authored-by: Robrecht Cannoodt <[email protected]> * update unit tests * update unit tests * addres pr changes request * add version * remove whitespace multiqc * Apply suggestions from code review Co-authored-by: Robrecht Cannoodt <[email protected]> * address pr comments * Update CHANGELOG.md * fix doi * Fix name * update version and container image * write software version to file --------- Co-authored-by: dorien-er <[email protected]> Co-authored-by: Leila011 <[email protected]> Co-authored-by: Robrecht Cannoodt <[email protected]> Co-authored-by: emmarousseau <[email protected]> Co-authored-by: Sai Nirmayi Yasa <[email protected]> Co-authored-by: Dries Schaumont <[email protected]> Co-authored-by: Dorien <[email protected]>
Description
Add cutadapt to the repo.