Switch to a CSV/TSV based input #18

marchoeppner · 2018-04-26T10:19:29Z

To pull in relevant metadata, I suggest using CSV/TSV as the default input format rather than a folder full of FastQ files.

Suggested format would be:

IndivID;SampleID;libraryID;rgID;rgPU;platform;platform_model;Center;Date;R1;R2

Peter;Germline;G00077-L2;HGJJMBBXX.3.G00077-L2;HGJJMBBXX.3.TCCTGAGC+ATAGAGAG;Illumina;NextSeq500;IKMB;2018-02-06;/ifs/data/nfs_share/sukmb352/projects/pipelines/exomes/trio/original_sequences/G00077-L2_S20_L003_R1_001.fastq.gz;/ifs/data/nfs_share/sukmb352/projects/pipelines/exomes/trio/original_sequences/G00077-L2_S20_L003_R2_001.fastq.gz

Peter;Tumor;G00078-L2;HGJJMBBXX.3.G00078-L2;HGJJMBBXX.3.GGACTCCT+ATAGAGAG;Illumina;NextSeq500;IKMB;2018-02-06;/ifs/data/nfs_share/sukmb352/projects/pipelines/exomes/trio/original_sequences/G00078-L2_S21_L003_R1_001.fastq.gz;/ifs/data/nfs_share/sukmb352/projects/pipelines/exomes/trio/original_sequences/G00078-L2_S21_L003_R2_001.fastq.gz
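
As a rough illustration of how a sheet in this format could be consumed, here is a minimal Python sketch (not the pipeline's own Nextflow code). It reads the semicolon-delimited columns listed above and builds a SAM `@RG` header line per row. The function names and the mapping of columns to read-group tags (for example `SM` built from `IndivID` and `SampleID`) are assumptions made for illustration, not something specified in this issue.

```python
import csv
import io

# Column names taken from the suggested header line above.
FIELDS = ["IndivID", "SampleID", "libraryID", "rgID", "rgPU", "platform",
          "platform_model", "Center", "Date", "R1", "R2"]


def read_samplesheet(handle):
    """Yield one dict per data row of a semicolon-delimited sample sheet."""
    reader = csv.DictReader(handle, delimiter=";")
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet is missing columns: {missing}")
    for row in reader:
        yield row


def read_group(row):
    """Build a SAM @RG header line from one row (hypothetical tag mapping)."""
    tags = [
        ("ID", row["rgID"]),
        ("SM", f"{row['IndivID']}_{row['SampleID']}"),  # assumed sample naming
        ("LB", row["libraryID"]),
        ("PU", row["rgPU"]),
        ("PL", row["platform"].upper()),
        ("PM", row["platform_model"]),
        ("CN", row["Center"]),
        ("DT", row["Date"]),
    ]
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in tags)


if __name__ == "__main__":
    # File paths shortened here for readability; values otherwise as in the example row.
    sheet = io.StringIO(
        ";".join(FIELDS) + "\n"
        "Peter;Germline;G00077-L2;HGJJMBBXX.3.G00077-L2;"
        "HGJJMBBXX.3.TCCTGAGC+ATAGAGAG;Illumina;NextSeq500;IKMB;2018-02-06;"
        "G00077-L2_S20_L003_R1_001.fastq.gz;G00077-L2_S20_L003_R2_001.fastq.gz\n"
    )
    for row in read_samplesheet(sheet):
        print(read_group(row))
```

Keeping the metadata in named columns means the read-group string can be assembled without parsing file names, which is the main argument for the sheet over a bare folder.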

ewels · 2018-04-26T12:25:26Z

Or a nextflow params file? nextflow-io/nextflow#208

CSV/TSV is nice and may be necessary here, but I'm also keen for nf-core pipelines to work with minimal input where possible, e.g. still working for someone who turns up with "I have a bunch of FastQ files and know nothing about them." A pipeline that fails because the user doesn't know the platform_model is not ideal.

Of course, that's not to say we can't have both; that would be ideal: work with minimal requirements, but also support nice, verbose, well-organised meta files.

marchoeppner · 2018-04-26T13:15:26Z

For these cases, we actually use this (pardon the crumminess of the code):

https://git.ikmb.uni-kiel.de/bfx-core/NF-diagnostics-exome/blob/master/bin/samplesheet_from_folder.rb

It builds a valid input CSV from a folder full of FastQs, with actual values where they can be extracted from the FastQ files and placeholders / best guesses for the other fields. This way you could at least nudge people towards better record keeping ;)

But two mutually exclusive input channels might also work.
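
The linked Ruby script is not reproduced here. A minimal Python sketch of the same idea might look like the following, assuming file names of the form `<library>_S<n>_L<lane>_R1_001.fastq.gz` (as in the example rows) and Illumina-style read headers of the form `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`. The function names, the `unknown` placeholder and the "Illumina" best guess are all hypothetical choices for illustration.

```python
import gzip
import re
from pathlib import Path

PLACEHOLDER = "unknown"  # hypothetical marker for fields that cannot be inferred
NAME_RE = re.compile(r"^(?P<library>.+)_S\d+_L\d{3}_R1_001\.fastq\.gz$")


def first_header(fastq_gz):
    """Return the first read header of a gzipped FastQ file."""
    with gzip.open(fastq_gz, "rt") as handle:
        return handle.readline().strip()


def row_from_header(header, r1, r2):
    """Fill one sample sheet row from a read header and the file names."""
    match = NAME_RE.match(Path(r1).name)
    library = match.group("library") if match else PLACEHOLDER
    # Assumed header layout: "@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index"
    left, _, right = header.lstrip("@").partition(" ")
    parts = left.split(":")
    if len(parts) >= 7:
        flowcell, lane = parts[2], parts[3]
    else:
        flowcell, lane = PLACEHOLDER, PLACEHOLDER
    index = right.split(":")[-1] if right else PLACEHOLDER
    return {
        "IndivID": PLACEHOLDER,
        "SampleID": PLACEHOLDER,
        "libraryID": library,
        "rgID": f"{flowcell}.{lane}.{library}",
        "rgPU": f"{flowcell}.{lane}.{index}",
        "platform": "Illumina",  # best guess, given the assumed header layout
        "platform_model": PLACEHOLDER,
        "Center": PLACEHOLDER,
        "Date": PLACEHOLDER,
        "R1": str(r1),
        "R2": str(r2),
    }


def rows_from_folder(folder):
    """Yield one row per R1/R2 pair found in the folder; unpaired R1 files are skipped."""
    for r1 in sorted(Path(folder).glob("*_R1_001.fastq.gz")):
        r2 = r1.with_name(r1.name.replace("_R1_001", "_R2_001"))
        if r2.exists():
            yield row_from_header(first_header(r1), r1, r2)
```

Fields that cannot be read from the data stay visibly marked as `unknown`, so the generated sheet still works as input while showing the user which metadata is missing.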

maxulysse · 2018-04-26T13:23:34Z

We have a similar idea that we use for germline samples:
https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L738-L766

ewels · 2018-04-26T15:36:03Z

Nice! I guess we could embed such a script into the workflow so that it works with a glob of FastQs or a CSV file..? That would be ideal.
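
One way to picture the "glob of FastQs or a CSV file" behaviour is a small dispatcher that inspects the input value. The Python sketch below is only an illustration of that idea, not the Sarek or pipeline code. The function name `resolve_input`, the extension-based detection and the delimiter choices (`;` for `.csv`, matching the example above, and tab for `.tsv`) are assumptions.

```python
import csv
import glob


def resolve_input(value):
    """Return ('sheet', rows) for a CSV/TSV path, or ('fastq', paths) for a glob."""
    lowered = value.lower()
    if lowered.endswith((".csv", ".tsv")):
        # Assumed delimiters: tab for .tsv, semicolon for .csv as in the example sheet.
        delimiter = "\t" if lowered.endswith(".tsv") else ";"
        with open(value, newline="") as handle:
            return "sheet", list(csv.DictReader(handle, delimiter=delimiter))
    paths = sorted(glob.glob(value))
    if not paths:
        raise FileNotFoundError(f"no files match input pattern: {value}")
    return "fastq", paths
```

With something like this, the FastQ branch could feed a sheet-building step with placeholders, so that both kinds of input end up in the same row format downstream.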

marchoeppner · 2018-04-27T05:38:19Z

My vote goes to the "Sarek" approach; should be fairly straightforward to just steal the code ;)

apeltzer · 2018-04-27T13:26:23Z

Same here

apeltzer added this to the ExoSeq V1.0 "Black Fox" milestone Aug 15, 2018
