This repository has been archived by the owner on May 7, 2019. It is now read-only.

Switch to a CSV/TSV based input #18

Open
marchoeppner opened this issue Apr 26, 2018 · 6 comments

@marchoeppner

marchoeppner commented Apr 26, 2018

To pull in the relevant metadata, I suggest using CSV/TSV as the default input format rather than a folder with a bunch of FastQ files.

Suggested format would be:

IndivID;SampleID;libraryID;rgID;rgPU;platform;platform_model;Center;Date;R1;R2

Peter;Germline;G00077-L2;HGJJMBBXX.3.G00077-L2;HGJJMBBXX.3.TCCTGAGC+ATAGAGAG;Illumina;NextSeq500;IKMB;2018-02-06;/ifs/data/nfs_share/sukmb352/projects/pipelines/exomes/trio/original_sequences/G00077-L2_S20_L003_R1_001.fastq.gz;/ifs/data/nfs_share/sukmb352/projects/pipelines/exomes/trio/original_sequences/G00077-L2_S20_L003_R2_001.fastq.gz

Peter;Tumor;G00078-L2;HGJJMBBXX.3.G00078-L2;HGJJMBBXX.3.GGACTCCT+ATAGAGAG;Illumina;NextSeq500;IKMB;2018-02-06;/ifs/data/nfs_share/sukmb352/projects/pipelines/exomes/trio/original_sequences/G00078-L2_S21_L003_R1_001.fastq.gz;/ifs/data/nfs_share/sukmb352/projects/pipelines/exomes/trio/original_sequences/G00078-L2_S21_L003_R2_001.fastq.gz
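
As a rough illustration of how such a sheet could be consumed, here is a minimal Python sketch that parses the semicolon-delimited format above and rejects rows with the wrong number of fields. The function name `read_samplesheet` and the shortened `/data/...` paths are made up for this example; the pipeline itself would do this in Nextflow.

```python
import csv
import io

# Column names copied from the header line suggested above.
EXPECTED_COLUMNS = [
    "IndivID", "SampleID", "libraryID", "rgID", "rgPU",
    "platform", "platform_model", "Center", "Date", "R1", "R2",
]


def read_samplesheet(handle):
    """Parse a semicolon-delimited sample sheet into a list of dicts."""
    reader = csv.DictReader(handle, delimiter=";")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        # DictReader puts surplus fields under the key None and fills
        # missing fields with the value None, so both cases are caught here.
        if None in row or any(value is None for value in row.values()):
            raise ValueError(
                f"line {reader.line_num}: expected {len(EXPECTED_COLUMNS)} fields"
            )
        rows.append(row)
    return rows


# Values follow the example rows above; the file paths are shortened,
# hypothetical placeholders.
EXAMPLE = (
    "IndivID;SampleID;libraryID;rgID;rgPU;platform;platform_model;Center;Date;R1;R2\n"
    "Peter;Germline;G00077-L2;HGJJMBBXX.3.G00077-L2;HGJJMBBXX.3.TCCTGAGC+ATAGAGAG;"
    "Illumina;NextSeq500;IKMB;2018-02-06;/data/G00077-L2_R1.fastq.gz;/data/G00077-L2_R2.fastq.gz\n"
    "Peter;Tumor;G00078-L2;HGJJMBBXX.3.G00078-L2;HGJJMBBXX.3.GGACTCCT+ATAGAGAG;"
    "Illumina;NextSeq500;IKMB;2018-02-06;/data/G00078-L2_R1.fastq.gz;/data/G00078-L2_R2.fastq.gz\n"
)

samples = read_samplesheet(io.StringIO(EXAMPLE))
for sample in samples:
    print(sample["IndivID"], sample["SampleID"], sample["rgID"])
```
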
@ewels
Member

ewels commented Apr 26, 2018

Or a Nextflow params file? nextflow-io/nextflow#208

CSV/TSV is nice and may be necessary here, but I'm also keen for nf-core pipelines to work with minimal input if possible, e.g. still working for someone who turns up with "I have a bunch of FastQ files and know nothing about them." If the pipeline fails because the user doesn't know the platform_model, that's not ideal.

Of course, that's not to say we can't have both; that would be ideal: work with minimal requirements, but also support nice, verbose, well-organised meta files.

@marchoeppner
Author

For these cases, we actually use this (pardon the crumminess of the code):

https://git.ikmb.uni-kiel.de/bfx-core/NF-diagnostics-exome/blob/master/bin/samplesheet_from_folder.rb

It builds a valid input CSV from a folder full of FastQs, with actual values where they can be extracted from the FastQ files and placeholders / best guesses for the other fields. This way you could at least nudge people towards better record keeping ;)
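
To make the idea concrete, here is a minimal Python sketch of that kind of helper; it is not the linked Ruby script. It assumes Illumina-style file names (`<library>_S<n>_L<lane>_R<1|2>_001.fastq.gz`) and Casava 1.8+ read headers, derives rgID and rgPU in the same pattern as the example rows in the proposal, guesses `Illumina` as the platform, and fills everything else with a placeholder. All function names are hypothetical.

```python
import gzip
import re
from pathlib import Path

COLUMNS = [
    "IndivID", "SampleID", "libraryID", "rgID", "rgPU",
    "platform", "platform_model", "Center", "Date", "R1", "R2",
]
PLACEHOLDER = "unknown"
# Assumed naming scheme: <library>_S<n>_L<lane>_R<1|2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<library>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def first_header(path):
    """Return the first read header of a gzipped FastQ file."""
    with gzip.open(path, "rt") as handle:
        return handle.readline().strip()


def parse_illumina_header(header):
    """Extract flowcell, lane and index from a Casava 1.8+ style header."""
    name, _, comment = header.lstrip("@").strip().partition(" ")
    fields = name.split(":")
    tags = comment.split(":")
    return {
        "flowcell": fields[2] if len(fields) > 3 else PLACEHOLDER,
        "lane": fields[3] if len(fields) > 3 else PLACEHOLDER,
        "index": tags[3] if len(tags) > 3 and tags[3] else PLACEHOLDER,
    }


def build_rows(fastq_paths, read_header=first_header):
    """Pair R1/R2 files by library and lane and emit one row per pair."""
    pairs = {}
    for path in map(Path, fastq_paths):
        match = FASTQ_NAME.match(path.name)
        if match is None:
            continue  # not named like an Illumina FastQ, skip it
        key = (match["library"], match["lane"])
        pairs.setdefault(key, {})[match["read"]] = str(path)

    rows = []
    for (library, _lane), reads in sorted(pairs.items()):
        if set(reads) != {"1", "2"}:
            continue  # unpaired file, skip it in this sketch
        info = parse_illumina_header(read_header(reads["1"]))
        row = dict.fromkeys(COLUMNS, PLACEHOLDER)
        row.update(
            libraryID=library,
            rgID=f"{info['flowcell']}.{info['lane']}.{library}",
            rgPU=f"{info['flowcell']}.{info['lane']}.{info['index']}",
            platform="Illumina",  # best guess from the file naming scheme
            R1=reads["1"],
            R2=reads["2"],
        )
        rows.append(row)
    return rows
```

Calling `build_rows(sorted(Path("some_folder").glob("*.fastq.gz")))` would return rows ready to be written out with `csv.DictWriter(..., fieldnames=COLUMNS, delimiter=";")`; fields that cannot be derived stay `unknown` for the user to fill in.
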

But two mutually exclusive input channels might also work.

@maxulysse
Member

We have a similar idea that we use for germline samples:
https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L738-L766

@ewels
Member

ewels commented Apr 26, 2018

Nice! I guess we could embed such a script into the workflow so that it works with a glob of FastQs or a CSV file..? That would be ideal.
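
A minimal sketch of that dual-input idea, written in Python purely for illustration (the workflow itself would do this in Nextflow): treat the argument as a sample sheet if it ends in `.csv` or `.tsv`, otherwise expand it as a glob of FastQ files. The function name `classify_input` and the suffix rule are assumptions made for this example.

```python
import glob
from pathlib import Path

SHEET_SUFFIXES = {".csv", ".tsv"}


def classify_input(spec):
    """Return ("samplesheet", [path]) or ("fastq", [files...]) for one input argument."""
    if Path(spec).suffix.lower() in SHEET_SUFFIXES:
        return "samplesheet", [spec]
    files = sorted(glob.glob(spec))
    if not files:
        raise FileNotFoundError(f"no files match {spec!r}")
    return "fastq", files
```

In the `"fastq"` case the file list would then go through a sheet-building step, so that everything downstream only ever sees one input shape.
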

@marchoeppner
Author

My vote goes to the "Sarek" approach; it should be fairly straightforward to just steal the code ;)

@apeltzer
Member

Same here

@apeltzer apeltzer added this to the ExoSeq V1.0 "Black Fox" milestone Aug 15, 2018