
FASTQC output

sprokopec edited this page Jun 13, 2024 · 2 revisions

collect_fastqc_metrics.pl runs FASTQC and generates an md5 checksum for each FASTQ file listed in fastq_config.yaml.

module load perl

perl /path/to/collect_fastqc_metrics.pl \
-d /path/to/fastq_config.yaml \
-t /path/to/fastqc_tool_config.yaml \
-c slurm \
{optional: --rna, --dry-run }

Collated output:

Filename                                     Total Sequences  Sequences flagged as poor quality  Sequence length  %GC  md5sum
TGL01_0001_Ov_P_EX_BAVYWVDTXY_1_R1.fastq.gz  56263351         0                                  126              43   bcd7a0241hc15292179a8657395181c5
TGL01_0001_Ov_P_EX_BAVYWVDTXY_1_R2.fastq.gz  56263351         0                                  126              44   62e60cd5313999ac907f425a2f7b7353
SMP-002-T1_EX_BAVYWVDTXY_1_R1.fastq.gz       16972293         0                                  126              47   bxxxxxxxxx00000
SMP-002-T1_EX_BAVYWVDTXY_1_R2.fastq.gz       16972293         0                                  126              46   a2b3c4d5xxxxxxxxx0021c
SMP-002-T2_EX_BAVYWVDTXY_1_R1.fastq.gz       17745387         0                                  126              52   yyyyyyyyyy11111
SMP-002-T2_EX_BAVYWVDTXY_1_R2.fastq.gz       17745387         0                                  126              51   ff29dc0312c2222262dc
SMP-002-N_EX_BAVYWVDTXY_1_R1.fastq.gz        12559524         0                                  126              50   zd2eeef31zz22222
SMP-002-N_EX_BAVYWVDTXY_1_R2.fastq.gz        12559524         0                                  126              50   z97c8881se21111

In particular, check that read length is consistent across files, that GC content is similar between files (typically 40-60%), and that every file is unique (no duplicated md5sums).
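These three checks can also be scripted. The following is a minimal sketch and not part of collect_fastqc_metrics.pl. It assumes the collated file is tab-separated with exactly the column headers shown above (the page does not state the delimiter). The function name `check_fastqc_table` and its `gc_min`/`gc_max` parameters are hypothetical.

```python
import csv
import io
from collections import defaultdict


def check_fastqc_table(text, gc_min=40.0, gc_max=60.0):
    """Return a list of warnings for a collated FASTQC table.

    Assumes `text` is tab-separated and uses the column headers shown
    in the collated output above (an assumption, not confirmed).
    """
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    warnings = []

    # Read length should be identical for every file. Values are compared
    # as strings so that a range such as "35-126" is also handled.
    lengths = {row["Sequence length"] for row in rows}
    if len(lengths) > 1:
        warnings.append(
            "inconsistent sequence length: " + ", ".join(sorted(lengths))
        )

    # Flag files whose GC content falls outside the expected range.
    for row in rows:
        gc = float(row["%GC"])
        if not gc_min <= gc <= gc_max:
            warnings.append(
                f"{row['Filename']}: %GC {gc:g} outside {gc_min:g}-{gc_max:g}"
            )

    # Identical md5sums mean two files have identical content.
    by_md5 = defaultdict(list)
    for row in rows:
        by_md5[row["md5sum"]].append(row["Filename"])
    for md5, names in by_md5.items():
        if len(names) > 1:
            warnings.append(f"duplicated md5sum {md5}: " + ", ".join(names))

    return warnings
```

To use it, read the collated file into a string and pass it to the function. An empty list means none of the three checks raised a warning.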
