two fastq files were not correctly formatted #36

Open
alexyfyf opened this issue May 12, 2023 · 3 comments

Comments

@alexyfyf

Hi team,

I have downloaded some cDNA fastq files from your S3 repo.
When I ran QC with NanoPlot, I found that 2 files are not correctly formatted:

SGNex_MCF7_cDNAStranded_replicate2_run1/SGNex_MCF7_cDNAStranded_replicate2_run1.fastq.gz
SGNex_K562_cDNAStranded_replicate3_run3/SGNex_K562_cDNAStranded_replicate3_run3.fastq.gz

The first one has additional strings before the @ character of the first read.

fastq_fail/FAK34234_679ea2e77287c6ea3bab84c69ca16d29e5d9c760_228.fastq000666 001750 001750 00010735421 13424777162 023424 0ustar00gridgrid000000 000000 @0185f0c7-c4a5-40fb-9ac2-6907653a86a5 runid=679ea2e77287c6ea3bab84c69ca16d29e5d9c760 read=46243 ch=61 start_time=2019-02-01T08:06:48Z flow_cell_id=FAK34234 protocol_group_id=010219_MCF7_mRNA_PCS109 sample_id=010219_MCF7_mRNA_PCS109
ACGGTAATACTTCGGTCTTGTTTCGACAATCGGTCGCTCAGACCGACCGTGGAAC
+
#"*%&$#%"$&"""""$&&#"""""""++*++)/+%#%##'+*$%&'%"##("&$

The second one has a read whose quality string does not match the length of its sequence.

@09f55d50-803e-4048-899d-bb2fbdbf9c33 runid=446e90283984afd70d3f9af90262644290c7fca2 read=1796 ch=64 start_time=2019-01-07T07:56:26Z flow_cell_id=FAK11042 protocol_group_id=070119_K562_mRNA_PCS109 sample_id=070119_K562_mRNA_PCS109
TCGGTGATAAAGTGTTAATCGTCGG
+
%"-$&%""""""""$"""""""""
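
Both kinds of problem can also be checked without NanoPlot. The following is a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines). `check_fastq` is a hypothetical helper name used only for illustration and is not part of any SG-NEx tooling.

```shell
# check_fastq (hypothetical helper): validate four-line FASTQ records from stdin.
# Reports headers that do not start with "@", separators that do not start
# with "+", and quality strings whose length differs from the sequence length.
# Exits with status 1 if any problem was found, 0 otherwise.
check_fastq() {
  awk '
    BEGIN { bad = 0 }
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "line %d: header does not start with @\n", NR; bad = 1
    }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "line %d: separator does not start with +\n", NR; bad = 1
    }
    NR % 4 == 0 && length($0) != seqlen {
      printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen
      bad = 1
    }
    END {
      if (NR % 4 != 0) { printf "truncated final record (%d lines in total)\n", NR; bad = 1 }
      exit bad
    }
  '
}
```

It could be run as `zcat SGNex_K562_cDNAStranded_replicate3_run3.fastq.gz | check_fastq`. Note that once a single line is missing or extra, every later record is reported too, so the first message is usually the informative one.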

Can you confirm this?
Cheers,
Alex

@cying111
Collaborator

cying111 commented Nov 8, 2023

Hi @alexyfyf ,

Thanks for pointing out the problems with those files.

I have corrected those two files and updated them in the S3 bucket. Please have a look.

Please let us know if issues are found for other files as well!

Thank you.
Warm regards,
Ying

@alexyfyf
Author

alexyfyf commented Nov 9, 2023

Hi Ying,

I did spot another file, from dRNA, that is also corrupted:
SGNex_MCF7_directRNA_replicate2_run2

It has quite a few problems, and I used the following code to fix it.

zcat SGNex_MCF7_directRNA_replicate2_run2.fastq.gz | sed 's/.*@/@/g' | sed '$d' | gzip > SGNex_MCF7_directRNA_replicate2_run2_fixed.fastq.gz

You can have a look and see if there's a better way.
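
One possible refinement, offered as a sketch rather than a tested replacement: `s/.*@/@/g` runs on every line, and `@` is also a valid quality character, so a quality line containing `@` would be cut short. Restricting the substitution to header lines avoids that. The sketch assumes four-line records with the stray bytes sitting on the header line, as in the cDNA excerpt earlier in this thread (which looks like a tar header, given the `ustar` string). `fix_fastq_headers` is a hypothetical name.

```shell
# fix_fastq_headers (hypothetical helper): repair four-line FASTQ records from stdin.
# - On header lines only, strip anything before the first "@".
# - Emit a record only if its header starts with "@" and its quality
#   length equals its sequence length; count the rest as dropped.
# - An incomplete trailing record (fewer than 4 lines) is never emitted.
fix_fastq_headers() {
  awk '
    { rec[NR % 4] = $0 }                       # rec[1..3] and rec[0] hold one record
    NR % 4 == 1 { sub(/^[^@]*@/, "@", rec[1]) }
    NR % 4 == 0 {
      if (substr(rec[1], 1, 1) == "@" && length(rec[2]) == length(rec[0]))
        printf "%s\n%s\n%s\n%s\n", rec[1], rec[2], rec[3], rec[0]
      else
        dropped++
    }
    END {
      if (dropped) printf "dropped %d malformed record(s)\n", dropped > "/dev/stderr"
    }
  '
}
```

A possible invocation is `zcat SGNex_MCF7_directRNA_replicate2_run2.fastq.gz | tr -d '\000' | fix_fastq_headers | gzip > SGNex_MCF7_directRNA_replicate2_run2_fixed.fastq.gz`. The `tr -d '\000'` step is an assumption: it removes NUL padding bytes in case the stray content comes from a tar archive. Unlike the `sed` pipeline, this drops records whose quality and sequence lengths disagree, so the dropped count on stderr is worth checking.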

Cheers,
Alex

@cying111
Collaborator

cying111 commented Nov 9, 2023

Hi Alex,

Thanks again for the heads-up and for sharing your code to correct it.

I think that's good already.

I have uploaded the corrected version just now.

Thank you
Regards,
Ying
