This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

significant unexpected behavior in split_sequence_file_on_sample_ids.py #2209

Open
wdwvt1 opened this issue Dec 5, 2017 · 1 comment
Comments

@wdwvt1
Contributor

wdwvt1 commented Dec 5, 2017

I know QIIME1 support ends soon, but I wanted to record this information somewhere in case people still using it run into this problem. It also seems like reasonably serious unexpected behavior, because it can cause significant downstream errors.

Using the split_sequence_file_on_sample_ids.py script, if you supply an input fasta file but set the option --file_type fastq, the script will write out per-sample fastq files in which every second sequence in the fasta file is used as the quality string for the sequence before it.

For example, if your input.fna file was

>test_sample_0 R0235092:155:000000000-A9A34:1:1101:18633:1000 3:N:0: orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
ACTAAA
>test_sample_1 R0235092:155:000000000-A9A34:1:1101:15249:1000 3:N:0: orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
CCCCC

and you ran

split_sequence_file_on_sample_ids.py -i input.fna --file_type 'fastq' -o out_test

you'd get out_test/test_sample_0.fastq looking like

@test_sample_0 R0235092:155:000000000-A9A34:1:1101:18633:1000 3:N:0: orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
ACTAAA
+
CCCCCC

Notice that the second input sequence has become the quality string for the first input sequence.
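
One way output like this can arise, offered purely as an illustration and not as the actual QIIME 1 or scikit-bio code, is a reader that consumes the file in fixed four-line records without checking the `@` and `+` markers. The function name `naive_fastq_records` is hypothetical, and the headers are shortened stand-ins for the ones above.

```python
def naive_fastq_records(lines):
    """Yield (header, sequence, quality) from fixed four-line chunks.

    Hypothetical illustration only: it never checks that line 1 starts
    with '@' or that line 3 starts with '+', so FASTA input slips through.
    """
    # Drop blank lines and trailing newlines.
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Strip the leading marker character, whatever it happens to be.
        yield header[1:], seq, qual


# Two FASTA records (four lines) are read as a single FASTQ record.
fasta = [
    ">test_sample_0 read_a\n",
    "ACTAAA\n",
    ">test_sample_1 read_b\n",
    "CCCCCC\n",
]
records = list(naive_fastq_records(fasta))
for header, seq, qual in records:
    print(header, seq, qual)
```

In this sketch the second FASTA header lands in the slot where the `+` line would be and is discarded, and the second sequence is returned as the quality string of the first.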

This is made worse by the fact that the uppercase letters {ACTG} are all valid quality characters in the Phred+33 encoding, so rather than getting an error at a downstream step, you will have silently halved the number of sequences and attached totally misleading quality scores.
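
To make the point concrete, here is a minimal sketch showing the Phred+33 scores that the nucleotide letters decode to, along with a simple pre-flight check that could be run before calling the script. Both `phred33_scores` and `sniff_format` are hypothetical helpers written for this example; they are not part of QIIME 1 or scikit-bio, and the check only looks at the first non-blank line.

```python
def phred33_scores(quality):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def sniff_format(lines):
    """Guess 'fasta' or 'fastq' from the first non-blank line.

    Heuristic only: FASTA records start with '>', FASTQ records with '@'.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("@"):
            return "fastq"
        raise ValueError("unrecognized first line: %r" % stripped)
    raise ValueError("no records found")


# The four nucleotide letters all decode to plausible quality scores.
print(phred33_scores("ACGT"))

# Example guard, assuming the caller intends to pass --file_type fastq.
example = [">test_sample_0 read_a\n", "ACTAAA\n"]
if sniff_format(example) != "fastq":
    print("input looks like fasta; do not pass --file_type fastq")
```

With a real file, the same check could be applied to an open file handle, for example `with open("input.fna") as fh: sniff_format(fh)`.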

@antgonza
Contributor

antgonza commented Dec 5, 2017

Thanks for reporting. Just out of curiosity, are you sure the problem is in QIIME1 and not in another library, like skbio? I'm a bit concerned that the bug still exists somewhere else ...
