-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Outcome of O2/O9 comparison can be random #46
Comments
Sorry for the late response. Could you please share the accession numbers of your questionable genomes. We will look into this issue. |
Hi, no problem. The issue occurs with FASTQ input of two of our own samples, in a custom Docker environment. I can share the anonymized samples and the Docker image over a filesharing platform. Do you have a preference for which platform? |
Hi, a Docker image would be fine. Just let me know how to access your samples. Thanks! |
Hi,
The image is here: https://git.wur.nl/overb015/public/container_registry/872
The data is here:
https://filesender.surf.nl/?s=download&token=01f77bbd-497d-451a-84a2-227b76cc1d37
The command we use is:
"""bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/SPAdes-3.9.0-Linux/bin:/SalmID-0.11:/sratoolkit.2.8.0-ubuntu64/bin
SeqSero2_package.py \ -p 1 \ -t 2 \
-d "${samplename}" \ -i "${forward}" "${reverse}"
"""
Please let me know if you need any help.
Kind regards,
Lennert
…On Fri, Jul 22, 2022 at 3:17 PM LSTUGA ***@***.***> wrote:
Hi, a Docker image would be fine. Just let me know how to access your
samples. Thanks!
—
Reply to this email directly, view it on GitHub
<#46 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHUA2TPFCKO6AEDKEIRV2ILVVKNOTANCNFSM5SSPZ6NA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
One more question, does SeqSero2 really take absolutely raw and completely untrimmed reads as input? We have seen some improvement using completely raw input files. |
In some of our WGS datasets, SeqSero2 (
v1.1.1
using the default microassembly approach) returns O2 or O9 seemingly at random:We have traced the issue back to this decision block: https://github.com/denglab/SeqSero2/blob/master/bin/SeqSero2_package.py#L740.
The contigs stored in the
special_genes
dictionary are 3 relatively short contigs for O2, and 3 for O9. The contigs and the scores are identical in each run. However, when the scores for O2 and O9 hits are being compared on line 745 (see https://github.com/denglab/SeqSero2/blob/master/bin/SeqSero2_package.py#L745), only the last value of O2 is compared to the last value of O9. However, due to the random nature of iterating over dictionaries in Python3, the 'last' value in thespecial_genes
dictionary is not always the highest.special_genes
dictionary is random (as is known for some/most versions of Python3).In the case of multiple short contigs for the O gene, that match both O2 and O9, this means that a higher scoring contig can be passed first in the iteration, followed by a lower scoring contig. The second score then replaces the first, even though the first score was higher. In a different SeqSero2 run, the lower score might be passed first in the iteration, followed by the higher score, leading to a different pair of scores to be compared on line 745 in the script.
I propose the following solution:
if "tyr-O-9" in z:
) with:if "tyr-O-9" in z and special_genes[z] > O9:
elif "tyr-O-2" in z:
) with:if "tyr-O-2" in z and special_genes[z] > O2:
The text was updated successfully, but these errors were encountered: