Outcome of O2/O9 comparison can be random #46

VanOverbeeke · 2022-04-05T11:52:02Z

In some of our WGS datasets, SeqSero2 (v1.1.1 using the default microassembly approach) returns O2 or O9 seemingly at random:

Sample	Predicted antigenic profile	Predicted serotype
replicate_a_1.trimmed.fastq.gz	2:l,z28:1,5	I 2:l,z28:1,5
replicate_b_1.trimmed.fastq.gz	9:l,z28:1,5	Javiana
replicate_c_1.trimmed.fastq.gz	2:l,z28:1,5	I 2:l,z28:1,5
replicate_d_1.trimmed.fastq.gz	9:l,z28:1,5	Javiana
replicate_e_1.trimmed.fastq.gz	9:l,z28:1,5	Javiana
replicate_f_1.trimmed.fastq.gz	2:l,z28:1,5	I 2:l,z28:1,5
replicate_g_1.trimmed.fastq.gz	2:l,z28:1,5	I 2:l,z28:1,5
replicate_h_1.trimmed.fastq.gz	9:l,z28:1,5	Javiana

We have traced the issue back to this decision block: https://github.com/denglab/SeqSero2/blob/master/bin/SeqSero2_package.py#L740.

The special_genes dictionary holds 3 relatively short contigs for O2 and 3 for O9. The contigs and their scores are identical in each run. However, when the scores for the O2 and O9 hits are compared on line 745 (see https://github.com/denglab/SeqSero2/blob/master/bin/SeqSero2_package.py#L745), only the last value seen for O2 is compared to the last value seen for O9. Because the order of iteration over the special_genes dictionary can differ between runs (as is known for some/most versions of Python3), the 'last' value is not always the highest.

When multiple short contigs for the O gene match both O2 and O9, a higher-scoring contig can come first in the iteration, followed by a lower-scoring contig. The second score then replaces the first, even though the first score was higher. In a different SeqSero2 run, the lower score might come first, followed by the higher score, so a different pair of scores is compared on line 745 of the script.
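
To illustrate the failure mode described above, here is a minimal sketch. It assumes the decision block has roughly this shape; the function name, the contig keys, the scores, and the final `"2"`/`"9"` return are hypothetical and are not copied from SeqSero2.

```python
def last_value_wins(special_genes):
    """Sketch of a loop where a later hit overwrites an earlier one."""
    o9 = 0
    o2 = 0
    for z in special_genes:
        if "tyr-O-9" in z:
            o9 = special_genes[z]  # overwrites any earlier O9 score
        elif "tyr-O-2" in z:
            o2 = special_genes[z]  # overwrites any earlier O2 score
    # Assumed final comparison: whichever "last" score is higher wins.
    return "2" if o2 > o9 else "9"


# Same hits and scores, visited in two different orders (hypothetical values).
order_a = {"tyr-O-2_contig_a": 150, "tyr-O-9_contig_a": 200, "tyr-O-9_contig_b": 100}
order_b = {"tyr-O-2_contig_a": 150, "tyr-O-9_contig_b": 100, "tyr-O-9_contig_a": 200}

print(last_value_wins(order_a))  # last O9 score seen is 100, so O2 (150) wins
print(last_value_wins(order_b))  # last O9 score seen is 200, so O9 wins
```

With identical inputs, the call changes only because the low-scoring O9 contig is visited last in one ordering and first in the other.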

I propose the following solution:

replace line 741 (if "tyr-O-9" in z:) with: if "tyr-O-9" in z and special_genes[z] > O9:
replace line 743 (elif "tyr-O-2" in z:) with: if "tyr-O-2" in z and special_genes[z] > O2:
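
A minimal sketch of the proposed change, under the same caveat that this is not the SeqSero2 source: the function name, contig keys, scores, and the final `"2"`/`"9"` return are hypothetical. It keeps the highest score per antigen and checks that every iteration order gives the same call.

```python
from itertools import permutations


def highest_value_wins(special_genes):
    """Sketch of the proposed fix: only replace a score with a higher one."""
    o9 = 0
    o2 = 0
    for z in special_genes:
        if "tyr-O-9" in z and special_genes[z] > o9:
            o9 = special_genes[z]
        if "tyr-O-2" in z and special_genes[z] > o2:
            o2 = special_genes[z]
    # Assumed final comparison between the best O2 and best O9 scores.
    return "2" if o2 > o9 else "9"


# Hypothetical hits: two short contigs per antigen.
hits = [
    ("tyr-O-2_contig_a", 150),
    ("tyr-O-2_contig_b", 90),
    ("tyr-O-9_contig_a", 200),
    ("tyr-O-9_contig_b", 100),
]

# Build the dictionary in every possible insertion order and collect the calls.
outcomes = {highest_value_wins(dict(order)) for order in permutations(hits)}
print(outcomes)
```

Because each score is only replaced by a strictly higher one, the result no longer depends on which contig is visited last.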

LSTUGA · 2022-07-02T13:55:10Z

Sorry for the late response. Could you please share the accession numbers of the genomes in question? We will look into this issue.

VanOverbeeke · 2022-07-18T13:16:48Z

Hi, no problem. The issue occurs with FASTQ input of two of our own samples, in a custom Docker environment. I can share the anonymized samples and the Docker image over a filesharing platform. Do you have a preference for which platform?

LSTUGA · 2022-07-22T13:17:19Z

Hi, a Docker image would be fine. Just let me know how to access your samples. Thanks!

VanOverbeeke · 2022-07-28T11:42:17Z

Hi,

The image is here: https://git.wur.nl/overb015/public/container_registry/872
The data is here: https://filesender.surf.nl/?s=download&token=01f77bbd-497d-451a-84a2-227b76cc1d37

The command we use is:

```bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/SPAdes-3.9.0-Linux/bin:/SalmID-0.11:/sratoolkit.2.8.0-ubuntu64/bin
SeqSero2_package.py \
    -p 1 \
    -t 2 \
    -d "${samplename}" \
    -i "${forward}" "${reverse}"
```

Please let me know if you need any help.

Kind regards,
Lennert

VanOverbeeke · 2022-07-29T09:24:42Z

One more question: does SeqSero2 really expect completely raw, untrimmed reads as input? We have seen some improvement when using raw input files.
