Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gene without cDNA sequence causes error when running quant.py #102

Open
adeschen opened this issue Mar 2, 2023 · 0 comments
Open

gene without cDNA sequence causes error when running quant.py #102

adeschen opened this issue Mar 2, 2023 · 0 comments

Comments

@adeschen
Copy link

adeschen commented Mar 2, 2023

Hi,

The quantification step is crashing for some of our samples. The error is always the same:

Traceback (most recent call last):
  File "/grid/tuveson/home/software_elzar/bin/miniconda/miniconda3_v4.12.0/miniconda3/envs/arcasHLA-v0.5.0_db3.47.0/share/arcas-hla-0.5.0-0/scripts/quant.py", line 250, in <module>
    allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele

There is an easy fix to this problem. The current quant.py core (line 250):

    for allele_id, allele in genotype.items():
        allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele

can be modified to check if the key exists and if not, create one:

   for allele_id, allele in genotype.items():
    if not allele_id[:-1] in allele_results:
        allele_results[allele_id[:-1]] = defaultdict(float)
    allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele

That modification removes the error but don't remove the cause of the problem. The quant.py script uses the .p file to populate the allele_results dictionary. The .p contains both the genes and the genotype dictionaries. The genes dictionary is used by the quant.py script to create the keys of the allele_results dictionary. However, it appears that the genes dictionary is something missing genes that have alleles in the genotype dictionary.

It seams the cause of this problem is related to the lack of cDNA sequence for some allele as managed in the customize.py script. The script generates the .p file that is going to be used by the quant.py script. Both the genes and genotype dictionaries are saved in the .p file. However, the genes dictionary won't contains the genes that are missing cDNA sequences for both alleles (line 137 of customize.py) as the condition for the loop is not respected:

        for seq in sequences:
            hla_idx[allele_id].append(str(idx))
            genes[gene].append(allele_id)

Can you address this issue.

Best regards
Astrid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant