
Do not use the variant HGVS as the name when storing long variants #26

Closed
pnrobinson opened this issue Jul 20, 2023 · 1 comment
Labels
bug Something isn't working


@pnrobinson
Member

pnrobinson commented Jul 20, 2023

When creating a cohort with

```python
cc = PhenopacketCohortCreator(pc)
patientCohort = cc.create_cohort(fpath_phenopackets)
```

I get this error:

```
File ~/GIT/genophenocorr/src/genophenocorr/variant/_annotators.py:180, in VariantAnnotationCache.store_annotations(self, variant_coordinates, annotation)
    178 def store_annotations(self, variant_coordinates: VariantCoordinates, annotation: Variant):
    179     fpath = self._create_file_name(variant_coordinates)
--> 180     with open(fpath, 'wb') as f:
    181         pickle.dump(annotation, f)
```

because of this variant name, which ends up as an extremely long cache file name:

'CACHE/17_42709934_42712286_GACCTGGAAGAGAAATCCAACGGGCCTGTCACTCCTCGAGCAAGGGGGTCAGGTAAGTGGCCCAGCTGGGTGCTGGCCTTGGGAGGGTTCTGAGAAACTCAGGCAGCTGACCAAGCCTCTCATCAGTCAGGGAGAGACAGAGTGCCACTGGAACATTGGGTTACTGGCTCTGAAGTTCATTCCTAATTATTTATCCTGACTCAGGAAAGGAGAAATACTGAGCACAGTAATACCGCCCCTGGTCAGAAGCTGTCACCTACTACTCTTTCTACCAAGCCACGGGTAGAAGAGTGGGCTGACTGTGACCAACAGTATCTTCTTCTTTTTAGGAAGGGCAACGCTGTGCCTTGTGTAACTGAGTGTAAGGCAGGACAGGACAGGACAGGAATGGTTTCAGTGGGCTAAATATTAGCTCCCTCTGTCAGTATAAAGATACCGGAGCCTCAGCCATTTCAATAGGATGTGTTTTTTCTCTTAAAGCACTGGTTTTTAGTTTTTCCTTTTCTTTGTTGGGGCTATTGGCCCTTTGTGGGGGATCTTTGAAAACTGTAACTATTCTCAGGAAAATACAGACAAGAACATTCTTGCATACAAATCCATAGATGGTTACGTTGAGAACCTGTGATCAGGGAAATAGGTATGAGCTCCAAAATGAAAGCAAAGGGCACTTCAGCTCATGGTTCTGTTTTTGTTTGTTTTTTTTTTTTTTTTTTAAGAGAGAGGGTCTCATACTCTTGGCCAGGCTGGAGTGCAGTGGTGCCATCATAGCTCAATGTAGTATAGAACTCCTGGGCTCAAGCCATCTTCCCACCTCAGCCTCCTGAGTACTAGGACTACAGGTACGTGGCTTTTTTTTTTTTTTTTTTTTTGTAGAAATGGGGTCTCACTTTGTTGCCCACACTGGTCCTGAAATCCTGGCTTCAAGCGATCCTCCCACCATGGCTTCCCAAAGCACTGGAATTCTAGGTGTGAGCCACCTTGCCCAGTCCATGGTTCTATTAATTGTTCTCAGTACAGGAAGCATGAAGAAGAGGCCACAGAGTCTCCTCCAGAAGGTAGGAAGCCAAAGCATTGGGGTTCCTTTCCTGTTGGACATGCTGGCCCTGACAGCTGCCTCCTTGTCCCTGTTCTTCAGTCTGTCTTCTCACTGTGGTCTTTTCCTGTCTTTTCCTGGGCCAATCACTTGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCCATCTCTACTAAAAAAAAATACAAAAATGGCCAGGCACATTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGTGGATCACCTGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTATTACAAATACAAAATTAGCCGGGCGTGGTGGTGCACACCTGTAATCCCAGCTACTTGGGAGGCTGTGGCAGAAGAATCACTTGAACCTGGGAGGCGGAGGCTGCAGTGAGCTGAGATCATGCCACTGCACTCCAGCCTGGGCAACAGACCGAGACTCCATCTCAAAAAAACAAAACAAAAAAAATTAGCTGGGTGTGGTGGTGGGCACCTGTAATCCCAGTTGCTTGGGAGGATGAGGCAGAAGAATCACTTGAACTTGGGAGGCGGAGGTTGCAGTGAACCAAGATTATGCCACTGCACCACTCCAGCCTGGGCAACAGAGCGAGATTCTGTCTCAAAAAAAAAAAAAAATTAGCTGGGCATACTGGCCTGCACCTGTAGTCCCTTGCTACTTGCTTGGCTGAGGGGAGAGGACTGCTTGAGCCCAGGAGGCGGAGGTTGCAGTGAGCTATGATCATGCCACTGCACTCCAGCCTGGGCGACACAGTGAAACCCTGTCTCAAAGACAAAATAAAGATAATCTAGTGATAGAAAATGTGGAGAATAAAATGACTGAAGAGGCTGGCGGAGTGGTGGAGGGAGCAGCAGCTGCAGCAG
CTGCAGCAGCAGCAGCAGTGTGCTCATTAACAAGAGCCACAGAAAGACCTGGGAGTCCCTTCTGGGAAAGGGGTACACATTTAGAAAGGAGGCCAGAGCCAAAAAAAAGAAGCGAAAGAGTGTAGGACCCAGAAGCATTAAATAGAGTCCAGACAGAAATGAGCATTCAGCAAGGAGGAGGCGGGTCCCCAAACATCATTAGGCCTGGCACTTGCAGAAGGGCCATGTTTGGGAAACTCACAGAAGCACAGGCTCATCAGGGACTGAACTTAAGACAACTTCTCTCCAGACCCAGACACACAGCCTGGTAAGATGGCAAAGGGCTGGACAGAGCAATGCGTGAAAGGAGGGGCCCATTTGTTCTGCTGCTTCCAGATGGT_G_heterozygous.pickle'
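Judging from the path above, the cache file name appears to be the contig, start, end, REF allele, ALT allele, and genotype joined by underscores. The sketch below reconstructs that pattern with a hypothetical `naive_cache_name` helper (an assumption for illustration, not the actual `_create_file_name`) and checks the result against the 255-byte limit that common file systems such as ext4 place on a single path component:

```python
# Hypothetical reconstruction of the naming pattern visible in the failing path;
# the real _create_file_name in genophenocorr may differ.
NAME_MAX = 255  # maximum bytes per path component on ext4 and many other file systems


def naive_cache_name(chrom: str, start: int, end: int, ref: str, alt: str, genotype: str) -> str:
    """Join the variant fields into a file name, as the failing path suggests."""
    return f"{chrom}_{start}_{end}_{ref}_{alt}_{genotype}.pickle"


def fits_file_system(name: str, limit: int = NAME_MAX) -> bool:
    """Return True if the name fits into a single path component."""
    return len(name.encode("utf-8")) <= limit


# A SNV yields a short, safe name.
snv = naive_cache_name("17", 42709934, 42709934, "A", "G", "heterozygous")

# A deletion embeds its whole REF sequence; "N" * ref_len is a placeholder
# with the length implied by the reported coordinates.
ref_len = 42712286 - 42709934
deletion = naive_cache_name("17", 42709934, 42712286, "N" * ref_len, "G", "heterozygous")

print(snv, fits_file_system(snv))
print(len(deletion), fits_file_system(deletion))
```

The SNV name stays well under the limit, while the deletion name is several thousand characters long, so `open(fpath, 'wb')` cannot create the file.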

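I would suggest something like this:

```python
import tempfile
if len(variant_name) > 50:
    # TemporaryFile() returns a file object (destroyed as soon as it is closed), not a name;
    # NamedTemporaryFile(delete=False) keeps the file on disk and exposes its path as .name
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        variant_name = tmp.name
```

or just storing names like variant1, variant2, etc. I do not think these names matter at all for the rest of the code in a notebook.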
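One caveat with a random name, whether it comes from `tempfile` or from a variant1, variant2 counter: if, as `store_annotations` suggests, the cache derives the file name from the variant coordinates on every lookup, a randomly named file could not be found again on a later run. A deterministic alternative, swapped in here and not part of the proposal above, is to keep short names readable and replace long ones with a hash digest of the full name. The helper `short_cache_name` is hypothetical; the 50-character threshold is taken from the suggestion above:

```python
import hashlib

MAX_READABLE_LEN = 50  # threshold taken from the suggestion in this issue


def short_cache_name(variant_name: str, suffix: str = ".pickle") -> str:
    """Return a file name that is stable across runs and never excessively long.

    Short names are kept as-is so the cache stays human-readable. Long names
    are replaced by the SHA-256 hex digest of the full name, so the same
    variant always maps to the same file, while distinct variants collide
    only with negligible probability.
    """
    if len(variant_name) <= MAX_READABLE_LEN:
        return variant_name + suffix
    digest = hashlib.sha256(variant_name.encode("utf-8")).hexdigest()
    return digest + suffix


long_name = "17_42709934_42712286_" + "N" * 2352 + "_G_heterozygous"  # placeholder REF

short = short_cache_name("17_42709934_42709934_A_G_heterozygous")
long_a = short_cache_name(long_name)
long_b = short_cache_name(long_name)
print(short)
print(long_a, len(long_a))
```

Calling it twice with the same long name returns the same 71-character file name, so a later run can find the cached annotation without keeping a separate index of random names.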

pnrobinson added the bug label Jul 20, 2023
@ielis
Member

ielis commented Oct 4, 2023

I think this has been addressed in #70. The file name is now generated using variant_key for short sequence variants and symbolic variants, or using variant_class for INDELs with long sequences, as shown here.
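The naming scheme described above can be sketched as follows. The `Coordinates` class, its field names, the 10-base cut-off for a "long" allele, and the exact key formats are assumptions made for illustration; they are not taken from the genophenocorr code base:

```python
from dataclasses import dataclass

MAX_ALLELE_LEN = 10  # hypothetical cut-off separating short from long alleles


@dataclass(frozen=True)
class Coordinates:
    """Hypothetical stand-in for genophenocorr's VariantCoordinates."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str
    variant_class: str  # assumed labels such as "SNV", "DEL", "INS", "INV"

    @property
    def is_symbolic(self) -> bool:
        # Symbolic ALT alleles are written in angle brackets in VCF, e.g. <DEL>
        return self.alt.startswith("<") and self.alt.endswith(">")

    @property
    def variant_key(self) -> str:
        if self.is_symbolic:
            # Symbolic alleles carry no sequence (and '<', '>' are not portable
            # in file names), so the class identifies the variant instead
            return f"{self.chrom}_{self.start}_{self.end}_{self.variant_class}"
        return f"{self.chrom}_{self.start}_{self.end}_{self.ref}_{self.alt}"


def cache_file_name(coords: Coordinates, genotype: str) -> str:
    """Build a bounded-length cache file name (sketch of the scheme from #70)."""
    short_alleles = max(len(coords.ref), len(coords.alt)) <= MAX_ALLELE_LEN
    if coords.is_symbolic or short_alleles:
        stem = coords.variant_key
    else:
        # Long sequence INDEL: drop the alleles, keep coordinates and class
        stem = f"{coords.chrom}_{coords.start}_{coords.end}_{coords.variant_class}"
    return f"{stem}_{genotype}.pickle"


snv = Coordinates("17", 42709934, 42709934, "A", "G", "SNV")
long_del = Coordinates("17", 42709934, 42712286, "N" * 2352, "G", "DEL")  # placeholder REF
inv = Coordinates("22", 10000, 20000, "N", "<INV>", "INV")

for coords, gt in [(snv, "heterozygous"), (long_del, "heterozygous"), (inv, "homozygous")]:
    print(cache_file_name(coords, gt))
```

One trade-off of dropping the alleles: two long insertions of different sequences at the same position, with the same class and genotype, would map to the same file name under this sketch, whereas long deletions are already pinned down by their coordinates.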

ielis closed this as completed Oct 4, 2023
3 participants