docm field formatting #166

Open
colleenXu opened this issue Jul 21, 2023 · 0 comments

colleenXu commented Jul 21, 2023

When looking at docm data (https://myvariant.info/v1/query?q=_exists_:%22docm.pubmed_id%22&field=docm), the docm.pubmed_id value is sometimes a list of IDs packed into a single string. The delimiter appears to be ", " (both a comma AND a space).

It would be easier to use if it were represented as a list of strings, e.g. [ "12460918", "23833300", "12068308", "12460919", "21483012", "19010912", "22649091", "19238210" ]

One example:
{
  "_id": "chr7:g.140481393T>C",
  "_score": 1,
  "docm": {
    "aa_change": "p.Y472C",
    "all_domains": "pfam_Ser-Thr/Tyr_kinase_cat_dom,pfam_Prot_kinase_dom,pfam_Raf-like_ras-bd,pfam_Prot_Kinase_C-like_PE/DAG-bd,superfamily_Kinase-like_dom,smart_Raf-like_ras-bd,smart_Prot_Kinase_C-like_PE/DAG-bd,smart_Ser/Thr_dual-sp_kinase_dom,smart_Tyr_kinase_cat_dom,pfscan_Raf-like_ras-bd,pfscan_Prot_Kinase_C-like_PE/DAG-bd,pfscan_Prot_kinase_dom,prints_Ser-Thr/Tyr_kinase_cat_dom,prints_DAG/PE-bd",
    "alt": "C",
    "c_position": "c.1415",
    "chrom": 7,
    "default_gene_name": "BRAF",
    "deletion_substructures": "-",
    "disease": "LC",
    "doid": "DOID:1324",
    "domain": "pfam_Ser-Thr/Tyr_kinase_cat_dom,pfam_Prot_kinase_dom,superfamily_Kinase-like_dom,smart_Ser/Thr_dual-sp_kinase_dom,smart_Tyr_kinase_cat_dom,pfscan_Prot_kinase_dom",
    "ensembl_gene_id": "ENSG00000157764",
    "genename": "BRAF",
    "genename_source": "HGNC",
    "hg19": {
      "end": 140481393,
      "start": 140481393
    },
    "primary": 1,
    "pubmed_id": "12460918, 23833300, 12068308, 12460919, 21483012, 19010912, 22649091, 19238210",
    "ref": "T",
    "source": "MyCancerGenome",
    "strand": -1,
    "transcript_error": "no_errors",
    "transcript_name": "ENST00000288602",
    "transcript_source": "ensembl",
    "transcript_species": "human",
    "transcript_status": "known",
    "transcript_version": "74_37",
    "trv_type": "missense",
    "type": "SNP",
    "ucsc_cons": 1,
    "url": "http://www.mycancergenome.org/content/disease/lung-cancer/braf/209"
  }
}
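
The requested conversion could look roughly like the following. This is only a sketch, not the MyVariant parser: the function name `split_pubmed_ids` is hypothetical, and it assumes the IDs are always separated by commas, with or without a trailing space.

```python
def split_pubmed_ids(value):
    """Return a docm.pubmed_id value as a list of strings.

    Hypothetical helper for illustration; assumes comma-delimited IDs.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        # Already a list: just normalize each entry to a stripped string.
        return [str(v).strip() for v in value]
    # Split on the comma and strip whitespace, so both "," and ", " work.
    return [part.strip() for part in str(value).split(",") if part.strip()]


raw = "12460918, 23833300, 12068308, 12460919, 21483012, 19010912, 22649091, 19238210"
pubmed_ids = split_pubmed_ids(raw)
print(pubmed_ids)
```

A record with a single ID would come back as a one-element list, so consumers would always see the same type.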

EDIT: there are other fields that are also a little tricky to parse:

  • docm.source: sometimes the value is "-", and it's unclear what this means. Other times it's null.
  • docm.url: sometimes this field's value is null (rather than the field being omitted when there's no value).
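
One possible cleanup for these two fields, as a sketch only: the helper name `clean_docm` and its `placeholders` parameter are hypothetical. Null values are dropped by default. Treating "-" as "no value" is an assumption, since its meaning is unclear, so it has to be requested explicitly.

```python
def clean_docm(docm, placeholders=()):
    """Return a copy of a docm dict with empty source/url fields removed.

    Hypothetical helper for illustration. `placeholders` lists extra values
    to treat as missing (for example "-", which is an assumption here).
    """
    cleaned = dict(docm)
    for key in ("source", "url"):
        value = cleaned.get(key)
        if value is None or value in placeholders:
            # Omit the field entirely instead of keeping a null/placeholder.
            cleaned.pop(key, None)
    return cleaned


record = {"genename": "BRAF", "source": "-", "url": None}
print(clean_docm(record))                      # only the null url is dropped
print(clean_docm(record, placeholders=("-",)))  # "-" is also treated as missing
```
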
colleenXu changed the title from "docm pubmed field formatting" to "docm field formatting" on Jul 21, 2023