[biomed nl] update embedding index with linked props examples (#4052)
1 parent ec3160c, commit 11178ed
Showing 6 changed files with 106 additions and 47 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
medium_ft: embeddings_medium_2024_03_14_16_38_53.ft_final_v20230717230459.all-MiniLM-L6-v2.csv | ||
sdg_ft: embeddings_sdg_2023_12_26_10_03_03.ft_final_v20230717230459.all-MiniLM-L6-v2.csv | ||
undata_ft: embeddings_undata_2024_03_20_11_01_12.ft_final_v20230717230459.all-MiniLM-L6-v2.csv | ||
bio_ft: embeddings_bio_2024_03_04_10_28_51.ft_final_v20230717230459.all-MiniLM-L6-v2.csv | ||
bio_ft: embeddings_bio_2024_03_19_16_39_03.ft_final_v20230717230459.all-MiniLM-L6-v2.csv |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Curated Input for Bio index | ||
|
||
This index has properties used by biomedical entities and follows the format of [relation expressions](https://docs.datacommons.org/api/rest/v2#relation-expressions). Properties can be structured like: | ||
|
||
- `prop`: this can match to either in or out properties | ||
- e.g., `virusHost` which will match both the 'in' and 'out' values for the property virusHost | ||
- `->prop`: this matches to an 'out' property | ||
- e.g., `->phylum` which will match the 'out' values for the property phylum | ||
- `<-prop`: this matches to an 'in' property | ||
- e.g., `<-virusGenus` which will match the 'in' values for the property virusGenus | ||
- `<-prop1{typeOf:X}->prop2`: in this case, we will get all the 'in' values for prop1 that are of type X & then from those values, get all the 'out' values for prop2 | ||
- e.g., `<-geneID{typeOf:DiseaseGeneAssociation}->diseaseOntologyID` which will first get all the DiseaseGeneAssociations that are 'in' values for the property geneID and then get all the 'out' values for the property diseaseOntologyID for those DiseaseGeneAssociations. |
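
The four shapes listed in the README above can be split into hops mechanically. The following is a minimal sketch, not the Data Commons implementation: the `Hop` dataclass, the `parse_relation` function, and the regular expression are all hypothetical names introduced here for illustration, and the grammar is assumed to be exactly what the README bullets describe (an optional arrow, a property name, and an optional `{typeOf:X}` filter per hop).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed grammar for one hop: optional arrow, property name,
# optional {typeOf:X} filter. Inferred from the README bullets only.
_HOP = re.compile(
    r"(<-|->)?([A-Za-z_][A-Za-z0-9_]*)(?:\{typeOf:([A-Za-z0-9_]+)\})?"
)

_DIRECTIONS = {"<-": "in", "->": "out", None: "either"}


@dataclass(frozen=True)
class Hop:
    """One step of a relation expression (hypothetical helper type)."""
    direction: str                      # "in", "out", or "either"
    prop: str                           # property name, e.g. "geneID"
    type_filter: Optional[str] = None   # X from {typeOf:X}, if present


def parse_relation(expr: str) -> List[Hop]:
    """Split an expression such as '<-a{typeOf:X}->b' into its hops."""
    hops: List[Hop] = []
    pos = 0
    while pos < len(expr):
        match = _HOP.match(expr, pos)
        if match is None:
            raise ValueError(f"cannot parse {expr!r} at offset {pos}")
        arrow, prop, type_filter = match.groups()
        # A bare property (no arrow) is only meaningful as the whole expression.
        if arrow is None and pos != 0:
            raise ValueError(f"missing arrow in {expr!r} at offset {pos}")
        hops.append(Hop(_DIRECTIONS[arrow], prop, type_filter))
        pos = match.end()
    if not hops:
        raise ValueError("empty expression")
    return hops


chained = parse_relation(
    "<-geneID{typeOf:DiseaseGeneAssociation}->diseaseOntologyID"
)
```

Here `chained` holds two hops: an 'in' hop on `geneID` filtered to `DiseaseGeneAssociation`, followed by an 'out' hop on `diseaseOntologyID`, which mirrors the two-step lookup the last README bullet describes.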
83 changes: 43 additions & 40 deletions
tools/nl/embeddings/data/curated_input/bio/sheets_svs.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,45 +1,48 @@ | ||
dcid,Name,Description,Override_Alternatives,Curated_Alternatives | ||
ofVirusSpecies,ofVirusSpecies,"The species of a virus isolate",, | ||
virusHost,virusHost,"A specific organism or taxonomic group of organisms that are susceptible to be infected by a virus",,"host of a virus" | ||
ncbiTaxonID,ncbiTaxonID,"NCBI Taxonomy database identifier",, | ||
diseaseName,diseaseName,"preferred disease name for the concept specified by disease identifiers",,"The name of the disease" | ||
observedAllele,observedAllele,"The sequences of the observed alleles from rs-fasta files.",, | ||
referenceAlleleNCBI,referenceAlleleNCBI,"Reference genomic sequence from dbSNP",,"reference allele" | ||
ofVirusSpecies,ofVirusSpecies,The species of a virus isolate,, | ||
virusHost,virusHost,A specific organism or taxonomic group of organisms that are susceptible to be infected by a virus,,host of a virus | ||
ncbiTaxonID,ncbiTaxonID,NCBI Taxonomy database identifier,, | ||
diseaseName,diseaseName,preferred disease name for the concept specified by disease identifiers,,The name of the disease | ||
observedAllele,observedAllele,The sequences of the observed alleles from rs-fasta files.,, | ||
referenceAlleleNCBI,referenceAlleleNCBI,Reference genomic sequence from dbSNP,,reference allele | ||
class,class,,, | ||
phylum,phylum,,, | ||
geneticVariantFunctionalCategory,geneticVariantFunctionalCategory,"Functional category of the genetic variant",, | ||
hg19GenomicPosition,hg19GenomicPosition,"The genomic position of a genetic variant using the hg19 assembly",, | ||
hg19GenomicLocation,hg19GenomicLocation,"The genomic location of a genetic variant using the hg19 assembly",, | ||
hg38GenomicPosition,hg38GenomicPosition,"The genomic position of a genetic variant using the hg38 assembly",, | ||
hg38GenomicLocation,hg38GenomicLocation,"The genomic location of a genetic variant using the hg38 assembly",, | ||
hasRNATranscript,hasRNATranscript,"Recorded transcript",,"RNA transcript that a gene has" | ||
strandOrientation,strandOrientation,"The strand on which a given annotation is located",,"The orientation of the strand on which an annotation is located" | ||
typeOfGene,typeOfGene,"The type of gene",, | ||
omimID,omimID,"OMIM database identifier",, | ||
geneticVariantFunctionalCategory,geneticVariantFunctionalCategory,Functional category of the genetic variant,, | ||
hg19GenomicPosition,hg19GenomicPosition,The genomic position of a genetic variant using the hg19 assembly,, | ||
hg19GenomicLocation,hg19GenomicLocation,The genomic location of a genetic variant using the hg19 assembly,, | ||
hg38GenomicPosition,hg38GenomicPosition,The genomic position of a genetic variant using the hg38 assembly,, | ||
hg38GenomicLocation,hg38GenomicLocation,The genomic location of a genetic variant using the hg38 assembly,, | ||
hasRNATranscript,hasRNATranscript,Recorded transcript,,RNA transcript that a gene has | ||
strandOrientation,strandOrientation,The strand on which a given annotation is located,,The orientation of the strand on which an annotation is located | ||
typeOfGene,typeOfGene,The type of gene,, | ||
omimID,omimID,OMIM database identifier,, | ||
icd10CMCode,icd10CMCode,"The disease diagnosis code for version 10 of the International Classification of Diseases (ICD), Clinical Modification",, | ||
subClassificationOf,subClassificationOf,"subclassification of",, | ||
snomedCT,snomedCT,"Systematiized Nomenclature of Medicine (SNOMED) clinical terms (CT) code",, | ||
unifiedMedicalLanguageSystemConceptUniqueIdentifier,unifiedMedicalLanguageSystemConceptUniqueIdentifier,"Unified Medical Language System (UMLS) Concept Unique Identifier (CUI)",, "UMLS CUI" | ||
specializationOf,specializationOf,"specialization of",, | ||
chemblID,chemblID,"ChEMBL identifier",, | ||
simplifiedMolecularInputLineEntrySystem,simplifiedMolecularInputLineEntrySystem,"Simplified Molecular Input Line Entry System (SMILE)",, | ||
medicalSubjectHeadingSupplementaryRecordID,medicalSubjectHeadingSupplementaryRecordID,"A unique ID for a Medical Subject Heading supplementary record",,"An ID for a Medical Subject Heading supplementary record;MeSH supplementary record ID" | ||
medicalSubjectHeadingDescriptorID,medicalSubjectHeadingDescriptorID,"A unique ID for a Medical Subject Heading Descriptor record",,"An ID for a Medical Subject Heading descriptor record;MeSH descriptor record ID" | ||
subClassificationOf,subClassificationOf,subclassification of,, | ||
snomedCT,snomedCT,Systematiized Nomenclature of Medicine (SNOMED) clinical terms (CT) code,, | ||
unifiedMedicalLanguageSystemConceptUniqueIdentifier,unifiedMedicalLanguageSystemConceptUniqueIdentifier,Unified Medical Language System (UMLS) Concept Unique Identifier (CUI),," ""UMLS CUI""" | ||
specializationOf,specializationOf,specialization of,, | ||
chemblID,chemblID,ChEMBL identifier,, | ||
simplifiedMolecularInputLineEntrySystem,simplifiedMolecularInputLineEntrySystem,Simplified Molecular Input Line Entry System (SMILE),, | ||
medicalSubjectHeadingSupplementaryRecordID,medicalSubjectHeadingSupplementaryRecordID,A unique ID for a Medical Subject Heading supplementary record,,An ID for a Medical Subject Heading supplementary record;MeSH supplementary record ID | ||
medicalSubjectHeadingDescriptorID,medicalSubjectHeadingDescriptorID,A unique ID for a Medical Subject Heading Descriptor record,,An ID for a Medical Subject Heading descriptor record;MeSH descriptor record ID | ||
activeIngredient,activeIngredient,"component that provides pharmacological activity or other direct effect in the diagnosis, cure, mitigation, treatment, or prevention of disease, or to affect the structure or any function of the body of man or animals",, | ||
administrationRoute,administrationRoute,"The method by which a drug is administered",, | ||
dosageForm,dosageForm,"physical form in which a drug is produced and dispensed",, | ||
antibodyType,antibodyType,"type of antibody",, | ||
antigenType,antigenType,"type of antigen",, | ||
chromosomeSize,chromosomeSize,"number of nucleotides in a chromosome",,"Size of chromosome" | ||
ensemblID,ensemblID,"Ensembl ID",, | ||
fullName,fullName,"full name of the gene",, | ||
geneID,geneID,"gene id",, | ||
ncbiProteinAccessionNumber,ncbiProteinAccessionNumber,"NCBI protein accession number",, | ||
alleleOrigin,alleleOrigin,"Variant allele origin",,"Origin of variant allele" | ||
alleleType,alleleType,"The allele of a genetic variant observed within a population",,"Type of allele" | ||
ncbiDNASequenceName,ncbiDNASequenceName,"NCBI defined segment of DNA sequence name",,"Name used by NIH NCBI to refer to a segment of DNA sequence" | ||
imageUrl,imageUrl,"url to an image of what the biological specimen looks like",,"what the entity looks like" | ||
genomicCoordinates,genomicCoordinates,"genomic coordinates",, | ||
availableStrength,availableStrength,"dose approved for a drug",, | ||
referenceSNPClusterID<-GeneticVariantGeneAssociation->geneSymbol,GeneticVariantGeneAssociation,"Association between a genetic variant and a gene",,"Gene associated with a genetic variant;genetic variant associated with a gene" | ||
diseaseOntologyID<-DiseaseGeneAssociation->geneID,DiseaseGeneAssociation,"Association of a disease and a gene",, | ||
administrationRoute,administrationRoute,The method by which a drug is administered,, | ||
dosageForm,dosageForm,physical form in which a drug is produced and dispensed,, | ||
antibodyType,antibodyType,type of antibody,, | ||
antigenType,antigenType,type of antigen,, | ||
chromosomeSize,chromosomeSize,number of nucleotides in a chromosome,,Size of chromosome | ||
ensemblID,ensemblID,Ensembl ID,, | ||
fullName,fullName,full name of the gene,, | ||
geneID,geneID,gene id,, | ||
ncbiProteinAccessionNumber,ncbiProteinAccessionNumber,NCBI protein accession number,, | ||
alleleOrigin,alleleOrigin,Variant allele origin,,Origin of variant allele | ||
alleleType,alleleType,The allele of a genetic variant observed within a population,,Type of allele | ||
ncbiDNASequenceName,ncbiDNASequenceName,NCBI defined segment of DNA sequence name,,Name used by NIH NCBI to refer to a segment of DNA sequence | ||
imageUrl,imageUrl,url to an image of what the biological specimen looks like,,what the entity looks like | ||
genomicCoordinates,genomicCoordinates,genomic coordinates,, | ||
availableStrength,availableStrength,dose approved for a drug,, | ||
<-referenceSNPClusterID{typeOf:GeneticVariantGeneAssociation}->geneSymbol,GeneticVariantGeneAssociation,Gene associated with a genetic variant,, | ||
<-geneSymbol{typeOf:GeneticVariantGeneAssociation}->referenceSNPClusterID,GeneticVariantGeneAssociation,genetic variant associated with a gene,, | ||
<-diseaseOntologyID{typeOf:DiseaseGeneAssociation}->geneID,DiseaseGeneAssociation,Gene associated with a disease,, | ||
<-geneID{typeOf:DiseaseGeneAssociation}->diseaseOntologyID,DiseaseGeneAssociation,Disease associated with a gene,, | ||
virusGenus,virusGenus,genus of a virus species,, |
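
Most of the churn in this CSV diff is the removal of optional double quotes around fields. A quick way to check that such a change leaves the parsed values untouched is to read both versions with a CSV parser and compare the rows. The sketch below does that with Python's standard `csv` module on two rows copied from the diff; `read_rows` and `alternatives` are hypothetical helpers, and treating `;` as the separator inside `Curated_Alternatives` is an assumption inferred from rows such as the MeSH entries, not something the diff states.

```python
import csv
import io
from typing import Dict, List

HEADER = "dcid,Name,Description,Override_Alternatives,Curated_Alternatives\n"


def read_rows(body: str) -> List[Dict[str, str]]:
    """Parse CSV body lines (without header) into one dict per row."""
    return list(csv.DictReader(io.StringIO(HEADER + body)))


def alternatives(row: Dict[str, str]) -> List[str]:
    """Split Curated_Alternatives on ';' (assumed separator)."""
    return [a.strip() for a in row["Curated_Alternatives"].split(";") if a.strip()]


# Removed (-) versions of two rows, as they appear in the diff.
OLD = (
    'virusHost,virusHost,"A specific organism or taxonomic group of organisms '
    'that are susceptible to be infected by a virus",,"host of a virus"\n'
    'unifiedMedicalLanguageSystemConceptUniqueIdentifier,'
    'unifiedMedicalLanguageSystemConceptUniqueIdentifier,'
    '"Unified Medical Language System (UMLS) Concept Unique Identifier (CUI)"'
    ',, "UMLS CUI"\n'
)

# Added (+) versions of the same rows.
NEW = (
    'virusHost,virusHost,A specific organism or taxonomic group of organisms '
    'that are susceptible to be infected by a virus,,host of a virus\n'
    'unifiedMedicalLanguageSystemConceptUniqueIdentifier,'
    'unifiedMedicalLanguageSystemConceptUniqueIdentifier,'
    'Unified Medical Language System (UMLS) Concept Unique Identifier (CUI)'
    ',," ""UMLS CUI"""\n'
)

old_rows = read_rows(OLD)
new_rows = read_rows(NEW)
rows_match = old_rows == new_rows
```

Under Python's default dialect the two versions parse to the same rows. The UMLS row is the one worth noticing: because the old line had a space before the opening quote, the quotes were already literal characters in the field, and the new line escapes them explicitly to keep the same value.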