Separate the import of, or semantically distinguish, computed vs. curated UniProt proteins mapped to GeneNodes #24

Open
mjsduncan opened this issue May 8, 2020 · 1 comment


@mjsduncan
Contributor

As an example, entrez_to_protein_2020-04-01.scm contains:

(EvaluationLink
	(PredicateNode "expresses")
	(ListLink
		(GeneNode "SAMD11")
		(MoleculeNode "Uniprot:A0A087WX24")))

while codingRNA_2020-04-01.scm contains:

(EvaluationLink
	(PredicateNode "transcribed_to")
	(ListLink
		(GeneNode "SAMD11")
		(MoleculeNode "ENST00000420190")))
(EvaluationLink
	(PredicateNode "translated_to")
	(ListLink
		(MoleculeNode "ENST00000420190")
		(MoleculeNode "Uniprot:A6PWC8")))

If you look at this search of A6PWC8, you can see that Q96NU1 and A0A087WX24 are different protein isoforms (that is, they have different amino acid sequences), but Q96NU1 has been verified by human curation, whereas A0A087WX24 is an automated computational association.
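To find every gene where the two files disagree, rather than checking one gene at a time, a rough check could compare, per gene, the proteins linked directly by "expresses" with the proteins reached by following "transcribed_to" and then "translated_to". This is a minimal Python sketch that assumes the .scm files use exactly the link layout shown above; the regex is a simplification, not a real Atomese parser, and parse_edges and compare_protein_mappings are hypothetical helper names.

```python
import re
from collections import defaultdict

# Simplified pattern for one binary EvaluationLink, e.g.
# (EvaluationLink (PredicateNode "P") (ListLink (XNode "a") (YNode "b")))
# This is a regex approximation, not a real Scheme/Atomese parser.
EVAL_RE = re.compile(
    r'\(EvaluationLink\s*\(PredicateNode\s+"([^"]+)"\)\s*'
    r'\(ListLink\s*\(\w+Node\s+"([^"]+)"\)\s*\(\w+Node\s+"([^"]+)"\)\s*\)\s*\)'
)


def parse_edges(text):
    """Return {predicate: [(first, second), ...]} for every matching link."""
    edges = defaultdict(list)
    for pred, first, second in EVAL_RE.findall(text):
        edges[pred].append((first, second))
    return edges


def compare_protein_mappings(expresses_text, coding_text):
    """Per gene, compare proteins from 'expresses' links with proteins reached
    via 'transcribed_to' followed by 'translated_to'; report genes that differ."""
    direct = defaultdict(set)
    for gene, protein in parse_edges(expresses_text)["expresses"]:
        direct[gene].add(protein)

    coding = parse_edges(coding_text)
    transcript_to_protein = defaultdict(set)
    for transcript, protein in coding["translated_to"]:
        transcript_to_protein[transcript].add(protein)
    via_transcripts = defaultdict(set)
    for gene, transcript in coding["transcribed_to"]:
        via_transcripts[gene] |= transcript_to_protein[transcript]

    report = {}
    for gene in sorted(set(direct) | set(via_transcripts)):
        a, b = direct[gene], via_transcripts[gene]
        if a != b:
            report[gene] = {"only_expresses": a - b, "only_coding": b - a}
    return report


expresses_scm = '''(EvaluationLink (PredicateNode "expresses")
  (ListLink (GeneNode "SAMD11") (MoleculeNode "Uniprot:A0A087WX24")))'''
coding_scm = '''(EvaluationLink (PredicateNode "transcribed_to")
  (ListLink (GeneNode "SAMD11") (MoleculeNode "ENST00000420190")))
(EvaluationLink (PredicateNode "translated_to")
  (ListLink (MoleculeNode "ENST00000420190") (MoleculeNode "Uniprot:A6PWC8")))'''

print(compare_protein_mappings(expresses_scm, coding_scm))
# {'SAMD11': {'only_expresses': {'Uniprot:A0A087WX24'}, 'only_coding': {'Uniprot:A6PWC8'}}}
```

A mismatch flagged this way does not say which mapping is right; it only narrows down which accessions need their review status checked.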

Depending on the analysis, either only the curated version should be imported, or curated and computationally derived associations should be made semantically distinguishable.
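One way to support both options is to carry the UniProt review status through to the step that writes the .scm file. A minimal Python sketch, assuming each mapping row already carries a reviewed flag (reviewed UniProt entries are the manually curated Swiss-Prot section; unreviewed entries are computationally annotated TrEMBL). The predicate names expresses_curated and expresses_computed, and the helpers GeneProtein, to_atomese and export, are hypothetical and not part of the existing files.

```python
from dataclasses import dataclass

# Hypothetical predicate names, chosen for illustration only.
CURATED_PREDICATE = "expresses_curated"
COMPUTED_PREDICATE = "expresses_computed"


@dataclass(frozen=True)
class GeneProtein:
    gene: str       # gene symbol, e.g. "SAMD11"
    accession: str  # UniProt accession, e.g. "Q96NU1"
    reviewed: bool  # True if the UniProt entry is reviewed (Swiss-Prot)


def to_atomese(row, predicate):
    """Render one gene/protein pair in the layout used by the existing .scm files."""
    return (
        "(EvaluationLink\n"
        f'\t(PredicateNode "{predicate}")\n'
        "\t(ListLink\n"
        f'\t\t(GeneNode "{row.gene}")\n'
        f'\t\t(MoleculeNode "Uniprot:{row.accession}")))\n'
    )


def export(rows, mode="distinguish"):
    """mode="curated_only": emit reviewed entries only.
    mode="distinguish": emit every entry, with a predicate that records its origin."""
    if mode not in ("curated_only", "distinguish"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = []
    for row in rows:
        if row.reviewed:
            out.append(to_atomese(row, CURATED_PREDICATE))
        elif mode == "distinguish":
            out.append(to_atomese(row, COMPUTED_PREDICATE))
    return "".join(out)


# Review status as described in this issue.
rows = [
    GeneProtein("SAMD11", "Q96NU1", reviewed=True),
    GeneProtein("SAMD11", "A0A087WX24", reviewed=False),
]
print(export(rows, mode="curated_only"))
```

Distinct predicates keep both kinds of association in one import while letting queries select either; an alternative would be to keep "expresses" unchanged and attach the review status to the protein with a separate link.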

@mjsduncan
Contributor Author

mjsduncan commented May 8, 2020

entrez_to_protein_2020-04-01.scm is derived from entrez2uniprot.csv via gene2proteinMapping.py, and entrez2uniprot.csv is in turn the output of an R script. This needs to be replaced with a pipeline that pulls directly from a current UniProt source.
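As a possible starting point for such a pipeline, here is a rough Python sketch that reads a tab-separated UniProtKB export and keeps the review status next to each Entrez Gene ID to UniProt accession mapping. The column headers (Entry, Reviewed, Gene Names (primary), GeneID), the reviewed/unreviewed cell values and the ';'-separated GeneID cells are assumptions about the export format and should be checked against a real download; parse_uniprot_tsv is a hypothetical name, and the sample input is hand-written for illustration.

```python
import csv
import io

# Assumed column headers of a UniProtKB tab-separated export; check them
# against a real download before relying on this.
COL_ACCESSION = "Entry"
COL_REVIEWED = "Reviewed"
COL_GENE = "Gene Names (primary)"
COL_ENTREZ = "GeneID"


def parse_uniprot_tsv(text):
    """Yield (entrez_id, gene_symbol, accession, reviewed) tuples, one per
    Entrez Gene ID cross-referenced by each UniProt entry."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for rec in reader:
        # Assumed values: "reviewed" (Swiss-Prot) or "unreviewed" (TrEMBL).
        reviewed = rec[COL_REVIEWED].strip().lower() == "reviewed"
        # Assumed ';'-separated cell, possibly with a trailing ';'.
        entrez_ids = [x.strip() for x in rec[COL_ENTREZ].split(";") if x.strip()]
        for entrez_id in entrez_ids:
            yield entrez_id, rec[COL_GENE].strip(), rec[COL_ACCESSION].strip(), reviewed


# Hand-written illustrative input, not actual UniProt output.
sample = (
    "Entry\tReviewed\tGene Names (primary)\tGeneID\n"
    "Q96NU1\treviewed\tSAMD11\t148398;\n"
    "A0A087WX24\tunreviewed\tSAMD11\t148398;\n"
)
for row in parse_uniprot_tsv(sample):
    print(row)
# ('148398', 'SAMD11', 'Q96NU1', True)
# ('148398', 'SAMD11', 'A0A087WX24', False)
```

Keeping the review flag in the intermediate tuples leaves the choice between importing only curated entries and distinguishing the two kinds to the export step, instead of fixing it in an intermediate CSV.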
