Knowledge graph references

General overview

Seminars

CS 520 Knowledge Graph Seminar. Course page with video links
Knowledge Graphs to Fight COVID-19 Meetup. First meetup, second meetup
Graphs4Good GraphHack. Community Effort to Build a Knowledge Graph to Fight COVID-19 video

Talks

Natural Language Search with Knowledge Graphs - Trey Grainger, Lucidworks video
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs to our Scientists. Talk by the AstraZeneca team (one of the big pharma companies) at Spark+AI Summit 2019. video

Reviews

Shaoxiong Ji, Shirui Pan et al. A Survey on Knowledge Graphs: Representation, Acquisition and Applications (2020) paper
Graph Technology Landscape 2020. A great overview of the rising graph technology industry. blog post

Reading list

A Reading List of Academic Articles using the Biological Expression Language (BEL) from Charlie Hoyt. It’s divided into the categories of software/visualization tools, algorithms/analytical frameworks, data integration, natural language processing, curation workflows, and downstream applications. bel-papers
Generation and Applications of Knowledge Graphs in Systems and Networks Biology. Doctoral thesis of Dr. Charles Tapley Hoyt that was defended on December 3rd, 2019. pdf

Knowledge graphs related to COVID-19

COVID-19 Research Knowledge Graph. Knowledge graph built from the CORD-19 dataset by a NASA JPL group. github
Covid-19-Community. This project is a community effort to build a Neo4j Knowledge Graph (KG) that links heterogeneous data about COVID-19 to help fight this outbreak. It serves as a sandbox and incubator project, and the best ideas will be incorporated into the Covid-19-Net KG. github
COVID❋GRAPH. A voluntary initiative of graph enthusiasts and companies with the goal of building a knowledge graph with relevant information about COVID-19 and the SARS-CoV-2 virus. initiative page
CoViz. A tool built by AI2 for exploring associations between concepts appearing in the COVID-19 Open Research Dataset. Searching for a term displays a network of top related terms mined from the corpus. website
Knowledge Graph of COVID-19 Literature. Knowledge graph built by IBM as part of its Corpus Processing Service. This knowledge graph integrates COVID-19 data from various sources. Search on graph, data and reports
BioGrakn Knowledge Graph. Collection of knowledge graphs of biomedical data, built as a demonstration by Grakn Labs. github blog post BioGrakn COVID github
COVID-19 Knowledge Graph: a computable, multi-modal, cause-and-effect knowledge model of COVID-19 pathophysiology. paper github
COVID-19 Disease Map. Knowledge repository of molecular mechanisms of COVID-19 as a broad community-driven effort. webpage publication fairdomhub
Knowledge Extraction to Assist Scientific Discovery from Corona Virus Literature. The constructed knowledge graph includes 50,752 Gene nodes, 10,781 Disease nodes, 5,738 Chemical nodes, and 535 Organism nodes. These nodes are connected by 133 relation types, including Gene–Chemical–Interaction Relationships, Chemical–Disease Associations, Gene–Disease Associations, Chemical–GO Enrichment Associations and Chemical–Pathway Enrichment Associations. webpage
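
The typed nodes and typed relations described in the last entry can be pictured with a small in-memory structure. This is only a minimal sketch using the Python standard library; the class name `TypedGraph`, the node identifiers and the relation names are hypothetical placeholders and are not taken from any of the projects listed above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TypedGraph:
    """Toy knowledge graph: typed nodes joined by typed, directed edges."""
    node_types: dict = field(default_factory=dict)  # node id -> node type
    edges: set = field(default_factory=set)         # (head, relation, tail)

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_edge(self, head, relation, tail):
        # Refuse edges whose endpoints were never declared as nodes.
        for node_id in (head, tail):
            if node_id not in self.node_types:
                raise KeyError(f"unknown node: {node_id}")
        self.edges.add((head, relation, tail))

    def count_nodes_by_type(self):
        counts = defaultdict(int)
        for node_type in self.node_types.values():
            counts[node_type] += 1
        return dict(counts)

    def neighbors(self, node_id, relation=None):
        """Tails reachable from node_id, optionally filtered by relation."""
        return sorted(
            tail for head, rel, tail in self.edges
            if head == node_id and (relation is None or rel == relation)
        )


# Placeholder data for illustration only (not real biomedical facts).
kg = TypedGraph()
kg.add_node("gene:G1", "Gene")
kg.add_node("disease:D1", "Disease")
kg.add_node("chemical:C1", "Chemical")
kg.add_edge("gene:G1", "gene_disease_association", "disease:D1")
kg.add_edge("chemical:C1", "chemical_disease_association", "disease:D1")

print(kg.count_nodes_by_type())
print(kg.neighbors("gene:G1"))
```

A graph database such as Neo4j stores the same kind of information (labelled nodes, typed relationships) persistently and adds a query language on top.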

Annotated data related to Covid-19

CORD-19. The Semantic Scholar team at the Allen Institute for AI has partnered with leading research groups to provide CORD-19, a free resource of more than 128,000 scholarly articles about the novel coronavirus for use by the global research community. official page CORD-19 explorer Kaggle discussion forum
CoronaWhy data lake. Data hub MongoDB service GoogleCloudPlatform
COVID-19 Annotated Data by SciBiteLabs. Annotated Data for the COVID-19 Open Research Dataset Challenge. github
PubTator collections on COVID-19. PubTator provides automated annotations of biomedical entities in scientific publications. The NLM/NCBI BioNLP Research Group presents recent results of applying PubTator to the literature about COVID-19 and other coronaviruses. In particular, they feature results on two specific data collections: LitCovid and CORD-19. PubTator annotations are provided for six entity types (gene/protein, drug/chemical, disease, cell type, species and genomic variants) in two formats (BioC JSON and BioC XML). github site
CORD-19-on-FHIR. A Linked Data version of the COVID-19 Open Research Dataset (CORD-19) data. github

Ontologies and knowledge databases

Unified Medical Language System. The UMLS integrates and distributes key terminology, classification and coding standards, and associated resources. The UMLS includes 3 knowledge sources: the Metathesaurus (terms and codes from many vocabularies), the Semantic Network (semantic types and their relationships), and the SPECIALIST Lexicon and Lexical Tools (a large syntactic lexicon of biomedical and general English and tools for normalizing strings, generating lexical variants, and creating indexes). website
STRING. Protein-Protein Interaction Networks. website
PharmGKB. A pharmacogenomics knowledge resource that encompasses clinical information including clinical guidelines and drug labels, potentially clinically actionable gene-drug associations and genotype-phenotype relationships. website
The Immune Epitope Database. IEDB catalogs experimental data on antibody and T cell epitopes studied in humans, non-human primates, and other animal species in the context of infectious disease, allergy, autoimmunity and transplantation. website
Evidence and Conclusion Ontology (ECO). An ontology of evidence types for supporting conclusions in scientific research. page on bioportal
Library of ontologies provided by BioPortal. catalog

Building knowledge graph, information extraction

Paper search, filter and scoring

Covid-19 Semantic Browser: Browse Covid-19 & SARS-CoV-2 Scientific Papers with Transformers. An interactive experimental tool leveraging a state-of-the-art language model to search relevant content inside the COVID-19 Open Research Dataset (CORD-19). github
KDCOVID. This tool retrieves papers by measuring similarity between queries and sentences in the full text of papers in the CORD-19 corpus, using a similarity metric derived from BioSentVec. web-tool github
SciFact. Dataset & baseline model built by AI2 for fact-checking: Given a corpus of scientific articles and a claim about a scientific finding, a fact-checking model must identify abstracts that support or refute the claim. paper github
The Semantic Scholar Search Reranker provided by AI2. github
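
Sentence-level retrieval of the kind the KDCOVID entry describes can be sketched as follows. This is an illustration under stated assumptions, not that tool's code: the `embed` function is a plain bag-of-words stand-in for a learned sentence encoder such as BioSentVec, scoring a paper by its best-matching sentence is an assumed aggregation rule, and the two example papers are made up.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: bag-of-words term counts (NOT BioSentVec)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_papers(query, papers):
    """Score each paper by its best-matching sentence, highest score first."""
    query_vec = embed(query)
    scored = []
    for paper_id, sentences in papers.items():
        best = max((cosine(query_vec, embed(s)) for s in sentences), default=0.0)
        scored.append((best, paper_id))
    # Sort by descending score, breaking ties by paper id for determinism.
    return [pid for _, pid in sorted(scored, key=lambda item: (-item[0], item[1]))]


# Made-up example corpus: paper id -> list of sentences.
papers = {
    "paper_a": [
        "The spike protein binds the host receptor.",
        "Samples were stored at low temperature.",
    ],
    "paper_b": [
        "We survey graph databases.",
        "Several storage engines are compared.",
    ],
}
ranking = rank_papers("spike protein receptor binding", papers)
print(ranking)
```

Swapping `embed` for a dense sentence encoder keeps the rest of the ranking logic unchanged, provided `cosine` is adapted to dense vectors.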

Language models

BioBERT. BERT trained on PubMed data by the DMIS-lab team. github paper implementations list on Papers with Code
SciBERT. A BERT model for scientific text from AI2. github paper
CovidBERT. Model trained by deepset on AllenAI's CORD-19 dataset of scientific articles about coronaviruses. Implemented as part of the Transformers library. github
BlueBERT. A BERT model pre-trained on PubMed abstracts and clinical notes (MIMIC-III). Provided by the NLM/NCBI BioNLP Research Group. github paper

Open information extraction

Open IE. System from the University of Washington (UW) and the Indian Institute of Technology, Delhi (IIT Delhi). The system is used by the NASA JPL group. github
Stanford Open IE. System from Stanford University, part of Stanford CoreNLP. project page
Graphene. According to its authors, the system outperforms state-of-the-art Open IE systems in the construction of correct n-ary predicate-argument structures. github paper
Another unsupervised approach to the open relation extraction task is self-organizing maps: Elena Manishina et al. Unsupervised relation extraction from scientific texts using a self-organizing maps paper

Named Entity Recognition

BERN. BioBERT-based multi-type NER tool that also supports normalization of extracted entities. Built by DMIS-lab. github paper
SciSpacy. A full pipeline and models for processing scientific/biomedical documents, including biomedical NER models. website github notebook with NER model
Comprehensive Named Entity Recognition (NER) on CORD-19 with Distant or Weak Supervision. blog post

Weak supervision and relation extraction

Snorkel. A system for programmatically building and managing training data. It was built by a team from Stanford University and is widely used by many companies (Google, Facebook, etc.). website
A short review of weak supervision approaches to the relation extraction task: Alisa Smirnova and Philippe Cudré-Mauroux. 2018. Relation Extraction Using Distant Supervision: A Survey. paper
A great example of using weak supervision (Snorkel in particular) for biomedical information extraction (including numerical data): Kuleshov, V., Ding, J., Vo, C. et al. A machine-compiled database of genome-wide association studies, 2019. paper github
Another example of using weak supervision, from the team at BenevolentAI, a well-funded startup building an AI system for drug discovery: Julien Fauqueur et al. Constructing large scale biomedical knowledge bases from scratch with rapid annotation of interpretable patterns. paper
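
The core idea behind weak supervision, writing several noisy heuristic labelers and combining their votes into training labels, can be sketched in plain Python. This is a deliberately simplified illustration: it does not use the Snorkel API, it combines votes by simple majority rather than with a learned label model, and the labeling functions and sentences are hypothetical.

```python
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_contains_inhibits(sentence):
    """Vote POSITIVE when the sentence mentions inhibition."""
    return POSITIVE if "inhibits" in sentence.lower() else ABSTAIN


def lf_contains_negation(sentence):
    """Vote NEGATIVE when the sentence contains the word 'not'."""
    return NEGATIVE if " not " in f" {sentence.lower()} " else ABSTAIN


def lf_contains_binds(sentence):
    """Vote POSITIVE when the sentence mentions binding."""
    return POSITIVE if "binds" in sentence.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_inhibits, lf_contains_negation, lf_contains_binds]


def majority_vote(sentence):
    """Combine labeling-function votes; abstain on no votes or on a tie."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes)
    top_label, top_count = counts.most_common(1)[0]
    if list(counts.values()).count(top_count) > 1:
        return ABSTAIN  # equally many votes for two labels
    return top_label


# Hypothetical candidate sentences for a "drug interacts with protein" relation.
sentences = [
    "Drug X inhibits and binds protein Y.",
    "Drug X does not affect protein Y.",
    "Drug X binds but does not activate protein Y.",
    "Protein Y was measured.",
]
labels = [majority_vote(s) for s in sentences]
print(labels)
```

The resulting labels would then be used to train an ordinary supervised relation extraction model on the sentences that did not end up as `ABSTAIN`.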

Relation extraction

SpERT. BERT-based model with SOTA performance. paper github
GraphREL. An end-to-end relation extraction model which uses graph convolutional networks (GCNs) to jointly learn named entities and relations. paper github
SemRep. A good option if you want something that works right out of the box. It is the NLM triple extraction tool built on top of MetaMap. It comes with the usual UMLS license shenanigans and is not necessarily the latest and greatest, but works reasonably well in my experience. webpage
OpenNRE. An open-source and extensible toolkit that provides a unified framework to implement relation extraction models (including few-shot and document-level models). demosite github paper

Relation descriptions, schema standards and graph processing tools

MI2CAST. Minimum Information about a Molecular Interaction CAusal STatement. This checklist defines both the required core information and a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. paper github
BEL. The Biological Expression Language captures causal, correlative, and associative relationships between biological entities along with the experimental/biological context in which they were observed as well as the provenance of the publication from which the relation was reported. language tutorial
PyBEL. Python software package that parses BEL documents, validates their semantics, and facilitates data interchange between common formats and database systems like JSON, CSV, Excel, SQL, CX, and Neo4J. github documentation
PyBEL-tools. Library of functions for analysis of biological networks. github PyBEL-Notebooks
BEL4corona. Code, notebooks, and resources for exploring and analyzing mechanistic knowledge graphs about corona. github

Entity linking, entity normalisation, disambiguation, grounding

PyOBO. Tools for biological identifiers, names, synonyms, xrefs, hierarchies, relations, and properties through the perspective of Open Biomedical Ontology (OBO). github blog post
Gilda grounding service. Grounding of biomedical named entities with contextual disambiguation. Developed by INDRA labs which is part of the Harvard Program in Therapeutic Science (HiTS). http://grounding.indra.bio github
Adeft. Utility for building models to disambiguate acronyms and other abbreviations of biological terms in the scientific literature. Developed by INDRA labs. github paper

Evaluation

BLUE. The Biomedical Language Understanding Evaluation benchmark consists of five different biomedicine text-mining tasks (including NER & RE) with ten corpora. It relies on preexisting datasets that have been widely used by the BioNLP community as shared tasks. paper github

Libraries

INDRA (Integrated Network and Dynamical Reasoning Assembler). An automated model assembly system, funded by DARPA, that draws on natural language processing systems and structured databases to collect mechanistic and causal assertions, represents them in a standardized form (INDRA Statements), and assembles them into various modeling formalisms including causal graphs and dynamical models. website COVID19 model github

Graph analysis

Neo4j Graph Data Science Library. website github

Graph embeddings

Heterogeneous Graph Transformer. Graph neural network architecture from Microsoft and University of California. HGT can deal with large-scale heterogeneous and dynamic graphs. paper github
OpenKE. An open toolkit for knowledge embedding (OpenKE), which provides a unified framework and various fundamental models to embed knowledge graphs into a continuous low-dimensional space. paper github
BioNEV. This work aims to systematically evaluate recent advanced graph embedding techniques on biomedical tasks. The authors compile 5 benchmark datasets for 4 biomedical prediction tasks (see the paper for details) and use them to evaluate 11 representative graph embedding methods. paper github
PyTorch-BigGraph. An embedding system from Facebook that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. paper github
BioKEEN. A package for training and evaluating biological knowledge graph embeddings built on PyKEEN. github (parent package - PyKEEN)
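
To make "embedding a knowledge graph into a continuous low-dimensional space" concrete, here is a minimal sketch of the scoring rule of TransE, one of the classic translation-based models that toolkits such as OpenKE include. It uses only the standard library and no toolkit API; the two-dimensional vectors are hand-set for illustration rather than trained, and the entity and relation names are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail.

    Scores are <= 0; values closer to zero mean the triple is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hand-set toy embeddings (a trained model would learn these from triples).
entity_vectors = {
    "drug_a": [0.0, 0.0],
    "disease_b": [1.0, 0.0],
    "disease_c": [0.0, 3.0],
}
relation_vectors = {"treats": [1.0, 0.0]}


def rank_tails(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    h = entity_vectors[head]
    r = relation_vectors[relation]
    return sorted(
        candidates,
        key=lambda tail: transe_score(h, r, entity_vectors[tail]),
        reverse=True,
    )


ranked = rank_tails("drug_a", "treats", ["disease_c", "disease_b"])
print(ranked)
```

Link prediction with a trained model works the same way: every candidate entity is scored as a possible tail (or head) and the top-ranked candidates are proposed as new edges.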

Graph Neural Networks

Deep Graph Library (DGL). Python package built for easy implementation of the graph neural network model family on top of existing DL frameworks (e.g. PyTorch, MXNet, Gluon). website github docs
