bionlp_19_bb_pipeline

Work-in-progress pipeline for the BioNLP 2019 Bacteria Biotope NER/normalization task.

The BioNLP Bacteria Biotope shared task is to extract bacterial organisms, habitats, and medical phenotypes from PubMed articles and to link them by association.

The Shared Task description is here: https://drive.google.com/file/d/1G0po_xlRjQCZ-qxuA_4PLdipXU6rtYTp/view

NER/Norm

The first part of the shared task is Named Entity Recognition (NER) and Normalization.

The goal of this step is to correctly identify and classify words or multi-word phrases in biomedical texts that correspond to the entities of interest in this task: bacterial species, bacterial habitats, medical phenotypes, and geographical locations.

The rough steps to accomplish this are:

1. Generate lists of the entities of interest from NCBI (bacteria) and the provided .obo resource (habitats, phenotypes).
2. Annotate these entities in biomedical texts.
3. Train a model on the annotated texts to recognize new entities used in similar contexts to those in the original lists (NER).
4. Create general rules to flexibly link clusters of synonymous entities to a single identity (normalization).
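
The annotation and normalization steps above can be pictured with a small dictionary lookup. This is a minimal sketch under stated assumptions, not the code used in this repository: the function names (`normalize_key`, `build_lexicon`, `annotate`), the greedy longest-match strategy, and the `EX:` identifiers are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List, Tuple


def normalize_key(text: str) -> str:
    """Collapse case, punctuation, and spacing so surface variants share one key."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def build_lexicon(entries: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized name or synonym to its concept identifier."""
    lexicon: Dict[str, str] = {}
    for concept_id, names in entries.items():
        for name in names:
            lexicon[normalize_key(name)] = concept_id
    return lexicon


def annotate(
    text: str, lexicon: Dict[str, str], max_len: int = 5
) -> List[Tuple[int, int, str, str]]:
    """Greedy longest-match lookup over word tokens.

    Returns (start, end, surface form, concept id) for each match.
    """
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", text)]
    spans = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = normalize_key(text[start:end])
            if key in lexicon:
                spans.append((start, end, text[start:end], lexicon[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


# Hypothetical entries; the identifiers are placeholders, not real database IDs.
example_entries = {
    "EX:BACT1": ["Escherichia coli", "E. coli"],
    "EX:HAB1": ["human gut"],
}
example_spans = annotate("E. coli colonizes the human gut.", build_lexicon(example_entries))
```

Because both "Escherichia coli" and "E. coli" resolve to the same identifier, the lookup annotates a mention and normalizes it in one pass. A trained NER model would still be needed to find entities that are absent from the lists.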

Scripts to generate entity lists:

-generate_bacteria_taxid_dict.py (bacteria)
-extract_obo_category_nodes.py (habitat, phenotype)
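
To illustrate what extracting the nodes of one category from an .obo file can involve, here is a minimal standard-library sketch. It is an assumption about the general approach, not the contents of extract_obo_category_nodes.py: the function names are hypothetical, and only the `id`, `name`, `synonym`, and `is_a` tags of `[Term]` stanzas are read.

```python
from collections import defaultdict
from typing import Dict, List, Set


def parse_obo_terms(obo_text: str) -> Dict[str, dict]:
    """Parse [Term] stanzas into {term_id: {"name", "synonyms", "is_a"}}."""
    terms: Dict[str, dict] = {}
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {"name": "", "synonyms": [], "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "synonym":
            # Expected shape: synonym: "some text" EXACT []
            parts = value.split('"')
            if len(parts) >= 3:
                current["synonyms"].append(parts[1])
        elif tag == "is_a":
            # Expected shape: is_a: PARENT_ID ! parent name
            current["is_a"].append(value.split("!")[0].strip())
    return terms


def collect_descendants(terms: Dict[str, dict], root_id: str) -> Set[str]:
    """Return root_id plus every term reachable from it through is_a links."""
    children = defaultdict(list)
    for term_id, term in terms.items():
        for parent in term["is_a"]:
            children[parent].append(term_id)
    found: Set[str] = set()
    stack = [root_id]
    while stack:
        node = stack.pop()
        if node in found:
            continue
        found.add(node)
        stack.extend(children.get(node, []))
    return found


def names_for_category(terms: Dict[str, dict], root_id: str) -> Dict[str, List[str]]:
    """List the name and synonyms of every term under one category root."""
    return {
        term_id: [terms[term_id]["name"]] + terms[term_id]["synonyms"]
        for term_id in sorted(collect_descendants(terms, root_id))
        if term_id in terms
    }
```

Calling `names_for_category` once with the habitat root and once with the phenotype root would give one entity list per category. The root identifiers have to be looked up in the provided .obo resource.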

Scripts to obtain and annotate biomedical texts:

-easy_pubmed_batch_downloads.R
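
For readers who prefer Python to R, batch retrieval from PubMed can be sketched with the NCBI E-utilities `efetch` endpoint. This is not a port of easy_pubmed_batch_downloads.R: the batch size, the helper names, and the choice of `efetch` with XML output are assumptions. The sketch only builds the request URLs and leaves the actual download to the caller.

```python
from typing import Iterator, List
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_urls(pmids: List[str], batch_size: int = 200) -> Iterator[str]:
    """Build one efetch URL per batch of PubMed IDs (no network access here)."""
    for chunk in batched(pmids, batch_size):
        params = {
            "db": "pubmed",
            "id": ",".join(chunk),
            "rettype": "abstract",
            "retmode": "xml",
        }
        yield f"{EFETCH_URL}?{urlencode(params)}"
```

Each URL can then be fetched with any HTTP client, with a pause between requests to stay within NCBI's usage limits.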

BERT

A separate effort fine-tunes and tests domain-specific BERT models (BioBERT, NCBI BlueBERT) on the training data provided by BioNLP. The work is done in Colab notebooks to make use of the free GPUs.
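
BERT token-classification models are commonly trained on token/tag pairs, so the standoff annotations shipped with the shared task need to be converted first. The sketch below shows one way to do this with BIO tags. It is an illustration, not the repository's conversion script: the function names, the regex tokenizer, and the handling of discontinuous entities (each fragment is tagged as its own entity) are simplifying assumptions.

```python
import re
from typing import List, Optional, Set, Tuple


def parse_standoff(
    ann_text: str, keep_labels: Optional[Set[str]] = None
) -> List[Tuple[int, int, str]]:
    """Read text-bound annotations ('T' lines) as (start, end, label) tuples."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        label, offsets = fields[1].split(" ", 1)
        if keep_labels is not None and label not in keep_labels:
            continue
        # Discontinuous entities list several "start end" fragments separated by ';'.
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            entities.append((int(start), int(end), label))
    return entities


def to_bio(text: str, entities: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Tokenize on words and punctuation, then assign B-/I-/O tags by offset."""
    rows = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in entities:
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        rows.append((match.group(), tag))
    return rows
```

The `keep_labels` argument lets the caller drop annotation types that are not entities of interest. The resulting rows can be written one token per line in the format that the chosen BERT NER training code expects.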

Folders and files:

-.idea
-colab_notebooks
-named_entity_jsons
-pubmed_batch_download
-resources
-README.md
-convert_bionlp_ner_train_to_bert_ner_train.py
-extract_obo_category_nodes.py
-generate_bacteria_taxid_dict.py

MAyars7/bionlp_19_bb_pipeline