BoneBert is our proposed information extraction system for bone X-ray radiology reports, built on Google's BERT, which retrieves the details of bone fracture detection and diagnosis. The "semi-supervised" model was first trained on annotations generated by a handcrafted rule-based labelling system (BonePert) and later fine-tuned on a small set of expert annotations.
Please refer to our paper for more details.
data/
|-- train.csv / training set
|-- val.csv / validation set
|-- test.csv / test set
Here, sample.csv gives a few samples from the training set.
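To take a quick look at the data, the minimal sketch below loads the bundled sample with pandas; the file path and column handling are our assumptions, so check the released CSVs for the actual schema.

import pandas as pd

# Load the bundled sample of the training set and inspect its schema.
# The path "data/sample.csv" is an assumption; adjust it to where sample.csv lives.
df = pd.read_csv("data/sample.csv")
print(df.columns.tolist())  # actual column names depend on the released CSVs
print(df.head())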
Install all the Python packages required.
$ pip install -r requirements.txt
$ python run_bonepert.py
By default, the code uses the manually expanded rule base in BonePert+. Alternatively, you can use the rule base in BonePert.
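For readers unfamiliar with rule-based labelling, here is a purely illustrative sketch of how a labeller like BonePert might match fracture mentions with hand-written patterns; the patterns, label names, and negation handling below are our assumptions, not the actual BonePert/BonePert+ rule base.

import re

# Toy patterns only; the real rule base is defined in the repository.
FRACTURE_PATTERNS = [r"\bfracture[sd]?\b", r"\bbroken\b"]
NEGATION_PATTERNS = [r"\bno (?:acute )?fracture\b", r"\bwithout fracture\b"]

def label_report(report: str) -> str:
    """Assign a coarse fracture label to a single radiology report."""
    text = report.lower()
    if any(re.search(p, text) for p in NEGATION_PATTERNS):
        return "negative"
    if any(re.search(p, text) for p in FRACTURE_PATTERNS):
        return "positive"
    return "uncertain"

print(label_report("Transverse fracture of the distal radius."))  # positive
print(label_report("No acute fracture is seen."))                 # negative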
There are two ways of setting up the environment.
Build a Docker image from the Dockerfile.
$ nvidia-docker build -t bonebert .
Start a container using the image just built.
$ nvidia-docker run -t -d \
--env PYTHONPATH=. \
--env NVIDIA_VISIBLE_DEVICES=all \
--env MODEL_DIR=/model \
--env DATA_DIR=/data \
--env TRAIN_OUTPUT_DIR=/output_train \
--env FINETUNE_OUTPUT_DIR=/output_finetune \
--mount type=bind,source=/$(pwd)/bert/run_bluebert_ner.py,target=/bonebert/bluebert/run_bluebert_ner.py \
-v /$(pwd)/data:/data \
-v /$(pwd)/output_train:/output_train \
-v /$(pwd)/output_finetune:/output_finetune \
bonebert
Clone ncbi-nlp/bluebert and install all the required packages using BlueBERT's requirements.txt.
$ pip install -r requirements.txt
Replace the bluebert/run_bluebert_ner.py file with run_bluebert_ner.py.
To train a BoneBert model on a GPU, please ensure that you have at least 8 GB of GPU memory.
$ python run_convert_to_bert.py
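run_convert_to_bert.py converts the labelled reports into the input format expected by the BERT NER runner. As a rough illustration of that kind of conversion (not the script's actual logic), the sketch below writes whitespace-tokenised (token, tag) pairs in CoNLL style with BIO tags; the tokenisation, tag names, and output filename are assumptions.

from typing import List, Tuple

def write_conll(sentences: List[List[Tuple[str, str]]], path: str) -> None:
    """Write one (token, tag) pair per line, with a blank line between sentences."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            for token, tag in sentence:
                f.write(f"{token} {tag}\n")
            f.write("\n")

# Toy example; the real tags come from the BonePert/expert annotations.
example = [[("Transverse", "O"), ("fracture", "B-FRACTURE"), ("of", "O"),
            ("the", "O"), ("distal", "B-BODYPART"), ("radius", "I-BODYPART"),
            (".", "O")]]
write_conll(example, "example.conll")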
$ nvidia-docker exec -it [container-id] \
python bluebert/run_bluebert_ner.py \
--do_prepare=true \
--do_train=true \
--do_predict=true \
--task_name=extra \
--vocab_file=$MODEL_DIR/vocab.txt \
--bert_config_file=$MODEL_DIR/bert_config.json \
--init_checkpoint=$MODEL_DIR/bert_model.ckpt \
--num_train_epochs=30.0 \
--do_lower_case=true \
--data_dir=$DATA_DIR \
--output_dir=$TRAIN_OUTPUT_DIR
$ nvidia-docker exec -it [container-id] \
python bluebert/run_bluebert_ner.py \
--do_prepare=true \
--do_train=true \
--do_predict=true \
--task_name=fracture \
--vocab_file=$MODEL_DIR/vocab.txt \
--bert_config_file=$MODEL_DIR/bert_config.json \
--init_checkpoint=$TRAIN_OUTPUT_DIR/bert_model.ckpt-9645 \
--num_train_epochs=30.0 \
--do_lower_case=true \
--data_dir=$DATA_DIR \
--output_dir=$FINETUNE_OUTPUT_DIR
$ python run_analyse_bert.py
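run_analyse_bert.py analyses the model output. As a rough illustration of the kind of metric such an analysis might report, the sketch below computes micro-averaged token-level precision, recall, and F1 over non-"O" tags; the metric choice and tag names are our assumptions, and the script's actual analysis may differ.

from typing import List, Tuple

def token_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Micro-averaged precision/recall/F1 over non-'O' tags."""
    assert len(gold) == len(pred)
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    fp = sum(1 for g, p in zip(gold, pred) if p != "O" and g != p)
    fn = sum(1 for g, p in zip(gold, pred) if g != "O" and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["O", "B-FRACTURE", "O", "B-BODYPART", "I-BODYPART"]
pred = ["O", "B-FRACTURE", "O", "B-BODYPART", "O"]
print(token_prf(gold, pred))  # (1.0, 0.666..., 0.8)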
- Dai Z., Li Z., Han L. (2021) BoneBert: A BERT-based Automated Information Extraction System of Radiology Reports for Bone Fracture Detection and Diagnosis. In: Abreu P.H., Rodrigues P.P., Fernández A., Gama J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science, vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_21
@InProceedings{10.1007/978-3-030-74251-5_21,
author="Dai, Zhihao and Li, Zhong and Han, Lianghao",
title="BoneBert: A BERT-based Automated Information Extraction System of Radiology Reports for Bone Fracture Detection and Diagnosis",
booktitle="Advances in Intelligent Data Analysis XIX",
year="2021",
publisher="Springer International Publishing",
address="Cham",
pages="263--274",
isbn="978-3-030-74251-5"
}
The code is adapted from ncbi-nlp/NegBio and ncbi-nlp/BlueBERT.
We are grateful to the authors of NegBio, CheXpert-labeller, and BlueBERT.