Add Dockerfile and fix requirements #171

leoramme · 2021-11-22T19:15:58Z

Hi!

I recently tried to reproduce the NER results, but setting up the environment was really hard. The pinned requirements no longer work: any numpy version > 1.20 raises this error when fine-tuning the model for NER, and the pinned pandas version isn't compatible with the other packages. The requirements should also list scikit-learn rather than sklearn, since sklearn is only a dummy package that installs the latest version of scikit-learn.
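For anyone who wants to check an existing environment before fine-tuning, here is a minimal sketch of the numpy constraint described above. The helper name `numpy_version_ok` is made up for illustration and is not part of the repo, and treating every 1.20.x release as acceptable is my assumption about where the cutoff lies.

```python
def numpy_version_ok(version, max_major_minor=(1, 20)):
    """Return True if `version` is at or below the assumed (major, minor) cutoff.

    Assumption: 1.20.x still works and anything from 1.21 on does not.
    """
    # Only the first two components matter, so "1.21.0rc1" parses as (1, 21).
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) <= max_major_minor


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed")
    else:
        status = "ok" if numpy_version_ok(numpy.__version__) else "too new"
        print("numpy", numpy.__version__, status)
```

Running the file directly reports on the installed numpy, and the function can also be called on any version string.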

Because of that, I updated the requirements and created a Dockerfile. Unfortunately, because the model weights are hosted on Google Drive, I couldn't automate their download inside the Dockerfile.

With both of these modifications, the NER results can be easily reproduced:

Pull the official tensorflow-gpu image with docker pull tensorflow/tensorflow:1.15.5-gpu-py3-jupyter
Download BioBERT-Base v1.1 (+ PubMed 1M) and extract it inside the biobert repo.

The directory structure should look like this:

biobert/
├── biobert_v1.1_pubmed
│   ├── bert_config.json
│   ├── model.ckpt-1000000.data-00000-of-00001
│   ├── model.ckpt-1000000.index
│   ├── model.ckpt-1000000.meta
│   └── vocab.txt
├── biocodes
│   ├── [...]
├── create_pretraining_data.py
├── Dockerfile
├── download.sh
├── extract_features.py
├── figs
│   └── biobert_overview.png
├── __init__.py
├── LICENSE
├── modeling.py
├── modeling_test.py
├── optimization.py
├── optimization_test.py
├── README.md
├── requirements.txt
├── run_classifier.py
├── run_ner.py
├── run_pretraining.py
├── run_qa.py
├── run_re.py
├── sample_text.txt
├── tf_metrics.py
├── tokenization.py
└── tokenization_test.py
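Since the weights have to be downloaded by hand, it may help to confirm the layout before building the image. The following is only a sketch: the function name `missing_weight_files` and the constants are hypothetical and not part of the repo, and the file names are copied from the tree above.

```python
from pathlib import Path

# Directory and file names taken from the tree above.
WEIGHTS_DIR = "biobert_v1.1_pubmed"
EXPECTED_FILES = [
    "bert_config.json",
    "model.ckpt-1000000.data-00000-of-00001",
    "model.ckpt-1000000.index",
    "model.ckpt-1000000.meta",
    "vocab.txt",
]


def missing_weight_files(repo_root="."):
    """Return the expected weight files that are absent under repo_root."""
    weights = Path(repo_root) / WEIGHTS_DIR
    return [name for name in EXPECTED_FILES if not (weights / name).is_file()]


if __name__ == "__main__":
    missing = missing_weight_files()
    if missing:
        print("Missing from", WEIGHTS_DIR + ":", ", ".join(missing))
    else:
        print("All expected weight files are present.")
```

Run it from the root of the biobert repo; an empty result means the extracted weights match the structure shown above.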

To build the image, run docker build -t biobert .
To start the image in interactive mode, run docker run --gpus all -it biobert /bin/bash (remove --gpus all if you want to use your CPU instead of GPU)

In interactive mode, you can use run_ner.py and biocodes/ner_detokenize.py without problems. I figured this might be useful if someone else wants to reproduce the results or develop something on top of BioBERT.

leoramme added 2 commits November 22, 2021 15:34

Fix broken requirements

a657c4d

Add Dockerfile

747b03e
