Add Dockerfile and fix requirements #171
Open
Hi!
I recently tried to reproduce the NER results, but setting up the environment was really hard. The pinned requirements no longer work: any version of numpy > 1.20 returns this error when fine-tuning the model for NER, and the pinned pandas version isn't compatible with the other packages. `scikit-learn` is also a better choice than `sklearn`, as `sklearn` is only a dummy package that installs the latest version of `scikit-learn`.
Because of that, I updated the requirements and created a Dockerfile. Unfortunately, because the model weights are hosted on Google Drive, I couldn't automate their download inside the Dockerfile.
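For anyone who wants to check an existing environment before fine-tuning, here is a minimal sketch of the numpy constraint described above. The helper names (`release_series`, `numpy_is_supported`) are hypothetical and not part of this repository. It also assumes that "numpy > 1.20" means any release series newer than 1.20, so 1.20.x patch releases are treated as acceptable.

```python
import re

# Assumed ceiling, taken from the description above: numpy releases newer than
# the 1.20 series reportedly fail during NER fine-tuning.
MAX_NUMPY_SERIES = (1, 20)


def release_series(version: str) -> tuple:
    """Return (major, minor) parsed from a version string such as '1.19.5'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def numpy_is_supported(version: str) -> bool:
    """True when the version belongs to the 1.20 series or an older one."""
    return release_series(version) <= MAX_NUMPY_SERIES


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed")
    else:
        status = "ok" if numpy_is_supported(numpy.__version__) else "too new"
        print(f"numpy {numpy.__version__}: {status}")
```

Running the script inside the container prints the installed numpy version and whether it falls within the assumed range.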
With both of these modifications, the NER results can be easily reproduced:
docker pull tensorflow/tensorflow:1.15.5-gpu-py3-jupyter
The directory structure should look like this:
docker build -t biobert .
docker run --gpus all -it biobert /bin/bash
(remove `--gpus all` if you want to use your CPU instead of a GPU)
In interactive mode, you can use `run_ner.py` and `biocodes/ner_detokenize.py` without problems. I figured this might be useful if someone else wants to reproduce the results or develop something on top of BioBERT.
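If you launch the container from a script, the GPU/CPU choice in the run step could be wrapped roughly like this. This is only a sketch: `docker_run_command` is a hypothetical helper, and the image name `biobert` is assumed to match the tag used in the build step.

```python
import shlex


def docker_run_command(image="biobert", use_gpu=True):
    """Build the argument list for an interactive container session."""
    command = ["docker", "run"]
    if use_gpu:
        # Omitted on CPU-only machines, as noted in the run step.
        command += ["--gpus", "all"]
    command += ["-it", image, "/bin/bash"]
    return command


if __name__ == "__main__":
    # Print the CPU-only variant; pass the list to subprocess.run to execute it.
    print(shlex.join(docker_run_command(use_gpu=False)))
```

Keeping the command as a list avoids shell-quoting issues if it is later passed to `subprocess.run`.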