The Metadata Annotation Workbench supports standardised metadata annotation of variable catalogues in Excel spreadsheet and CSV format, using terminologies from the Semantic Lookup Platform SemLookP.
Requirements:
- Docker
- Docker Compose
- Python 3.8
- NPM
- Node >= 14
This web application allows users to upload data collection instruments in Microsoft Excel spreadsheet format. The metadata annotation is performed in the web browser: for each data item of the instrument, the user is shown search results from the terminology service and can also run a manual text search. The initial search suggestion for each data item is generated by string matching and simple natural language processing such as tokenization and stemming. The user can select the ontology used for annotation from all ontologies included in the terminology service. Detailed information about a concept is displayed in the semantic information widget provided by the terminology service. The annotated instrument can then be downloaded; it contains the data items and their annotations as Internationalized Resource Identifiers (IRIs), unique and machine-readable identifiers that lead to the metadata of the data item.
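The suggestion step described above can be illustrated with a minimal Python sketch. It is not the workbench's implementation: the stop-word list and the crude suffix-stripping stemmer are assumptions standing in for a real stemmer such as Porter's, and the `search?q=...&ontology=...` URL assumes an OLS-style search endpoint under `API_SEMLOOKP`. The ontology ID `efo` is only an example.

```python
import re
from urllib.parse import urlencode

# Hypothetical stop-word list; the workbench's actual processing may differ.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "per"}
# Crude suffix stripping, a stand-in for a real stemmer (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(label):
    """Lower-case the label and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", label.lower())


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def suggest_query(label):
    """Turn a data item label into a search string: tokenize, drop stop words, stem."""
    return " ".join(stem(t) for t in tokenize(label) if t not in STOP_WORDS)


def ols_search_url(base_url, query, ontology=None):
    """Build a search URL for an OLS-style terminology service, optionally limited to one ontology."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return base_url.rstrip("/") + "/search?" + urlencode(params)
```

For example, `suggest_query("Systolic blood pressure of the participants")` yields `systolic blood pressure participant`, which can then be passed to `ols_search_url("https://semanticlookup.zbmed.de/ols/api/", ..., "efo")`.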
Installing the package @nfdi4health/semlookp-widgets requires a GitHub access token. You can provide your personal access token either by logging in to npm on the command line or by adding the auth token to the .npmrc file. For more information, see: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-npm-registry#authenticating-to-github-packages
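For the .npmrc route, the GitHub documentation describes two lines: one that maps the package scope to the GitHub Packages registry and one that holds the token. The hypothetical helper below only renders those two lines so their shape is visible; the scope `@nfdi4health` is taken from the package name, and where you keep your token is up to you.

```python
# Assumption: the package is served from GitHub Packages under the @nfdi4health scope.
REGISTRY = "https://npm.pkg.github.com"


def npmrc_lines(token, scope="@nfdi4health"):
    """Return the .npmrc lines that route the scope to GitHub Packages and set the auth token."""
    if not token:
        raise ValueError("a personal access token is required")
    host = REGISTRY.split("://", 1)[1]  # "npm.pkg.github.com"
    return [
        f"{scope}:registry={REGISTRY}",
        f"//{host}/:_authToken={token}",
    ]
```

Writing the returned lines to `.npmrc` in the `frontend` directory (or your home directory) before `npm install` should let npm authenticate; keep the file out of version control since it contains the token.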
Start the database and the prediction service:
docker-compose --env-file dev.env up db
docker-compose --env-file dev.env up prediction
Hint: After the prediction container starts, it takes some time to download the data model.
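Because the prediction container needs time to fetch the data model, a setup script may want to wait before starting the backend. A minimal sketch that polls a TCP port; the host and port (`172.29.0.5`, `5000`) are assumptions taken from the API_PREDICT example below, and an open port is only a rough readiness signal, since the service may still be loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds; return False once the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("172.29.0.5", 5000, timeout=300)` waits up to five minutes for the assumed prediction address.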
Start the backend:
cd backend
python3 -m venv venv
# activate the virtual environment (the python3-dev system package may be needed to build some requirements)
source ./venv/bin/activate
# install requirements
pip3 install -r ./requirements.txt
# download the spacy model for text processing
python -m spacy download en_core_web_sm
export FLASK_APP=restapi
export FLASK_DEBUG=true
export DB_USERNAME=postgres
export DB_PASSWORD=postgres
export DB_HOST=localhost
export API_SEMLOOKP=https://semanticlookup.zbmed.de/ols/api/
export INSTRUMENTS=instruments
export API_PREDICT=http://172.29.0.5:5000
flask run
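Before running flask run, it can help to confirm that every variable is set. The sketch below is a hypothetical pre-flight check, not part of the backend; it only reuses the variable names exported above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("DB_USERNAME", "DB_PASSWORD", "DB_HOST", "API_SEMLOOKP", "INSTRUMENTS", "API_PREDICT")


def load_config(env=None):
    """Collect the required variables from env (default: os.environ); fail listing any that are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_config()` in the same shell session after the exports should return all six values; an error message lists exactly which exports were skipped.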
You may have to adapt the API_PREDICT variable. To find the IP address of the prediction container, run docker inspect annobench_prediction_1.
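The docker inspect command prints a JSON array in which the container's addresses appear under NetworkSettings.Networks. A minimal sketch that extracts them and builds an API_PREDICT value; the network name in the test data is an assumption. Alternatively, docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' annobench_prediction_1 prints only the address.

```python
import json


def container_ips(inspect_output):
    """Map network names to IP addresses from the JSON printed by `docker inspect <container>`."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    networks = containers[0]["NetworkSettings"]["Networks"]
    # Skip networks without an assigned address (e.g. a stopped container).
    return {name: net["IPAddress"] for name, net in networks.items() if net.get("IPAddress")}


def predict_url(ip, port=5000):
    """Build an API_PREDICT value; port 5000 follows the example export above."""
    return f"http://{ip}:{port}"
```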
Start the frontend:
cd frontend
npm install
npm start
To create a production build for all services, uncomment build: ./frontend and build: ./backend in docker-compose.yaml and run:
cd frontend
npm run build
cd ..
docker-compose --env-file dev.env build
docker-compose --env-file dev.env up
The frontend service is available at http://localhost:8090/.
To stop all services of the production build:
docker-compose --env-file dev.env down
For the workbench's AI model, see https://github.com/nfdi4health/workbench-AI-model.
The project is MIT licensed.
This work was done as part of the NFDI4Health Consortium and is published on behalf of this Consortium (www.nfdi4health.de). It is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 442326535.