A metadata annotation service to support standardised metadata annotation developed by T3.2.

nfdi4health/metadata-annotation-workbench

Metadata Annotation Workbench

The Metadata Annotation Workbench supports the standardised metadata annotation of variable catalogues in Excel spreadsheet and CSV formats, using standardised terminologies from the Semantic Lookup Platform (SemLookP).

Requirements

Production

  • Docker
  • Docker Compose

Development

  • Python 3.8
  • NPM
  • Node >= 14

General info

This web application allows users to upload data collection instruments in Microsoft Excel spreadsheet format. The metadata annotation is performed in the web browser. For each data item of the instrument, the user is shown search results from the terminology service and can also run a manual text search. The initial search suggestion for each data item is generated by string matching and simple natural language processing such as tokenization and stemming. The user can select an ontology for annotation from all ontologies included in the terminology service. Detailed information about a concept is displayed in the semantic information widget provided by the terminology service. The annotated instrument can then be downloaded. It comprises the data items and their corresponding annotations as Internationalized Resource Identifiers (IRIs), unique and machine-readable identifiers that lead to the metadata of the data item.
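To illustrate the idea of string matching combined with tokenization and stemming, here is a minimal sketch that uses only the Python standard library. It is not taken from the workbench's code: the function names, the tiny suffix list, and the Jaccard overlap score are all assumptions made for illustration, and the suffix stripper is a crude stand-in for a real stemmer.

```python
import re

# Deliberately tiny suffix list (an assumption for this sketch);
# a real stemmer such as Porter's handles far more cases.
_SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalise(text):
    """Return the set of stemmed tokens of a string."""
    return {stem(token) for token in tokenize(text)}


def score(item, label):
    """Jaccard overlap between the stemmed token sets of two strings."""
    a, b = normalise(item), normalise(label)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(item, labels):
    """Return the label with the highest overlap, or None if nothing overlaps."""
    best = max(labels, key=lambda label: score(item, label), default=None)
    if best is None or score(item, best) == 0.0:
        return None
    return best
```

With the hypothetical labels "Blood pressure", "Smoking", and "Body height", calling suggest("Smoking status", ...) picks "Smoking", because both strings share the stem "smok" after normalisation.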

Setup

Development

Installing the package @nfdi4health/semlookp-widgets requires an access token. You can provide your personal access token either by logging in to npm on the command line or by adding the auth token to the .npmrc file. For more information, see: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-npm-registry#authenticating-to-github-packages

Run the Docker container with the Postgres database

docker-compose --env-file dev.env up db

Run the Docker container with the prediction service

docker-compose --env-file dev.env up prediction

Hint: After the Docker container starts, it needs some time to download the data model.

Install the virtual environment and requirements

cd backend
python3 -m venv venv
# activate the environment (the python3-dev system package may need to be installed separately)
source ./venv/bin/activate
# install requirements
pip3 install -r ./requirements.txt
# download the spacy model for text processing
python -m spacy download en_core_web_sm

Define environment variables and run the application

export FLASK_APP=restapi
export FLASK_DEBUG=true
export DB_USERNAME=postgres
export DB_PASSWORD=postgres
export DB_HOST=localhost
export API_SEMLOOKP=https://semanticlookup.zbmed.de/ols/api/
export INSTRUMENTS=instruments
export API_PREDICT=http://172.29.0.5:5000
flask run
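As a rough illustration of how the API_SEMLOOKP value could be used, the sketch below builds a free-text search URL. It assumes that SemLookP exposes the search endpoint of the Ontology Lookup Service (OLS) API with its q, ontology, and rows parameters. The function name build_search_url and the ontology ID in the usage note are hypothetical and not taken from the workbench's code.

```python
import os
from urllib.parse import urlencode

# Default taken from the environment variables listed above.
DEFAULT_API = "https://semanticlookup.zbmed.de/ols/api/"


def build_search_url(query, ontology=None, rows=10, base=None):
    """Build an OLS-style search URL for a free-text query."""
    # Prefer an explicit base, then API_SEMLOOKP, then the default.
    base = base or os.environ.get("API_SEMLOOKP", DEFAULT_API)
    params = {"q": query, "rows": rows}
    if ontology:
        # Restrict the search to a single ontology (assumed OLS parameter).
        params["ontology"] = ontology
    # Strip trailing slashes so the path is joined with exactly one.
    return base.rstrip("/") + "/search?" + urlencode(params)
```

For example, build_search_url("blood pressure", ontology="snomed", rows=5) yields a URL ending in search?q=blood+pressure&rows=5&ontology=snomed, which can then be fetched with any HTTP client.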

You may have to adapt the API_PREDICT variable. Find the container's IP address by running docker inspect annobench_prediction_1.
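Reading the address out of the docker inspect output by hand can be tedious, so here is a small sketch that extracts it programmatically. It relies on the JSON layout that docker inspect prints (an array with NetworkSettings.Networks per container). The helper names container_ips and prediction_url are hypothetical, and port 5000 is taken from the example value of API_PREDICT above.

```python
import json
import subprocess


def container_ips(inspect_json):
    """Extract the IP address of every network from `docker inspect` JSON."""
    containers = json.loads(inspect_json)
    networks = containers[0]["NetworkSettings"]["Networks"]
    # Skip networks that have no address assigned.
    return [net["IPAddress"] for net in networks.values() if net.get("IPAddress")]


def prediction_url(container="annobench_prediction_1", port=5000):
    """Run `docker inspect` and build a value for API_PREDICT."""
    result = subprocess.run(
        ["docker", "inspect", container],
        check=True,
        capture_output=True,
        text=True,
    )
    ips = container_ips(result.stdout)
    if not ips:
        raise RuntimeError(f"no IP address found for container {container!r}")
    # Use the first network's address; adjust if the container has several.
    return f"http://{ips[0]}:{port}"
```

Calling prediction_url() on a machine where the prediction container is running returns a string that can be exported as API_PREDICT before starting Flask.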

Start the frontend:

cd frontend
npm install
npm start

Production

Create a production build for all services: uncomment build: ./frontend and build: ./backend in docker-compose.yaml, then run:

cd frontend
npm run build
cd ..
docker-compose --env-file dev.env build
docker-compose --env-file dev.env up

The frontend service is available at http://localhost:8090/.

To stop all services of the production build, run: docker-compose --env-file dev.env down

AI model

Check out https://github.com/nfdi4health/workbench-AI-model

License

The project is MIT licensed.

Funding

This work was done as part of the NFDI4Health Consortium and is published on behalf of this Consortium (www.nfdi4health.de). It is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 442326535.