This repository contains a machine learning pipeline that reads a clinical note in Dutch and assigns the functioning level of the patient based on the textual description.
We focus on 9 WHO-ICF domains, which were chosen due to their relevance to recovery from COVID-19:
ICF code | Domain | name in repo |
---|---|---|
b1300 | Energy level | ENR |
b140 | Attention functions | ATT |
b152 | Emotional functions | STM |
b440 | Respiration functions | ADM |
b455 | Exercise tolerance functions | INS |
b530 | Weight maintenance functions | MBW |
d450 | Walking | FAC |
d550 | Eating | ETN |
d840-d859 | Work and employment | BER |
- FAC and INS have a scale of 0-5, where 5 means there is no functioning problem.
- The rest of the domains have a scale of 0-4, where 4 means there is no functioning problem.
- For more information about the levels, refer to the annotation guidelines.
- NOTE: the values generated by the machine learning pipeline might sometimes fall outside the scale (e.g. 4.2 for ENR); this is expected, because the output of a regression model is not constrained to the scale.
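If downstream code needs values that stay within the scale, one option is to clamp them after the fact. The sketch below is only an illustration: the scale maxima come from the list above, but the clamping step and the `clamp_level` helper are assumptions, not part of the pipeline.

```python
# Scale maxima per domain, as listed above (FAC and INS: 0-5, the rest: 0-4).
SCALE_MAX = {
    "ADM": 4, "ATT": 4, "BER": 4, "ENR": 4, "ETN": 4,
    "FAC": 5, "INS": 5, "MBW": 4, "STM": 4,
}


def clamp_level(domain, value):
    """Clamp a raw regression output to the valid range of `domain`.

    Hypothetical post-processing helper; the pipeline itself does not clamp.
    """
    if value is None:  # the domain was not detected in the note
        return None
    return min(max(float(value), 0.0), float(SCALE_MAX[domain]))


print(clamp_level("ENR", 4.2))   # above the ENR maximum of 4
print(clamp_level("FAC", 4.2))   # within the FAC range of 0-5
print(clamp_level("STM", -0.3))  # below the minimum of 0
```

Whether to clamp, round, or keep the raw values depends on the analysis; the raw values preserve more information about the model's estimate.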
The input is a csv file with at least one column containing the text (one clinical note per row).
The csv must meet the following specifications:
- sep = ;
- quotechar = "
- encoding = utf-8
- the first row is the header (column names)
See example in example/input.csv.
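To check that a file matches these specifications before running the pipeline, it can be read with Python's standard `csv` module using the same settings. This is a minimal sketch: the `read_notes` helper and the sample row are invented for illustration, and `text` is assumed as the column name.

```python
import csv
import io


def read_notes(stream, text_col):
    """Return the note texts from a ';'-separated, '"'-quoted csv stream."""
    reader = csv.DictReader(stream, delimiter=";", quotechar='"')
    # The first row is the header; the text column must be present in it.
    if text_col not in (reader.fieldnames or []):
        raise KeyError(f"column {text_col!r} not found in header")
    return [row[text_col] for row in reader]


# Invented sample in the required format; note the ';' inside the quoted text.
sample = io.StringIO(
    'id;text\n'
    '1;"Patiënt loopt zelfstandig; geen hulpmiddel nodig."\n'
)
notes = read_notes(sample, "text")
print(notes)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_notes`.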
The output file is saved in the same location as the input, with 'output' added to the original file name (for example, input.csv becomes input_output.csv).
The output file contains the same columns as the input, plus 9 new columns with the functioning levels per domain.
The functioning levels are generated per row. If a cell is empty, it means that this domain is not discussed in this note (according to the algorithm).
See example in example/input_output.csv.
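When reading the output, empty cells need to be kept apart from numeric levels. The sketch below shows one way to do that, assuming the output uses the same csv settings as the input. The `parse_levels` helper is hypothetical, and the column names `ENR` and `FAC` in the sample are placeholders: check the header of your own output file for the actual names.

```python
import csv
import io


def parse_levels(stream, level_cols):
    """Return one dict per note: column name -> float level, or None if empty."""
    reader = csv.DictReader(stream, delimiter=";", quotechar='"')
    results = []
    for row in reader:
        levels = {}
        for col in level_cols:
            cell = (row[col] or "").strip()
            # An empty cell means the domain was not detected in this note.
            levels[col] = float(cell) if cell else None
        results.append(levels)
    return results


# Invented sample: one note with a level for ENR and an empty FAC cell.
sample = io.StringIO('text;ENR;FAC\n"note";3.1;\n')
parsed = parse_levels(sample, ["ENR", "FAC"])
print(parsed)
```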
The pipeline includes a multi-label classification model that detects the domains mentioned in a sentence, and 9 regression models that assign a level to sentences in which a specific domain was detected. All models were created by fine-tuning a pre-trained Dutch medical language model.
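The two-stage design described above can be sketched in plain Python. This is not the repository's implementation: the models are replaced by toy callables, and averaging the sentence-level scores into one value per note is an assumption, since the aggregation method is not stated here.

```python
from statistics import mean

DOMAINS = ["ADM", "ATT", "BER", "ENR", "ETN", "FAC", "INS", "MBW", "STM"]


def score_note(sentences, detect_domains, level_models):
    """Score one note, given as a list of sentences.

    detect_domains: callable, sentence -> set of domain codes (stage 1)
    level_models:   dict, domain -> callable, sentence -> float (stage 2)
    Returns a dict, domain -> mean level (assumed aggregation) or None.
    """
    per_domain = {d: [] for d in DOMAINS}
    for sentence in sentences:
        # Stage 1: multi-label detection of the domains in this sentence.
        for domain in detect_domains(sentence):
            # Stage 2: the regression model for that domain assigns a level.
            per_domain[domain].append(level_models[domain](sentence))
    return {d: (mean(v) if v else None) for d, v in per_domain.items()}


# Toy stand-ins for the fine-tuned models, purely for illustration.
def toy_detector(sentence):
    return {"FAC"} if "loopt" in sentence else set()


toy_models = {d: (lambda sentence: 4.0) for d in DOMAINS}

result = score_note(
    ["Patiënt loopt met rollator.", "Geen koorts."],
    toy_detector,
    toy_models,
)
print(result["FAC"], result["ENR"])
```

Domains that no sentence triggers end up as `None`, which corresponds to the empty cells in the output file.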
To run the pipeline, follow these steps:
- Install Docker: see the official Docker installation instructions for Windows and for macOS.
- Pull the Docker image from Docker Hub by typing in your command line:
$ docker pull piekvossen/a-proof-icf-classifier
- Run the pipeline with the `docker run` command. You need to pass the following arguments:
  - `--in_csv`: path to the input csv file
  - `--text_col`: name of the text column in the csv
For example:
$ docker run piekvossen/a-proof-icf-classifier --in_csv example/input.csv --text_col text
Running the Docker image for the first time downloads the models from Hugging Face.
In total, 10 transformer models are downloaded, each between 500MB and 1GB, so this takes a while. On later runs, the cached models are used.