Flask-based backend for the AAPI Platform.
Clone this repo together with its submodule:
    # clone the repository
    # use --recurse-submodules to fetch the inner module for AAPI_Code
    $ git clone --recurse-submodules https://github.com/Auto-annotation-of-Pathology-Images/Backend
    $ cd Backend
Note that the ML-related code is included as a submodule, so the version checked out here is a snapshot of a specific commit. To follow the latest commit, run

    $ git submodule update --remote

to sync with the master branch.
Use a virtual environment to install all required packages, then set the environment variables:
    # install this package
    $ python -m pip install -e .
    # set env variables for this shell session
    $ export FLASK_APP=app_core
    $ export FLASK_ENV=production
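Before running any flask command, it can help to confirm that the current shell session actually exports these variables. The following is a minimal sketch; the helper name missing_flask_env is hypothetical and not part of this repo.

```python
import os
from typing import List, Mapping, Optional

# Variables this README exports; None means "any value is accepted".
REQUIRED: Mapping[str, Optional[str]] = {
    "FLASK_APP": "app_core",
    "FLASK_ENV": None,
}


def missing_flask_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: return required variables that are unset or unexpected."""
    problems = []
    for name, expected in REQUIRED.items():
        value = env.get(name)
        if value is None or (expected is not None and value != expected):
            problems.append(name)
    return problems


if __name__ == "__main__":
    missing = missing_flask_env()
    print("environment OK" if not missing else f"check these variables: {missing}")
```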
Data preparation consists of two parts:
- database setup;
- local setup of large files (slides, patches and annotations).
- Install MySQL and create a user with the following credentials:

      { "user": "AAPI", "password": "aapi2020" }

  Then grant this account the required permissions by running the following command in MySQL (from an account with administrative privileges, e.g. root):

      mysql> GRANT ALL PRIVILEGES ON *.* TO 'AAPI'@'localhost';
- At the root of this repo, run

      $ flask init-db

  to set up the schemas. Note that data is only added after the next step.
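For reference, the credentials above could be turned into a SQLAlchemy-style connection URI as sketched below. This is only an illustration: the pymysql driver, the database name aapi and the helper mysql_uri are assumptions, not values read from this repo's configuration.

```python
from urllib.parse import quote_plus

# Credentials from the MySQL setup step above.
DB_USER = "AAPI"
DB_PASSWORD = "aapi2020"


def mysql_uri(user: str, password: str, host: str = "localhost",
              port: int = 3306, database: str = "aapi") -> str:
    """Build a SQLAlchemy-style URI.

    The driver (pymysql) and the database name 'aapi' are hypothetical
    defaults chosen for illustration only.
    """
    # quote_plus escapes characters such as '@' or ':' that would break the URI
    return (f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")


if __name__ == "__main__":
    print(mysql_uri(DB_USER, DB_PASSWORD))
```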
Slides, also known as Whole Slide Images (WSIs), are usually large (~300 MB each), so they are stored in the local file system rather than in the database.
Every slide also needs an initial annotation so that the frontend can render it before any human modification.
The required initial data is organized as follows:
    /data
    |— slides
    |   |— slide_001.svs
    |   |— slide_001.xml  --->[initial annotation]
    |   |— …
    |— roi
    |   |— Artery_val.h5  --->[static validation data]
    |   |— Glomerulus_val.h5
    |   |— Tubules_val.h5
The current online updating strategy requires several validation datasets to be prepared beforehand, so that the model can be evaluated after it is updated with newly annotated data.
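Before running flask init-data, the initial layout can be sanity-checked with a short script like the one below. It is a sketch that assumes the slide/annotation pairing by file stem and the validation file names shown in the tree; check_initial_data is a hypothetical helper, not part of the repo.

```python
from pathlib import Path
from typing import List

# Validation files listed in the tree above (one per class).
VAL_FILES = ["Artery_val.h5", "Glomerulus_val.h5", "Tubules_val.h5"]


def check_initial_data(data_root: Path) -> List[str]:
    """Return human-readable problems; an empty list means the layout looks OK."""
    problems = []
    slides = data_root / "slides"
    roi = data_root / "roi"
    if not slides.is_dir():
        problems.append(f"missing directory: {slides}")
    else:
        # Every .svs slide needs an initial annotation with the same stem.
        for svs in sorted(slides.glob("*.svs")):
            if not svs.with_suffix(".xml").exists():
                problems.append(f"no initial annotation for {svs.name}")
    for name in VAL_FILES:
        if not (roi / name).is_file():
            problems.append(f"missing validation file: {name}")
    return problems
```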
Then, run the following command to generate directories for future annotations and cached patches.
$ flask init-data --slide-dir {path_to_the_slides_folder}
Set {path_to_the_slides_folder} to the location of the slides directory shown above. Run flask init-data --help for other options.
After this step, the local file system looks like this:
    /data
    |— slides  --->[user provided]
    |   |— slide_001.svs
    |   |— slide_001.xml
    |
    |— patches  ----[code generated]
    |   |— slide_id/  --->[md5 checksum]
    |   |   |— patch001.png
    |   |   |— patch002.png
    |   |   |— …
    |
    |— annotations  ----[code generated]
    |   |— slide_id/  --->[md5 checksum]
    |   |   |— updated_at_time001.xml
    |   |   |— updated_at_time002.xml
    |   |   |— …
    |   |   |— initial_annotation.xml
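The slide_id directories are named after an MD5 checksum. The sketch below shows one way to compute such a checksum for a large slide without loading the whole file into memory. It assumes the checksum is taken over the raw file bytes, which may differ from what flask init-data actually hashes; slide_checksum is a hypothetical name.

```python
import hashlib
from pathlib import Path


def slide_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so ~300 MB slides stay cheap on memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Reading in fixed-size chunks gives the same digest as hashing the whole file at once, because MD5 is computed incrementally over the byte stream.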
The background updating task also adds new data to this file system: it caches cropped ROIs
and annotations derived from the calibrated results sent by the frontend. After periodic updates, the roi
directory is extended as follows:
    |— roi
    |   |— {CLASS_NAME}_val.h5  --->[user provided]
    |   |— slide_id/  --->[code generated]
    |   |   |— {CLASS_NAME}/
    |   |   |   |— {cropped_ROI || ROI_mask}.png
    |   |   |   |— ...
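Given that layout, the cached ROI images for one class can be gathered across all slides with a simple directory walk. This is a sketch only: collect_cached_rois is a hypothetical name, and it assumes every .png under roi/{slide_id}/{CLASS_NAME}/ belongs to that class.

```python
from pathlib import Path
from typing import Dict, List


def collect_cached_rois(roi_root: Path, class_name: str) -> Dict[str, List[Path]]:
    """Map slide_id -> sorted PNG files cached for class_name."""
    result: Dict[str, List[Path]] = {}
    # Only directories are slide folders; the {CLASS_NAME}_val.h5 files are skipped.
    for slide_dir in sorted(p for p in roi_root.iterdir() if p.is_dir()):
        class_dir = slide_dir / class_name
        if class_dir.is_dir():
            pngs = sorted(class_dir.glob("*.png"))
            if pngs:
                result[slide_dir.name] = pngs
    return result
```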
The models must be organized in a specific layout for the code to load them properly:
    /Backend
    |— model
    |   |— {CLASS_NAME}
    |   |   |— initial_model.ckpt
    |   |   |— {ISO_TIMESTAMP}.ckpt  --->[code generated]
Pretrained models can be downloaded from here. The "model" directory should be placed at the root of this repository, i.e., the same level as the "app_core" directory.
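The sketch below illustrates how a loader could choose the checkpoint for one class: the most recent {ISO_TIMESTAMP}.ckpt if one exists, otherwise initial_model.ckpt. This selection rule is an assumption, not the repo's actual loading code, and it also assumes the timestamps parse with datetime.fromisoformat; latest_checkpoint is a hypothetical name.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple


def latest_checkpoint(class_dir: Path) -> Optional[Path]:
    """Pick the checkpoint to load from one model/{CLASS_NAME} directory.

    Prefers the newest {ISO_TIMESTAMP}.ckpt and falls back to
    initial_model.ckpt. Assumes timestamps are all naive or all
    timezone-aware, since Python cannot compare a mix of the two.
    """
    timestamped: List[Tuple[datetime, Path]] = []
    for ckpt in class_dir.glob("*.ckpt"):
        if ckpt.stem == "initial_model":
            continue
        try:
            timestamped.append((datetime.fromisoformat(ckpt.stem), ckpt))
        except ValueError:
            continue  # file name is not an ISO timestamp, ignore it
    if timestamped:
        return max(timestamped, key=lambda item: item[0])[1]
    initial = class_dir / "initial_model.ckpt"
    return initial if initial.exists() else None
```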
Start the server with:

    $ flask run
If FLASK_ENV=development, start the server with flask run --no-reload instead; otherwise the reloader loads the app in a second process and the background update
runs twice every interval.
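As an alternative to --no-reload, a common pattern is to start background jobs only in the process that actually serves requests. Werkzeug's reloader sets the WERKZEUG_RUN_MAIN environment variable to "true" in its child process. The helper below is a hypothetical sketch of that guard, not how this repo schedules its updates.

```python
import os
from typing import Mapping


def should_start_background_job(reloader_active: bool,
                                 env: Mapping[str, str] = os.environ) -> bool:
    """Return True if this process should start the periodic update job.

    With the reloader active, the app may be loaded both in the watcher
    process and in the child that serves requests; only the child (marked
    by WERKZEUG_RUN_MAIN=true) should start the job. Without the reloader
    there is a single process, so it always starts the job.
    """
    if not reloader_active:
        return True
    return env.get("WERKZEUG_RUN_MAIN") == "true"
```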
- Download the test dataset and place it in the ../Data directory (relative to the root of this repo).
- Run

      $ pytest

  at the root of this repo to run the existing tests.
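Tests that depend on the dataset could locate it as sketched below. find_test_data is a hypothetical helper; the only assumption is the one stated in the step above, namely that Data sits next to the repo directory.

```python
from pathlib import Path
from typing import Optional


def find_test_data(repo_root: Path) -> Optional[Path]:
    """Return the ../Data directory next to the repo, or None if it is absent."""
    data_dir = (repo_root / ".." / "Data").resolve()
    return data_dir if data_dir.is_dir() else None
```

In a conftest.py, a fixture could call pytest.skip(...) when this returns None, so data-dependent tests are skipped with a clear reason instead of failing on missing files.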