AI/ML model assessment - starter kit

health-bow-01

The ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) was established in 2018 by ITU-T Study Group 16 and works in partnership with the World Health Organization (WHO) to establish a standardized framework for the assessment of AI/ML models for health. This repository contains a simple starter-kit configuration that you can replicate to create a benchmark (a.k.a. challenge) on our platform and assess different AI/ML models. The repository is based on code from EvalAI (but we are not affiliated with EvalAI). Follow the instructions below to get started.

Configure the benchmark

  1. Create a repository from this template as explained here, which is similar to forking this repository. You can simply click the green "Use this template" button above (it appears when you are logged in to GitHub).
  2. Read the EvalAI configuration documentation to learn how to structure your benchmark (a.k.a. challenge); note that our platform is not affiliated with EvalAI. Once you are ready, adapt the yaml file, the HTML templates, and the evaluation script to your needs (see the evaluation-script sketch after this list).
  3. When you have completed your changes, run ./run.sh to generate challenge_config.zip.
  4. Upload challenge_config.zip to our platform to create the benchmark (as mentioned, our platform is not affiliated with EvalAI). The benchmark becomes publicly available once our admin approves it.
  5. Update the details of your benchmark using the UI on our platform.
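
For orientation on step 2: the platform calls the evaluate() function in evaluation_script/main.py with the annotation file, the participant's submission file, and the phase codename. The following is a minimal sketch in the style of the EvalAI template, not the definitive implementation; the phase codename, split name, metric, and file format are placeholders that must match your challenge_config.yaml and your benchmark's data.

```python
import json


def evaluate(test_annotation_file, user_submission_file, phase_codename, **kwargs):
    """Compare a participant's predictions with the ground truth.

    The platform's worker calls this function; the returned metrics are
    displayed on the leaderboard. Placeholder logic only.
    """
    with open(test_annotation_file) as f:
        ground_truth = json.load(f)   # e.g. {"sample_001": 0, "sample_002": 1, ...}
    with open(user_submission_file) as f:
        predictions = json.load(f)    # same keys, predicted labels as values

    # Placeholder metric: fraction of samples predicted correctly.
    correct = sum(1 for k, v in ground_truth.items() if predictions.get(k) == v)
    accuracy = correct / len(ground_truth) if ground_truth else 0.0

    output = {}
    if phase_codename == "dev":  # must match a codename in challenge_config.yaml
        output["result"] = [{"dev_split": {"Accuracy": accuracy}}]
        # Metrics shown to the participant for this particular submission.
        output["submission_result"] = output["result"][0]["dev_split"]
    return output
```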

Example text-based benchmark

Our platform supports two types of submissions for benchmarking:

  • The text-based version lets the participant submit the predictions/output of the AI model to the platform via the user interface (UI) or the command-line interface (CLI), where they are compared with the ground truth to assess model performance. (An additional questionnaire can be submitted via the UI of the platform and reviewed by an audit team.) The folder retino-public-example contains a fully configured text-based benchmark that we are already hosting on our platform. You can use this example as the basis for your own benchmark (for data protection, we have removed our annotation and submission files). See the sketch after this list for the general idea of a predictions file.

  • The docker-based version lets the participant submit the AI/ML model to the platform as a Docker image via the CLI, where the model's performance is evaluated against the test dataset in a protected environment. A configuration example will be available soon.
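
To make the text-based flow concrete, here is a hypothetical sketch of how a predictions file could be produced before it is uploaded via the UI or CLI. The sample IDs, labels, and JSON layout are assumptions for illustration only; every benchmark defines its own submission format in its evaluation script and submission guidelines.

```python
import json

# Hypothetical model output: one predicted label per sample ID.
# In practice these values come from running your AI/ML model on the
# benchmark's input data; the required format is benchmark-specific.
predictions = {
    "sample_001": 1,
    "sample_002": 0,
    "sample_003": 1,
}

# Write the predictions file that is then uploaded to the platform.
with open("submission.json", "w") as f:
    json.dump(predictions, f, indent=2)
```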

Test your evaluation script locally

To test the evaluation script locally before uploading it to our server, follow the instructions below.

  1. Copy the evaluation script, i.e., __init__.py, main.py, and any other relevant files, from the evaluation_script/ directory to the challenge_data/challenge_1/ directory.
  2. In the worker/run.py file, set the challenge phase name to the codename of the phase you want to test, the annotation file name to the corresponding file in the annotations/ folder for that phase, and the submission file name to the matching sample submission (see the sketch after this list).
  3. Run python -m worker.run from the directory that contains the annotations/, challenge_data/, and worker/ directories. If the command runs successfully, the evaluation script works locally and will work on the server as well.
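
As a rough mental model for the edits in step 2: worker/run.py essentially loads the copied evaluation script and calls its evaluate() function with the phase codename, annotation file, and submission file you configured. Assuming the evaluate() signature sketched earlier, an equivalent manual check could look like this (the file names and codename below are examples; use the ones from your own configuration):

```python
# Hedged, hand-rolled equivalent of `python -m worker.run`:
# call the copied evaluation script directly with one phase's files.
from challenge_data.challenge_1.main import evaluate

result = evaluate(
    test_annotation_file="annotations/test_annotations_devsplit.json",
    user_submission_file="submission.json",
    phase_codename="dev",  # must match a phase codename in challenge_config.yaml
)
print(result)
```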

Directory Structure

.
├── README.md
├── annotations                               # Contains the annotations for the dataset splits
│   ├── test_annotations_devsplit.json        # Annotations for the dev split
│   └── test_annotations_testsplit.json       # Annotations for the test split
├── challenge_data                            # Contains scripts to test the evaluation script locally
│   ├── challenge_1                           # Contains the evaluation script for the challenge
│   │   ├── __init__.py                       # Imports the main.py file for evaluation
│   │   └── main.py                           # Challenge evaluation script
│   └── __init__.py                           # Imports the modules that load the evaluation script
├── challenge_config.yaml                     # Configuration file that defines the challenge setup
├── evaluation_script                         # Contains the evaluation script
│   ├── __init__.py                           # Imports the modules that load the annotations etc.
│   └── main.py                               # Contains the main `evaluate()` method
├── logo.jpg                                  # Logo image of the challenge
├── submission.json                           # Sample submission file
├── run.sh                                    # Script to create the challenge configuration zip to be uploaded to the platform
├── templates                                 # Contains challenge-related HTML templates
│   ├── challenge_phase_1_description.html    # Challenge phase 1 description template
│   ├── challenge_phase_2_description.html    # Challenge phase 2 description template
│   ├── description.html                      # Challenge description template
│   ├── evaluation_details.html               # Describes how submissions are evaluated for each challenge phase
│   ├── submission_guidelines.html            # Describes how to make submissions to the challenge
│   └── terms_and_conditions.html             # Terms and conditions of the challenge
└── worker                                    # Contains the scripts to test the evaluation script locally
    ├── __init__.py                           # Imports the module that loads the evaluation script
    └── run.py                                # Contains the code to run the evaluation locally
