This repo contains all the code and tools needed to build a continuous quality evaluation system for ML projects, using GitHub Actions as the CI/CD engine.
A blog post covers the concepts and ideas behind the system. It also contains a step-by-step tutorial that explains how to use the system and what its components are.
This README contains:
- Technical details to make the launch easier
- A high-level overview of the structure
The system is tested on Ubuntu 18.04 and Mac OSX 10.15.2.
All interaction with the system is done via Makefile, so make must be installed. All the code runs in Docker containers, so Docker must also be installed on the host machine.
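As a quick sanity check before running any make target, the two host-side prerequisites can be verified from a shell. The snippet below is only a minimal sketch: the helper name `check_tool` is hypothetical and not part of this repo, and it checks only that the commands are on the PATH, not that the Docker daemon is running.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not part of this repo).
# It reports whether a command is available on PATH:
# returns 0 if found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# The two prerequisites named in this README.
missing=0
for tool in make docker; do
    check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
    echo "All prerequisites are installed."
else
    echo "Install the missing tools before running any make target."
fi
```

If either tool is reported as missing, follow the platform-specific instructions in this README.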
Instructions for Ubuntu and Mac OS X are below. The system might also work on Windows, but this is untested and I cannot guarantee it.
- Ubuntu
  - Setup Docker
    The most recent installation instructions for Ubuntu can be found in the official Docker documentation. For convenience, I would also recommend enabling Docker management for non-root users (see the Linux post-installation steps in the Docker documentation). Be careful, because this can lead to security issues.
  - Install the make command-line tool
    apt-get update
    apt-get install build-essential
- Mac OS X
  - Setup Docker
    The most recent installation instructions for Mac OS X can be found in the official Docker documentation.
  - Install the make command-line tool
    It is shipped with the command-line tools for Xcode.
  - Install some of the missing command-line utils
    brew install coreutils
  - Adapt Makefile
    Change the "date" command to the "gdate" command in line 13 of Makefile. This lets Mac users get a UNIX timestamp with millisecond precision, which the default "date" does not provide.
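To illustrate why the "gdate" substitution matters, here is a minimal sketch of producing a millisecond UNIX timestamp in a way that works on both platforms. It assumes GNU date semantics for the `%N` format, which is what coreutils installs as "gdate" on Mac. The variable names `DATE_CMD` and `timestamp_ms` are illustrative only, and how line 13 of Makefile actually uses the command is not shown here.

```shell
#!/bin/sh
# Prefer GNU date as installed by coreutils on Mac (gdate);
# otherwise fall back to the system date (GNU date on Ubuntu).
if command -v gdate >/dev/null 2>&1; then
    DATE_CMD=gdate
else
    DATE_CMD=date
fi

# %s  = whole seconds since the UNIX epoch
# %3N = first three digits of the nanosecond field, i.e. milliseconds
#       (a GNU extension, not supported by the default date on Mac)
timestamp_ms=$("$DATE_CMD" +%s%3N)
echo "$timestamp_ms"
```

On a Mac without coreutils installed, the fallback branch uses the default "date", which does not understand `%N`, so the output would not be a valid millisecond timestamp.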
The system is built using the Boston House Prices regression dataset as an illustrative toy task. Several models are built to solve the problem, and their quality is compared. The solution is shipped as a REST API web service inside a Docker container.
- Makefile is a file with shortcuts to control the developed system.
- src contains all the necessary code to serve the model as an API endpoint.
- client is for the client-side code, which provides an easy way to query the server.
- .github/workflows contains GitHub Actions CI/CD pipeline definitions.
- dockers folder stores Dockerfile for executing all the code (server, client, dashboard).
- notebooks contains a notebook that shows data exploration and model building, training, and in-place evaluation.
- models stores weights and parameters for all the trained models.
- data consists of data files already split into train and validation.
- dashboard contains code for visualization of metrics in the form of a web dashboard.