InfantinO is an online-learning framework for classifying infants' facial expressions. It manages an unconventional machine learning lifecycle, including annotation and model updates. The main challenges of infant facial expression classification are label ambiguity, catastrophic forgetting under frequent domain shift, and insufficient data. Our framework addresses these challenges with domain adaptation and online learning.
| Member | Role | Github |
|---|---|---|
| 박준하 (YBIGTA 20th) | Lead, Modeling, MLOps pipeline | |
| 이주훈 (YBIGTA 20th) | Modeling, Data acquisition | |
| 정정훈 (YBIGTA 20th) | Modeling, Dataset acquisition | |
| 국호연 (YBIGTA 21st) | MLOps pipeline, Web Backend | |
| 박제연 (YBIGTA 21st) | MLOps pipeline, Web Backend | |
| 장동현 (YBIGTA 21st) | Web Frontend, Annotation GUI | |
from generic_onlinelearner import CustomTrainer

trainer = CustomTrainer()
trainer.run_full()     # Train the feature extractor
trainer.run_partial()  # Train the online learner: a single image is enough for a model update

# Read the URI of the most recently logged model
with open('recent_model_uri.log', 'r') as f:
    model_uri = f.readline().strip()

# Run inference on a single image
img_path = 'example_image.png'
prediction, uncertainty = trainer.inference_partial(path=img_path, model_uri=model_uri)
Transfer to another domain: place your images in the Train Dataset directory, with all classes included.
(1) Collect infant facial expression data
- Facial expressions are classified into 7 categories: {angry, disgust, fear, happy, sad, surprise, neutral}
- Collect the dataset through Google
- Apply image augmentation (a sketch follows this list)
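A minimal augmentation sketch, assuming torchvision transforms; the exact augmentation policy used in the project may differ, and the transform parameters here are illustrative.

```python
# Hypothetical augmentation pipeline; the transforms and parameters are illustrative.
from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize((224, 224)),                        # match the feature extractor's input size
    transforms.RandomHorizontalFlip(p=0.5),               # faces are roughly symmetric
    transforms.ColorJitter(brightness=0.2, contrast=0.2), # simulate lighting variation
    transforms.RandomRotation(degrees=10),                # small pose perturbations
    transforms.ToTensor(),
])

# Usage: augment a single PIL image
# from PIL import Image
# img = Image.open('example_image.png').convert('RGB')
# tensor = augment(img)
```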
(2) Train the feature extractor
- Using timm, create an 'efficientnet-b0' model with 64 output features
- Add a classification layer with 7 outputs
- Train the model on the infant data
- Extract only the feature-extraction structure and weights of the model
- Connect the feature extractor to a scikit-learn MLP classifier (see the sketch below)
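A sketch of this step under the timm and scikit-learn setup described above; the hidden-layer size, training loop, and input resolution are assumptions, not the project's exact configuration.

```python
# Sketch of the feature extractor + MLP classifier setup; hyperparameters are illustrative.
import timm
import torch
import torch.nn as nn
from sklearn.neural_network import MLPClassifier

# EfficientNet-B0 backbone that emits a 64-dimensional feature vector
backbone = timm.create_model('efficientnet_b0', pretrained=True, num_classes=64)

# Temporary 7-way classification head, used only while fine-tuning the backbone
model = nn.Sequential(backbone, nn.ReLU(), nn.Linear(64, 7))
# ... fine-tune `model` on the infant dataset, then keep only `backbone` ...

# The frozen backbone produces features for a scikit-learn MLP classifier
backbone.eval()
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=200)

def extract_features(images: torch.Tensor):
    """images: (N, 3, 224, 224) tensor -> (N, 64) numpy feature matrix."""
    with torch.no_grad():
        return backbone(images).cpu().numpy()

# clf.fit(extract_features(train_images), train_labels)
```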
(3) Conduct online learning through MLflow
- Using MLflow, train the model with an online learning method (a sketch of a single update follows this list)
- Online learning enables very fast re-training without catastrophic forgetting
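A minimal sketch of a single online update, assuming the scikit-learn MLP classifier from step (2) and standard MLflow logging calls; the project's actual update loop and logged fields may differ.

```python
# Hypothetical single-sample online update logged with MLflow.
import mlflow
import mlflow.sklearn
import numpy as np

def online_update(clf, feature: np.ndarray, label: int, classes=range(7)):
    """Update the MLP classifier with one new labeled sample and log the run."""
    with mlflow.start_run():
        # partial_fit performs one incremental update on the new sample only
        clf.partial_fit(feature.reshape(1, -1), [label], classes=list(classes))
        mlflow.log_param("label", label)
        mlflow.sklearn.log_model(clf, artifact_path="online_classifier")
    return clf
```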
(4) MLOps
We developed the MLOps pipeline for this project mainly using MLflow, Azure Machine Learning, and AWS services.
In the picture, the MLOps cycle runs counter-clockwise starting from the bottom right.
The cycle consists of three parts: model training, model deployment, and model retraining.
- Train model
  - Train the ONN model with the infant data
  - Track the model training with MLflow
    - MLflow is a platform for managing the ML lifecycle. MLflow lets us log source properties, parameters, metrics, and artifacts related to training an ML model.
  - Save the MLflow artifacts in an S3 bucket (a sketch follows this list)
    - When we train the model with MLflow, we can record MLflow entities (runs, parameters, metrics, metadata, etc.) to various remote storage solutions. We chose an S3 bucket for storage.
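A sketch of pointing MLflow runs at an S3 artifact location; the bucket, experiment name, and logged values are placeholders, and AWS credentials are assumed to be configured in the environment.

```python
# Hypothetical MLflow configuration that stores run artifacts in an S3 bucket.
import mlflow

# Create (once) an experiment whose artifact root points at S3
experiment_id = mlflow.create_experiment(
    "infantino-online-learning",
    artifact_location="s3://infantino-mlflow-artifacts/online-learning",
)

with mlflow.start_run(experiment_id=experiment_id) as run:
    mlflow.log_param("solver", "adam")      # example parameter
    mlflow.log_metric("val_accuracy", 0.0)  # placeholder metric
    # Any artifact logged in this run (e.g. the pickled model) is written under the S3 location
    print(run.info.artifact_uri)
```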
- Fetch the model information and register it to the Azure Model Registry (see the sketch after this list)
  - Model registration allows you to store and version your models in the Azure cloud, in your workspace. The model registry helps you organize and keep track of your trained models.
  - Get the '.pkl' file of the model from the S3 bucket
  - Register the '.pkl' file to the AzureML Model Registry, naming the model with the model version
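A sketch of this step using boto3 and the AzureML SDK (v1); the bucket, object key, model name, and workspace configuration are placeholders rather than the project's actual values.

```python
# Hypothetical registration flow: download the pickled model from S3, then register it in AzureML.
import boto3
from azureml.core import Workspace, Model

# Download the '.pkl' artifact from the S3 bucket (names are placeholders)
s3 = boto3.client("s3")
s3.download_file("infantino-artifacts", "models/online_classifier.pkl", "online_classifier.pkl")

# Register the file in the AzureML Model Registry, encoding the version in the name
ws = Workspace.from_config()  # reads the workspace config.json
Model.register(
    workspace=ws,
    model_path="online_classifier.pkl",
    model_name="infantino-onn-v1",
    description="Online classifier retrained on newly labeled infant images",
)
```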
- In AzureML, you can define an environment from a Docker image, a Docker build context, or a conda specification together with a Docker image. AzureML environments define the containers in which your code runs.
- We built our environment starting from an existing environment; the resulting image is uploaded to ACR (see the sketch after this list).
- We customized the environment by editing the Dockerfile and added the required packages.
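The project customizes the environment by editing a Dockerfile; as a rough equivalent, this AzureML SDK (v1) sketch clones an existing environment and adds pip packages via a conda specification. The base environment name and package list are illustrative assumptions.

```python
# Hypothetical AzureML environment definition; base environment name and packages are illustrative.
from azureml.core import Environment, Workspace
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

# Start from an existing environment and clone it so it can be modified
base = Environment.get(workspace=ws, name="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu")  # placeholder name
env = base.clone("infantino-env")
env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["mlflow", "timm", "torch", "scikit-learn", "boto3"]
)

# Registering the environment makes it available in the workspace; its image is pushed to ACR
env.register(workspace=ws)
```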
- Use Azure Machine Learning endpoints to streamline model deployments for both real-time and batch inference. Endpoints provide a unified interface to invoke and manage model deployments across compute types. An endpoint, in this context, is an HTTPS path through which clients send requests (input data) and receive the inferencing (scoring) output of a trained model.
- To create an online endpoint in AzureML, we need to specify four elements (a scoring-script sketch follows this list):
  - (a) Model files or a registered model in the workspace
  - (b) Scoring script
  - (c) Environment
  - (d) Compute instance and scale settings
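A minimal sketch of element (b), the scoring script, following AzureML's `init()`/`run()` convention. The model file name and input schema are assumptions; in the actual pipeline the endpoint receives an S3 image URI and the features are extracted server-side, which is omitted here for brevity.

```python
# score.py -- hypothetical scoring script; the real input schema and model loading may differ.
import json
import os
import pickle

def init():
    """Called once when the endpoint container starts: load the registered model."""
    global model
    # AZUREML_MODEL_DIR points at the folder containing the registered model files
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "online_classifier.pkl")
    with open(model_path, "rb") as f:
        model = pickle.load(f)

def run(raw_data):
    """Called per request: expects JSON with a 64-dim feature vector, returns the prediction."""
    features = json.loads(raw_data)["features"]
    prediction = model.predict([features])[0]
    return {"prediction": int(prediction)}
```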
A. Make an inference request with one baby picture.
B. Upload a baby picture and its label to retrain the model.
Action A (an example request is sketched below).
- The Node.js server receives an image and stores it in the S3 bucket.
- The Node.js server sends the S3 URI of the image to the AzureML endpoint.
- The AzureML endpoint returns the inference result.
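The call is actually made from the Node.js backend; as a rough equivalent, this Python sketch shows the shape of a request to an AzureML online endpoint. The URL, key, payload format, and response fields are placeholders.

```python
# Hypothetical endpoint invocation; the real call is made from the Node.js backend.
import requests

ENDPOINT_URL = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"  # placeholder
API_KEY = "<endpoint-key>"                                                      # placeholder

payload = {"image_uri": "s3://<bucket>/uploads/example_image.png"}
response = requests.post(
    ENDPOINT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())  # e.g. {"prediction": 3}
```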
Action B (a retraining-trigger sketch follows).
- The Node.js server stores the image-label pair in the S3 bucket.
- In the local environment, we keep track of the number of samples in the S3 bucket; when the count increases, the model is retrained with the most recently stored data.
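A sketch of the retraining trigger, assuming boto3 and the `CustomTrainer` shown earlier; the bucket name, prefix, polling interval, and the assumption that `run_partial()` picks up the most recently stored data are placeholders.

```python
# Hypothetical retraining trigger: poll the S3 prefix and retrain when new labeled data arrives.
import time
import boto3
from generic_onlinelearner import CustomTrainer

s3 = boto3.client("s3")
trainer = CustomTrainer()
last_count = 0

while True:
    resp = s3.list_objects_v2(Bucket="infantino-data", Prefix="labeled/")  # names are placeholders
    count = resp.get("KeyCount", 0)
    if count > last_count:
        # New image-label pairs were uploaded: update the online learner
        trainer.run_partial()
        last_count = count
    time.sleep(60)  # polling interval is a placeholder
```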
- Presentation slides: https://drive.google.com/file/d/1jGR_O3MjBp1_BM5LIkHuCcIx1niGuEpg/view?usp=drivesdk
- Presentation video: https://youtu.be/0y_eo9P65h0