This project borrows a lot of code from this awesome repo.
This repo contains example code for a (very basic) Cloud ML platform.
- The ml_pipeline directory contains an example machine learning project for iris classification.
- The ml_infra directory contains Pulumi code that spins up the shared infrastructure of the ML platform, such as Kubernetes, MLFlow, etc.
As data science teams mature and their models reach actual production, the need for proper infrastructure becomes crucial. Leading companies with massive engineering teams, such as Uber, Netflix, and Airbnb, have built multiple in-house solutions for this infrastructure and coined the term "ML Platform" for the combination of them.
We hope this repo can help you get started with building your own ML platform ❤️

The main tools used in this repo:
- FastAPI - for model serving
- MLFlow - for experiment tracking
- DVC - for data versioning
- Pulumi - for Infrastructure as Code
- GitHub Actions - for CI/CD
- Traefik - for API gateway
- PDM - for Python dependency management
- Hydra - for parameter management
- Prefect - for workflow management
When building your own ML platform, do not take these tools for granted! Check out alternatives and find the best tools that solve each one of your problems.
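To give a feel for how the serving piece fits in, here is a minimal sketch of serving an iris model with FastAPI. The model URI and feature schema are assumptions for illustration, not the repo's actual serving code:

```python
# Illustrative FastAPI serving sketch (not the repo's actual code).
# The model URI and feature names below are assumptions.
import mlflow.sklearn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Load a model logged to MLFlow; "models:/iris/Production" assumes a
# model registered under the name "iris".
model = mlflow.sklearn.load_model("models:/iris/Production")

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

@app.post("/predict")
def predict(features: IrisFeatures):
    row = [[features.sepal_length, features.sepal_width,
            features.petal_length, features.petal_width]]
    return {"prediction": int(model.predict(row)[0])}
```

You could run such an app locally with `uvicorn main:app` and POST JSON to `/predict`.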
What's not included? Well... a lot actually. Here's a partial list:
- HTTPS & Authentication
- Data quality and governance
- Large-scale data processing framework
- Feature store for offline and online scenarios
- Jupyter notebook support
- Advanced workflow orchestration and scheduling
- Distributed model training support
- Model diagnostics and error analysis
- Model performance monitoring
- and probably much more!
We would love your help!
The following steps assume a Windows development environment.
- Before You Begin:

  Install `pulumi`, `node` and AWS tools:

  ```powershell
  choco install pulumi
  choco install nodejs
  msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi
  choco install -y aws-iam-authenticator
  choco install -y kubernetes-helm
  ```

  Install `pdm`:

  ```powershell
  (Invoke-WebRequest -Uri https://raw.githubusercontent.com/pdm-project/pdm/main/install-pdm.py -UseBasicParsing).Content | python -
  ```
- Bring Up Infra:

  We use `pulumi` to set up the infrastructure:

  ```powershell
  $env:AWS_ACCESS_KEY_ID = "<YOUR_ACCESS_KEY_ID>"; $env:AWS_SECRET_ACCESS_KEY = "<YOUR_SECRET_ACCESS_KEY>"
  cd ml_infra
  npm install
  pulumi up
  ```

  If you encounter any errors, fix the code and run `pulumi up` again. When it succeeds, export the kubeconfig:

  ```powershell
  pulumi stack output kubeconfig > ~\.kube\config
  ```

  Now we can check the k8s cluster and pod status:

  ```powershell
  kubectl get no
  kubectl get po
  ```
- Prepare Python Environment:

  We use `pdm` to manage the Python environment:

  ```powershell
  cd ml_pipeline
  pdm install
  ```
- Prepare Raw Data:

  Put your raw data into the `data/raw` folder, and use `dvc` to manage it:

  ```powershell
  pdm run dvc --cd data add raw/iris.parquet
  pdm run dvc --cd data push
  # if you are in a git repo
  git add raw/iris.parquet.dvc
  ```
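  Inside the pipeline code, the tracked file can then be read like any local file once it has been pulled. A minimal sketch, assuming the path from the `dvc add` command above and `ml_pipeline` as the working directory:

  ```python
  # After `pdm run dvc --cd data pull`, the tracked file exists locally.
  import pandas as pd

  df = pd.read_parquet("data/raw/iris.parquet")
  print(df.head())
  ```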
- Run Pipeline:

  Modify the data processing, feature engineering, modeling and orchestration code. Once you are done, run the whole pipeline:

  ```powershell
  pdm run python pipeline.py
  ```
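  The actual pipeline lives in `pipeline.py`; the sketch below only illustrates how Hydra (parameters) and Prefect (orchestration) can be wired together. The config layout and field names are assumptions:

  ```python
  # Hypothetical pipeline.py sketch: Hydra supplies parameters,
  # Prefect orchestrates the steps. Config/field names are assumptions.
  import hydra
  from omegaconf import DictConfig
  from prefect import flow, task

  @task
  def process_data(raw_path: str) -> str:
      # ...clean data, engineer features, write processed output...
      return "data/processed/iris.parquet"

  @task
  def train_model(data_path: str, n_estimators: int) -> None:
      # ...train the model and log the run to MLFlow...
      pass

  @flow
  def pipeline(raw_path: str, n_estimators: int) -> None:
      processed = process_data(raw_path)
      train_model(processed, n_estimators)

  @hydra.main(config_path="conf", config_name="config", version_base=None)
  def main(cfg: DictConfig) -> None:
      pipeline(cfg.data.raw_path, cfg.model.n_estimators)

  if __name__ == "__main__":
      main()
  ```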
- View Experiments:

  We can check the model training results and artifacts through MLFlow. Visit http://your_domain_name/mlflow .
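  Training code reports to this server through the MLFlow client. A minimal sketch, assuming the tracking URI above and an experiment named "iris":

  ```python
  # Log a training run to the platform's MLFlow server (illustrative).
  import mlflow
  import mlflow.sklearn
  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier

  mlflow.set_tracking_uri("http://your_domain_name/mlflow")  # assumed URI
  mlflow.set_experiment("iris")

  X, y = load_iris(return_X_y=True)
  with mlflow.start_run():
      model = RandomForestClassifier(n_estimators=100).fit(X, y)
      mlflow.log_param("n_estimators", 100)
      mlflow.log_metric("train_accuracy", model.score(X, y))
      mlflow.sklearn.log_model(model, "model")
  ```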
- Use Prefect:

  TODO
- Deploy Model:

  We use `pulumi` to build the image, push it to the remote repository, and configure the serving service:

  ```powershell
  cd ml_pipeline/infra
  npm install
  pulumi up
  ```

  Now you can test your model through the public API: http://your_domain_name/models/<model_name>/predict .
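  For example, a quick smoke test from Python (the payload shape depends on your serving code and is an assumption here):

  ```python
  # Call the deployed model's public API (illustrative payload).
  import requests

  resp = requests.post(
      "http://your_domain_name/models/iris/predict",
      json={"sepal_length": 5.1, "sepal_width": 3.5,
            "petal_length": 1.4, "petal_width": 0.2},
  )
  print(resp.json())
  ```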
- CI/CD:

  We provide a GitHub Actions config file in the `ml_pipeline/.github` folder. Modify the workflow and push the `ml_pipeline` project to GitHub. The workflow will be triggered automatically.
You can also check out this blog post for more details about this project.