
Federated Learning in a multi-cluster environment powered by Liqo and the Flower FL framework

This repo contains a demo of a Federated Learning (FL) app deployed in a multi-cluster environment using Liqo and Flower, a popular FL framework.

The deployed app trains a simple CNN, written from scratch in PyTorch, on the CIFAR-10 dataset.

Overview

The demo leverages a multi-cluster environment to run distributed FL training. The environment consists of:

  • a central cluster, which:
    • acts as the server
    • orchestrates the application (offloads the client pods to the leaf clusters)
    • exposes a private Service (ClusterIP) for the clients to connect to
    • aggregates the results and hosts the global model
  • N leaf clusters, which:
    • act as clients
    • train their local models on local (sensitive) data
    • share their updated weights with the server through the server Service
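To make "aggregates the results" concrete, the sketch below shows federated averaging (FedAvg): the server averages the clients' weights, weighting each client by the number of local examples it trained on. Flower's built-in FedAvg strategy aggregates this way, but this README does not say which strategy the demo configures, so treat it as an illustrative assumption. Weights are flat lists of floats rather than PyTorch tensors to keep the sketch dependency-free.

```python
from typing import List, Sequence, Tuple

# Each client returns its updated weights (a flat list of floats here, for
# simplicity) together with the number of local training examples it used.
ClientResult = Tuple[Sequence[float], int]


def fedavg(results: List[ClientResult]) -> List[float]:
    """Aggregate client weights into a global model, weighting by sample count."""
    if not results:
        raise ValueError("at least one client result is required")
    total_examples = sum(num_examples for _, num_examples in results)
    if total_examples <= 0:
        raise ValueError("total number of examples must be positive")
    num_params = len(results[0][0])
    global_weights = [0.0] * num_params
    for weights, num_examples in results:
        if len(weights) != num_params:
            raise ValueError("all clients must send the same number of parameters")
        share = num_examples / total_examples  # this client's weight in the average
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights


# Two leaf clusters: the one with more local data pulls the average harder.
print(fedavg([([1.0, 2.0], 100), ([3.0, 4.0], 300)]))  # [2.5, 3.5]
```

Weighting by example count keeps a leaf cluster with very little local data from shifting the global model as much as one that trained on far more samples.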

Architecture overview:

[Figure: architecture overview]

Build the images

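docker build -f ./build/Dockerfile.superexec -t <SUPEREXEC_IMAGE>:<IMAGE_VERSION> ./demo
docker build -f ./build/Dockerfile.clientapp -t <CLIENTAPP_IMAGE>:<IMAGE_VERSION> ./demo

Run both commands from the repo root, and push the resulting images to a registry that every cluster can pull from, since the client pods are offloaded to the leaf clusters.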

Environment configuration

To bootstrap the environment, you need one cluster acting as the server and N clusters acting as clients.

Requirements:

  • install Liqo on all clusters
  • enable the Liqo RuntimeClass feature. This is not strictly required: you can leave the RuntimeClass disabled, but then you need to modify the manifests by adding NodeAffinities to all Deployments/StatefulSets (i.e., deploy the server on local nodes and the clients on Liqo virtual nodes)
  • peer the central cluster (server) with all N client clusters. The central cluster acts as a consumer, while the leaf clusters act as providers (no bidirectional peering is required). Refer to the official Liqo docs for more info.
  • install the flwr CLI (shipped with the flwr Python package, e.g. pip install flwr)
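If you keep the RuntimeClass off, the NodeAffinity stanzas can be generated as in the minimal sketch below. It assumes that Liqo virtual nodes carry the label liqo.io/type=virtual-node (check with kubectl get nodes --show-labels). The server gets a NotIn rule so it stays on local nodes, while the clients get an In rule so they land on the virtual nodes backed by the leaf clusters. The helper name liqo_affinity is hypothetical.

```python
import json

# Assumption: Liqo labels its virtual nodes with liqo.io/type=virtual-node.
# Verify on your cluster with: kubectl get nodes --show-labels
LIQO_NODE_LABEL = "liqo.io/type"
LIQO_NODE_VALUE = "virtual-node"


def liqo_affinity(on_virtual_nodes: bool) -> dict:
    """Build a pod affinity that pins pods to (or away from) Liqo virtual nodes."""
    # NotIn also matches nodes that lack the label entirely, i.e. local nodes.
    operator = "In" if on_virtual_nodes else "NotIn"
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": LIQO_NODE_LABEL,
                                "operator": operator,
                                "values": [LIQO_NODE_VALUE],
                            }
                        ]
                    }
                ]
            }
        }
    }


# Server: local nodes only. Clients: Liqo virtual nodes (the leaf clusters).
server_patch = {"spec": {"template": {"spec": {"affinity": liqo_affinity(False)}}}}
client_patch = {"spec": {"template": {"spec": {"affinity": liqo_affinity(True)}}}}

print(json.dumps(client_patch, indent=2))
```

Paste the generated affinity block into the pod template (spec.template.spec) of the client StatefulSet, and the NotIn variant into the server workload.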

Deploy the app

On the central cluster (the one acting as the server), run:

kubectl create ns flower-demo
liqoctl offload namespace flower-demo --namespace-mapping-strategy EnforceSameName
kubectl apply -f ./deploy/manifests -n flower-demo

Note: before applying, make sure that in the deploy/manifests/clients.yaml file both the number of StatefulSet replicas and the NUM_CLIENT env variable equal the number of clients (i.e., the number of peered clusters).
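A quick consistency check for the note above, as a minimal sketch: it takes the client StatefulSet already parsed into a dict and verifies that spec.replicas and the NUM_CLIENT env value both match the number of peered clusters. The function name and the container name in the example manifest are hypothetical placeholders, not the demo's actual values.

```python
from typing import Any, Dict, Optional


def check_client_count(statefulset: Dict[str, Any], peered_clusters: int) -> None:
    """Raise ValueError if replicas or NUM_CLIENT disagree with the peered clusters."""
    replicas = statefulset["spec"]["replicas"]
    num_client: Optional[int] = None
    # Look for NUM_CLIENT in every container of the pod template.
    for container in statefulset["spec"]["template"]["spec"]["containers"]:
        for env in container.get("env", []):
            if env.get("name") == "NUM_CLIENT":
                num_client = int(env["value"])  # Kubernetes env values are strings
    if num_client is None:
        raise ValueError("NUM_CLIENT env variable not found")
    if replicas != peered_clusters or num_client != peered_clusters:
        raise ValueError(
            f"replicas={replicas}, NUM_CLIENT={num_client}, "
            f"but {peered_clusters} clusters are peered"
        )


# Hypothetical, trimmed-down client StatefulSet for two peered clusters.
example = {
    "kind": "StatefulSet",
    "spec": {
        "replicas": 2,
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "clientapp",  # hypothetical container name
                        "env": [{"name": "NUM_CLIENT", "value": "2"}],
                    }
                ]
            }
        },
    },
}

check_client_count(example, peered_clusters=2)  # passes silently
```

To run it on the real file, load it with PyYAML (yaml.safe_load_all, since a manifest file may hold several documents) and pass in the document whose kind is StatefulSet.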

Run the app

On the central cluster, forward the server app endpoint (port 9093) to your local machine:

kubectl port-forward pods/<SERVER_POD> 9093:9093 -n flower-demo
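Before launching the run, you can confirm that the forward is up with a plain TCP check like the sketch below (the function name endpoint_reachable is hypothetical). It only proves that the local end of kubectl port-forward is listening; it does not verify that the server pod behind it is healthy.

```python
import socket


def endpoint_reachable(host: str = "127.0.0.1", port: int = 9093, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    if endpoint_reachable():
        print("server app endpoint reachable on 127.0.0.1:9093")
    else:
        print("nothing listening on 127.0.0.1:9093; is kubectl port-forward running?")
```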

Now you are ready to run the training with:

flwr run ./demo liqo