zkDML: Zero Knowledge Distributed Machine Learning #6353

wertikalk · 2024-10-25T13:21:21Z


Summary

Problem Overview:

Various organizations, such as medical institutions, schools, and businesses, own large amounts of data that is highly valuable for training Machine Learning (ML) models. However, because this data may contain sensitive personal information or corporate secrets, other parties cannot benefit from it.

Solution Description:

We propose zkDML (Zero Knowledge Distributed Machine Learning), an interactive, verifiable protocol powered by Zero Knowledge (ZK) Proofs and Multi-Party Computation (MPC). This new custom protocol will enable organizations to help clients train an Artificial Neural Network (ANN) model on private data without leaking any sensitive information.

Building the initial version of the zkDML protocol will also let us implement variants of it for settings where data privacy matters less than scaling the complete training infrastructure.

Methodology

Protocol Overview:

There are two main actors in our custom zkDML system:

Organization holding the private data
Client that creates the ML model which will operate on the Organization's data

Protocol Steps:

1. The Organization selects the Dataset that will be available for training and publicly commits to it
2. The Client consults the Organization’s API to learn about the available datasets and to perform the necessary configuration
3. The Client defines the ANN’s structure and the initial set of model parameters
4. The Client sends the model parameters to the Organization
5. The Organization trains the model on a cluster of workers
- During training, each worker is assigned a data subset, for which it computes the total prediction error and a proof that the error was computed correctly. The errors and proofs are aggregated and sent to the Client, who performs backpropagation and updates the model parameters
6. Until convergence, the Client repeats steps 4-6
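The data flow of a single round can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the zkDML implementation: the commitment is a plain SHA-256 hash over canonical JSON, the model is a toy single-neuron predictor, the error is a sum of squared errors, and the proof is a placeholder dictionary standing in for a real proof. All function names are hypothetical. The Client's parameter update is left out, since the proposal does not detail it.

```python
import hashlib
import json


def commit_dataset(rows):
    """Step 1: commit to the dataset (assumed scheme: SHA-256 of canonical JSON)."""
    encoded = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def predict(params, x):
    """Toy model: weighted sum of the inputs plus a bias, no activation."""
    weights, bias = params
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def worker_round(params, subset):
    """One worker: total squared error on its subset, plus a placeholder proof."""
    total_error = sum((predict(params, x) - y) ** 2 for x, y in subset)
    proof = {"placeholder": True, "rows": len(subset)}  # stands in for a real proof
    return total_error, proof


def split(rows, n_workers):
    """Assign rows to workers round-robin (an assumed partitioning)."""
    return [rows[i::n_workers] for i in range(n_workers)]


def aggregate(results):
    """Sum the workers' errors and collect their proofs for the Client."""
    return sum(err for err, _ in results), [proof for _, proof in results]


# Integer toy data: (features, target) pairs, so the arithmetic is exact.
dataset = [([1, 2], 5), ([2, 0], 3), ([0, 1], 2), ([3, 1], 6), ([1, 1], 3)]
params = ([1, 2], 0)  # (weights, bias) sent by the Client in step 4

commitment = commit_dataset(dataset)
results = [worker_round(params, subset) for subset in split(dataset, 2)]
total_error, proofs = aggregate(results)
```

With these toy values, the aggregated error equals the error computed over the whole dataset in one pass, which is the property the aggregation step relies on.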

Implementation details and Technology stack:

We will implement the Client library and UI, the Organization API service, the MPC aggregator, and the workers in TypeScript. The correctness of the workers’ computations will be proven with circuits written in Noir, producing PLONK proofs. The Noir circuits will extend and modify our existing library, SKProof, in which we implemented Multilayer Perceptron (MLP) inference verification in Noir. Finally, since most data science and machine learning work is done in Python, we will also port the Client library to Python. The architecture of our system is shown in Figure 1.

Figure 1. zkDML Architecture

Assumptions

For the Proof of Concept stage, we assume that:

the system will initially operate on Fixed Point Arithmetic values; Floating Point Arithmetic will be introduced later
the Organization that holds the dataset guarantees its correctness
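To illustrate the first assumption, here is a minimal Python sketch of fixed-point encoding, in which a real number is stored as an integer scaled by a power of two. The choice of 16 fractional bits and all function names are assumptions made for illustration. A Noir circuit would need its own handling of signed values and truncation, so this shows only the encoding idea.

```python
SCALE_BITS = 16          # number of fractional bits (assumed value)
SCALE = 1 << SCALE_BITS  # scaling factor, 2**16


def to_fixed(value):
    """Encode a real number as a scaled integer, rounding to the nearest step."""
    return round(value * SCALE)


def to_float(fixed):
    """Decode a scaled integer back to a float."""
    return fixed / SCALE


def fixed_add(a, b):
    """Addition needs no rescaling: both operands share the same scale."""
    return a + b


def fixed_mul(a, b):
    """The raw product carries twice the fractional bits, so shift back down.

    Python's >> floors, so inexact results round toward negative infinity.
    """
    return (a * b) >> SCALE_BITS


product = fixed_mul(to_fixed(1.5), to_fixed(-2.25))
total = fixed_add(to_fixed(0.5), to_fixed(0.25))
```

Values such as 1.5 and -2.25 are exactly representable at this scale, so their product decodes to exactly -3.375. A value such as 0.1 is not, and it is stored with an error of at most half a step (1 / 2**17).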

Timeline and Deliverables

Milestone 1 (1.5 months)
- Organization API (Worker and MPC Aggregator services)
Milestone 2 (1 month)
- Client (Training Library and UI)
Milestone 3 (2 weeks)
- Finishing up integrations between Organization and Client side

Team

We are part of: MVP Workshop - Blockchain Product Research & Development Studio and its 3327 R&D department.

Aleksandar Veljkovic PhD (@aleksandar-veljkovic)
Working in software engineering and systems architecture since 2014, and in the Web 3.0 domain as a researcher, architect, and engineer since 2018. Currently a senior researcher in R&D team 3327 at Attic42 in the domain of cryptography, decentralization, and zero-knowledge. Former teaching assistant at the University of Belgrade, Faculty of Mathematics. Worked on the implementation of MACI El Gamal protocol modifications and MACI poll joining protocol implementation.

Milos Bojinovic MSc (@wertikalk)
Received his BSc degree in Electrical Engineering and Computing from the School of Electrical Engineering, University of Belgrade, Serbia, in 2020, where he also completed his MSc studies in 2023. Moved from Digital Design Verification to software development and Web3. He worked as a Research Engineer in team 3327 at Attic42 for a year before moving to his current role of Smart Contract Architect & Developer at the same company. Contributed to several open-source projects (Filecoin Solidity and Curvy Protocol). At present, he focuses on security research at the application level of different blockchain ecosystems.

Mihailo Radojevic (@radojevicMihailo)
MSc student at the University of Belgrade, School of Electrical Engineering. Has been working in the domain of Zero-Knowledge and Cryptography for a year. Currently working as an engineer in team 3327 at Attic42 in the domain of Zero-Knowledge Machine Learning. Worked on MACI poll joining protocol implementation.

Related Work

SKProof: a Python library that enables generation of execution proofs for machine learning models found in the scikit-learn library. The circuits are designed in Noir, and the PLONK proofs are generated internally using the Nargo CLI tool.
ZKFLoat: a Noir library that adds support for representing floating point values and performing basic arithmetic operations on floating point numbers.

Start Date

November 11th 2024
