Vantage6 algorithm for k-means

This algorithm was designed for the vantage6 architecture.

Input data

Each data node should hold a CSV file whose variables follow the same common data model. As an example for testing the code, we provide the iris dataset split into parts.
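Because every node must expose the same variables, it can help to check a CSV header before submitting a task. The following is a minimal sketch using only the standard library; the helper name `missing_columns` and the inline sample data are illustrative assumptions and are not part of this repository.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header.

    `csv_file` is any open text file-like object; only its first row is read.
    Hypothetical helper for illustration, not part of the algorithm package.
    """
    header = set(next(csv.reader(csv_file), []))
    return [name for name in required if name not in header]


# Columns used in the iris example of this README
required = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# In-memory stand-in for a node's CSV file (made-up rows, one column missing)
sample = io.StringIO(
    'sepal_length,sepal_width,petal_length\n'
    '5.1,3.5,1.4\n'
)
print(missing_columns(sample, required))
```

For a file on disk, the same helper could be called inside `with open(path, newline='') as f:`, where `path` points to the node's CSV file.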

Using the algorithm

Below you can see an example of how to run the algorithm:

import time
from vantage6.client import Client

# Initialise the client
client = Client('http://127.0.0.1', 5000, '/api')
client.authenticate('username', 'password')
client.setup_encryption(None)

# Define algorithm input
input_ = {
    'method': 'master',
    'master': True,
    'kwargs': {
        'org_ids': [2, 3],      # organisations to run kmeans
        'k': 3,                 # number of clusters to compute
        'epsilon': 0.05,        # threshold for convergence criterion
        'max_iter': 300,        # maximum number of iterations to perform
        'columns': [            # columns to be used for clustering
            'sepal_length', 'sepal_width', 'petal_length', 'petal_width'
        ],
        'd_init': 'all',        # data nodes to use for centroids initialisation
        'init_method': 'random', # method for centroids initialisation
        'avg_method': 'k-means' # method used to get global centroids
    }
}

# Send the task to the central server
task = client.task.create(
    collaboration=1,
    organizations=[2, 3],
    name='v6-kmeans-py',
    image='ghcr.io/maastrichtu-cds/v6-kmeans-py:latest',
    description='run kmeans',
    input=input_,
    data_format='json'
)

# Retrieve the results
task_info = client.task.get(task['id'], include_results=True)
while not task_info.get('complete'):
    task_info = client.task.get(task['id'], include_results=True)
    time.sleep(1)
result_info = client.result.list(task=task_info['id'])
results = result_info['data'][0]['result']
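The polling loop above runs until the task reports completion, so it never returns if a node stays offline. A possible variant with a timeout is sketched below; `wait_for_task` and its parameters are hypothetical names introduced here for illustration, and the function takes the task lookup as a callable so that it does not assume anything about the client beyond the `complete` field used above.

```python
import time


def wait_for_task(get_task, task_id, interval=1.0, timeout=600.0,
                  sleep=time.sleep):
    """Poll `get_task(task_id)` until the task is complete or time runs out.

    Hypothetical helper: `get_task` is any callable returning a dict with a
    'complete' key, as the task info does in the example above.
    """
    deadline = time.monotonic() + timeout
    while True:
        task_info = get_task(task_id)
        if task_info.get('complete'):
            return task_info
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f'task {task_id} not complete after {timeout} seconds'
            )
        sleep(interval)


# Assumed usage with the client from the example above:
# task_info = wait_for_task(
#     lambda task_id: client.task.get(task_id, include_results=True),
#     task['id'],
# )
```

Passing `sleep` as a parameter is a design choice that keeps the helper easy to test without real delays.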

Testing locally

To test the algorithm locally, create a Python virtual environment with your preferred tool (the commands below assume it lives in .venv) and run:

source .venv/bin/activate
pip install -e .
python v6_kmeans_py/example.py

The algorithm was developed and tested with Python 3.7.

Acknowledgments

This project was financially supported by the AiNed foundation.