This package is used by ClusOpt for its CPU-intensive tasks, but it can easily be imported into any Python data stream clustering project. It is written mainly in C/C++ with Python bindings, and features:
- CluStream (based on the MOA implementation)
- StreamKM++ (wrapped around the original paper authors' implementation)
- Distance Matrix computation (in-place implementation using Boost threads)
- Silhouette score (custom in-place implementation inspired by the BIRCH clustering feature vector)
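Of these, only `CluStream` is exercised in the usage example below; the import sketch that follows is only a hedged guess at how the other components might be laid out, with the unconfirmed names marked as assumptions (the `examples` folder has the authoritative usage).

```python
# import overview -- only CluStream is confirmed by the usage example below;
# the commented-out names are assumptions about the package layout, not the
# documented API
from clusopt_core.cluster import CluStream          # CluStream (MOA-based)

# from clusopt_core.cluster import Streamkm         # StreamKM++ (assumed name)
# from clusopt_core.metrics import DistanceMatrix   # distance matrix (assumed name)
# from clusopt_core.metrics import Silhouette       # silhouette score (assumed name)
```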
Requirements:

- Python >= 3.6
- pip
- boost-thread
- GCC >= 6
`boost-thread` can be installed on Debian-based systems with:

```shell
apt install libboost-thread-dev
```
See the `examples` folder for more.
```python
from clusopt_core.cluster import CluStream
from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt

k = 32

# generate a synthetic dataset with k well-separated blobs
dataset, _ = make_blobs(n_samples=64000, centers=k, random_state=42, cluster_std=0.1)

model = CluStream(
    m=k * 10,  # number of microclusters
    h=64000,   # horizon
    t=2,       # radius factor
)

# feed the stream in chunks of 4000 points
chunks = np.split(dataset, len(dataset) // 4000)

model.init_offline(chunks.pop(0), seed=42)

for chunk in chunks:
    model.partial_fit(chunk)

clusters, _ = model.get_macro_clusters(k, seed=42)

plt.scatter(*dataset.T, marker=",", label="datapoints")
plt.scatter(*model.get_partial_cluster_centers().T, marker=".", label="microclusters")
plt.scatter(*clusters.T, marker="x", label="macro clusters", color="black")
plt.legend()
plt.show()
```
output:
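As an optional follow-up (not part of the example above), the macro clusters can be sanity-checked with scikit-learn by assigning each point to its nearest macro cluster center and computing a silhouette score; this sketch reuses the `dataset` and `clusters` variables from the snippet above.

```python
# optional follow-up to the example above: label each point by its nearest
# macro cluster and score the result with scikit-learn as a rough sanity check
from scipy.spatial.distance import cdist
from sklearn.metrics import silhouette_score

labels = cdist(dataset, clusters).argmin(axis=1)  # nearest macro cluster per point
score = silhouette_score(dataset, labels, sample_size=10_000, random_state=42)
print("silhouette score:", score)
```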
Some functions in clusopt_core are faster than their scikit-learn counterparts; see the `benchmark` folder for more info.
In one benchmark, each bar is labeled with a tuple of (no_samples, dimension, no_groups); independently of those three factors, the clusopt implementation is faster.

In another, each bar shows the dataset dimension: the clusopt_core implementation is faster when the dataset dimension is small (roughly below 150), even when scikit-learn uses 4 processes.
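To get a feel for how such a comparison scales with dimension, a rough harness like the sketch below can be used. It only times the scikit-learn side (`pairwise_distances` with 4 jobs); the clusopt_core distance-matrix API is not shown in this README, so its call would be plugged in where marked (the real benchmark scripts live in the `benchmark` folder).

```python
# rough timing harness for the scikit-learn side of the comparison; plug the
# clusopt_core distance-matrix call in where marked (its API is not shown here)
from timeit import default_timer as timer

import numpy as np
from sklearn.metrics import pairwise_distances

for dim in (2, 50, 150, 500):
    data = np.random.rand(5000, dim).astype(np.float32)

    start = timer()
    pairwise_distances(data, n_jobs=4)  # scikit-learn baseline with 4 jobs
    sklearn_time = timer() - start

    # <- time the clusopt_core distance matrix computation here for comparison

    print(f"dim={dim:4d}  scikit-learn: {sklearn_time:.3f}s")
```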
You can install it directly from PyPI with:

```shell
pip install clusopt-core
```

or you can clone this repo and install from the directory:

```shell
pip install ./clusopt_core
```
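Either way, a quick smoke test (using only the `CluStream` class shown in the usage example above) confirms that the compiled extension imports correctly:

```python
# minimal post-install smoke test: import the binding shown in the example above
from clusopt_core.cluster import CluStream

print(CluStream)  # should print the class without raising ImportError
```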