hicder/muopdb

MuopDB - A vector database for machine learning

Introduction

MuopDB is a vector database for machine learning. Currently, it supports:

  • Index types: HNSW, IVF, SPANN. All are stored on disk and accessed with mmap.
  • Quantization: product quantization
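
For readers new to product quantization, here is a minimal sketch of the idea: a vector is split into fixed-size subvectors, and each subvector is replaced by the index of its nearest centroid in a per-subspace codebook. The `ProductQuantizer` type, its fields, and the `encode`/`decode` names are hypothetical and are not taken from the MuopDB codebase; codebook training (usually k-means) is assumed to have happened elsewhere.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical PQ codebook (illustration only, not MuopDB's type).
/// `codebooks[m][k]` is centroid `k` of subspace `m`; at most 256
/// centroids per subspace so that each code fits in one byte.
struct ProductQuantizer {
    subvector_dim: usize,
    codebooks: Vec<Vec<Vec<f32>>>,
}

impl ProductQuantizer {
    /// Replace each subvector with the index of its nearest centroid.
    fn encode(&self, vector: &[f32]) -> Vec<u8> {
        assert_eq!(vector.len(), self.subvector_dim * self.codebooks.len());
        vector
            .chunks(self.subvector_dim)
            .zip(&self.codebooks)
            .map(|(sub, centroids)| {
                let mut best = 0usize;
                let mut best_dist = f32::INFINITY;
                for (k, centroid) in centroids.iter().enumerate() {
                    let d = squared_l2(sub, centroid);
                    if d < best_dist {
                        best_dist = d;
                        best = k;
                    }
                }
                best as u8
            })
            .collect()
    }

    /// Approximate reconstruction: concatenate the selected centroids.
    fn decode(&self, code: &[u8]) -> Vec<f32> {
        code.iter()
            .zip(&self.codebooks)
            .flat_map(|(&k, centroids)| centroids[k as usize].iter().copied())
            .collect()
    }
}
```

With two subspaces of dimension 2, a 4-dimensional `f32` vector (16 bytes) is stored as a 2-byte code, at the cost of only being able to reconstruct the nearest centroids.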

Here is the roadmap for MuopDB:

V0 (Done)

  • Query path
    • Vector similarity search
    • Hierarchical Navigable Small Worlds (HNSW)
    • Product Quantization (PQ)
  • Indexing path
    • Support periodic offline indexing
  • Database Management
    • Doc-sharding & query fan-out with aggregator-leaf architecture
    • In-memory & disk-based storage with mmap
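
To make the aggregator-leaf idea concrete, here is a rough sketch of the merge step: each leaf searches its own shard and returns its local top-k, and the aggregator combines them into a global top-k. The `Hit` type and `merge_top_k` function are hypothetical names for illustration; how MuopDB actually transports and merges results is not described here.

```rust
/// One search result returned by a leaf (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    doc_id: u64,
    distance: f32,
}

/// Aggregator side: merge the per-leaf result lists and keep the `k`
/// hits with the smallest distance. Assumes each doc lives on exactly
/// one shard, so no de-duplication is needed.
fn merge_top_k(leaf_results: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = leaf_results.into_iter().flatten().collect();
    // `total_cmp` gives a total order on f32, so NaN cannot panic the sort.
    all.sort_by(|a, b| a.distance.total_cmp(&b.distance));
    all.truncate(k);
    all
}
```

Because every leaf already returns at most k hits, the aggregator only ever sorts (number of leaves × k) items, regardless of how many documents each shard holds.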

V1 (Done)

  • Query & Indexing
    • Inverted File (IVF)
    • Improve locality for HNSW
    • SPANN
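
As a rough illustration of the inverted file (IVF) idea: vectors are grouped into posting lists by their nearest centroid, and a query scans only the `nprobe` lists whose centroids are closest to it. This is an in-memory sketch under simplifying assumptions (centroids are given, raw vectors are stored unquantized); `IvfIndex` and its methods are hypothetical names, not MuopDB's API.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical in-memory IVF index (illustration only).
struct IvfIndex {
    centroids: Vec<Vec<f32>>,
    /// `posting_lists[c]` holds the (id, vector) pairs assigned to centroid `c`.
    posting_lists: Vec<Vec<(u64, Vec<f32>)>>,
}

impl IvfIndex {
    /// Build an empty index over pre-trained centroids (at least one).
    fn new(centroids: Vec<Vec<f32>>) -> Self {
        let posting_lists = vec![Vec::new(); centroids.len()];
        Self { centroids, posting_lists }
    }

    /// Indices of the `n` centroids closest to `v`, nearest first.
    fn nearest_centroids(&self, v: &[f32], n: usize) -> Vec<usize> {
        let mut order: Vec<usize> = (0..self.centroids.len()).collect();
        order.sort_by(|&a, &b| {
            squared_l2(v, &self.centroids[a]).total_cmp(&squared_l2(v, &self.centroids[b]))
        });
        order.truncate(n);
        order
    }

    /// Append the vector to the posting list of its nearest centroid.
    fn insert(&mut self, id: u64, v: Vec<f32>) {
        let list = self.nearest_centroids(&v, 1)[0];
        self.posting_lists[list].push((id, v));
    }

    /// Scan only the `nprobe` nearest posting lists and return the top `k`.
    fn search(&self, query: &[f32], k: usize, nprobe: usize) -> Vec<(u64, f32)> {
        let mut hits: Vec<(u64, f32)> = self
            .nearest_centroids(query, nprobe)
            .into_iter()
            .flat_map(|list| self.posting_lists[list].iter())
            .map(|(id, v)| (*id, squared_l2(query, v)))
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(k);
        hits
    }
}
```

Raising `nprobe` trades speed for recall: with `nprobe` equal to the number of centroids, the search degenerates into an exhaustive scan.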

V2

  • Query
    • Multiple index segments
    • Support realtime indexing
    • Elias-Fano encoding for IVF
  • Quantization
    • RabitQ
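
For context on the Elias-Fano item: it is a compact encoding for sorted integer sequences, which presumably makes it a fit for the sorted doc IDs in IVF posting lists. Each value is split into low bits, stored verbatim, and a high part, stored in unary. The sketch below shows only that split. It keeps the low parts in a `Vec<u64>` and the unary bits in a `Vec<bool>` for readability, whereas a real implementation would bit-pack both and add a select structure for random access. `EliasFano` and its fields are hypothetical names, not MuopDB code.

```rust
/// Hypothetical, unpacked Elias-Fano representation (illustration only).
struct EliasFano {
    /// Number of low bits stored verbatim per value.
    low_bits: u32,
    /// Low `low_bits` bits of each value, one entry per value.
    lows: Vec<u64>,
    /// Unary-coded high parts: `false` advances to the next high-part
    /// bucket, `true` emits one value in the current bucket.
    highs: Vec<bool>,
}

impl EliasFano {
    /// Encode a non-decreasing sequence whose values are below `universe`.
    /// Sortedness is a precondition and is not checked here.
    fn encode(sorted: &[u64], universe: u64) -> Self {
        let n = sorted.len() as u64;
        // low_bits = floor(log2(universe / n)), or 0 when the ratio is below 2.
        let ratio = if n == 0 { 0 } else { universe / n };
        let low_bits = if ratio > 1 { 63 - ratio.leading_zeros() } else { 0 };
        let mask = (1u64 << low_bits) - 1;

        let mut lows = Vec::with_capacity(sorted.len());
        let mut highs = Vec::new();
        let mut bucket = 0u64;
        for &v in sorted {
            let high = v >> low_bits;
            while bucket < high {
                highs.push(false);
                bucket += 1;
            }
            highs.push(true);
            lows.push(v & mask);
        }
        Self { low_bits, lows, highs }
    }

    /// Rebuild the original sequence by walking the unary stream.
    fn decode(&self) -> Vec<u64> {
        let mut out = Vec::with_capacity(self.lows.len());
        let mut bucket = 0u64;
        let mut i = 0;
        for &bit in &self.highs {
            if bit {
                out.push((bucket << self.low_bits) | self.lows[i]);
                i += 1;
            } else {
                bucket += 1;
            }
        }
        out
    }
}
```

Once bit-packed, this layout costs roughly `low_bits + 2` bits per value, which is why it is attractive for long, dense posting lists.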

Why MuopDB?

This is an educational project for me to learn Rust and vector databases.

Building

Install prerequisites:

# macOS
brew install hdf5 protobuf

export HDF5_DIR="$(brew --prefix hdf5)"

Build:

# from top-level workspace
cargo build --release

Test:

cargo test --release

Contributions

This project is developed with TechCare Coaching. I mentor the mentees who contribute to it.
