The Tadashi-Mizu distributed task scheduler
ONLY TESTED ON MACOS ARM64
The code is still 100% a work in progress and mostly scratch code wiring things up (I don't really know C++, sorry :P)
```
          Clients
  (Python or HTTP client)
┌─────┐   ┌─────┐   ┌─────┐
│     │   │     │   │     │
│     │   │     │   │     │
└──┬──┘   └──┬──┘   └──┬──┘
┌──┼─────────┼─────────┼──────────────┐
│ ┌▼─────────▼─────────▼────────────┐ │
│ │          CONTROL PLANE          │ │
│ │           API SERVER            │ │
│ └──────┬───────────────────┬──────┘ │
│        │                   │        │
│ ┌──────▼───────┐    ┌──────▼──────┐ │
│ │              │    │             │ │
│ │   KV STORE   │    │             │ │
│ │              │    │             │ │
│ └──────┬───────┘    │   OBJECT    │ │
│        │            │    STORE    │ │
│ ┌──────▼───────┐    │             │ │
│ │              │    │             │ │
│ │  SCHEDULER   │    │             │ │
│ │              │    │             │ │
│ └──┬───────────┘    └─────┬───────┘ │
│    │                      │         │
└────┼──────────────────────┼─────────┘
     │ RPC                  │
  ┌──▼────────────┐         │
  │┌───────────────┐        │
  ││┌───────────────┐       │
  │││               │       │
  │││               │       │
  │││ WORKER NODES  ◄───────┘
  └││               │
   └│               │
    └───────────────┘
```
- Many worker nodes process partitioned data. Inspiration: https://duckdb.org/2022/03/07/aggregate-hashtable.html
- Need an internal data store for blob storage? Maybe run against an S3-compatible system to pull columnar data into Arrow format (see the sketch after this list)? Still undecided.
- Python bindings for the control plane API & task submission.
- Task data uses Apache Arrow as its in-memory format.
- Worker nodes will communicate with the scheduler via RPC, and vice versa.
- Uses RocksDB as an embedded key-value store for metadata.
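If the object-store route pans out, pulling columnar data from an S3-compatible system into Arrow memory could look roughly like this with pyarrow; the endpoint, credentials, bucket, and key below are all placeholders, not anything this repo ships:

```python
import pyarrow.parquet as pq
from pyarrow import fs

# Placeholder wiring: read a Parquet file from an S3-compatible store
# (e.g. MinIO on localhost:9000) straight into an Arrow table.
s3 = fs.S3FileSystem(
    endpoint_override="http://localhost:9000",  # placeholder endpoint
    access_key="minioadmin",                    # placeholder credentials
    secret_key="minioadmin",
)
table = pq.read_table("test-bucket/data/part-0.parquet", filesystem=s3)
print(table.schema)    # Arrow schema of the loaded columns
print(table.num_rows)
```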
Rough sketch of the Python client API so far:

```python
import TmScheduler as tm

# Connect to the control plane
tm.NewSession("localhost:8080")

# Register three worker nodes with the control plane
tm.RegisterWorker("localhost:50051")
tm.RegisterWorker("localhost:50052")
tm.RegisterWorker("localhost:50053")

# Register an S3 object store
tm.RegisterObjectStore("s3://test-bucket")

# TODO: implement some ideas for submitting tasks
```
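One idea for the task-submission TODO above, purely a sketch: `SubmitTask` and `GetTaskStatus` are hypothetical names, and none of this exists yet:

```python
# Hypothetical API, nothing below is implemented yet.
task_id = tm.SubmitTask(
    op="aggregate",                   # named operation the workers understand
    input="s3://test-bucket/data/",   # partitioned input in the object store
    partitions=3,                     # one partition per registered worker
)
print(tm.GetTaskStatus(task_id))
```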
- Open HTTP session with the Python client
- Register workers with the Python client
- Round-robin scheduler implementation (sketched after this list)
- Key-value store implementation
  - Store for workers
  - Store for tasks
- Control API (example requests sketched after this list)
  - Get all running tasks
  - Get all running workers with health
  - Get task by ID
- Object store
  - Load data from the object store into Arrow memory
- Worker nodes
  - TODO: plan RPCs for workers
  - TODO: plan data partitioning between workers (toy sketch after this list)
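The round-robin scheduler item amounts to handing each incoming task to the next worker in a fixed rotation; a minimal Python sketch (the real implementation will live in C++, and the task IDs here are made up):

```python
from itertools import cycle

# Minimal round-robin assignment: each task goes to the next worker in a
# fixed rotation. Worker addresses match the client example above.
workers = ["localhost:50051", "localhost:50052", "localhost:50053"]
rotation = cycle(workers)

def assign(task_ids):
    """Pair each task ID with the next worker in the rotation."""
    return {task_id: next(rotation) for task_id in task_ids}

print(assign(["task-1", "task-2", "task-3", "task-4"]))
# task-4 wraps back around to localhost:50051
```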
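The control API calls could be exercised from the HTTP side roughly like this; the routes are guesses at what the API server might expose, not actual paths:

```python
import requests

BASE = "http://localhost:8080"  # control plane API server from the diagram

# Hypothetical routes; the real paths are still undecided.
tasks = requests.get(f"{BASE}/tasks").json()        # all running tasks
workers = requests.get(f"{BASE}/workers").json()    # all workers with health
task = requests.get(f"{BASE}/tasks/task-1").json()  # one task by ID
print(tasks, workers, task)
```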
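For the partitioning TODO, an obvious starting point is hash-partitioning rows by key across workers, in the spirit of the DuckDB post linked above; a toy Python sketch where the columns and worker count are made up:

```python
import pyarrow as pa

NUM_WORKERS = 3

# Toy data; the key/value columns are made up for illustration.
table = pa.table({"key": [1, 2, 3, 4, 5, 6], "value": [10, 20, 30, 40, 50, 60]})

# Route each row to a worker by hashing its key modulo the worker count.
assignments = [hash(k) % NUM_WORKERS for k in table["key"].to_pylist()]

# Materialize one Arrow table per worker with a boolean filter mask.
partitions = [
    table.filter(pa.array([a == w for a in assignments]))
    for w in range(NUM_WORKERS)
]
for w, part in enumerate(partitions):
    print(f"worker {w}: {part.num_rows} rows")
```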