BigBertha: A Kubernetes-native LLMOps Architecture Design

BigBertha is an architecture design that demonstrates how automated LLMOps (Large Language Model Operations) can be achieved on any Kubernetes cluster using open-source, container-native technologies 🌟

Architecture

LLMOps Capabilities 🚀

1. LLM Monitoring

👀 BigBertha uses Prometheus to monitor LLM serving modules. For demo purposes, a Streamlit app serves the LLM, and Prometheus scrapes the metrics that the demo exposes. Alerts are set up to detect performance degradation.
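As a rough illustration of what Prometheus scrapes, the sketch below renders a few gauges in the Prometheus text exposition format using only the standard library. The metric names and values are hypothetical and are not taken from the BigBertha repository; a real service would more likely use a Prometheus client library.

```python
def render_metrics(samples):
    """Render gauges in the Prometheus text exposition format.

    `samples` maps a metric name to a (help_text, value) pair.
    """
    lines = []
    for name, (help_text, value) in sorted(samples.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names, for illustration only.
demo_samples = {
    "llm_response_latency_seconds": ("Latency of the last LLM response.", 1.25),
    "llm_user_rating": ("Most recent user rating of a response.", 4),
}
metrics_text = render_metrics(demo_samples)
```

Serving `metrics_text` from an HTTP endpoint such as `/metrics` is what makes the values available to a Prometheus scrape job.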

2. Auto-triggering LLM Retraining/Fine-tuning

⚙️ Prometheus fires alerts when model performance degrades. AlertManager routes these alerts to Argo Events, which triggers a retraining pipeline to fine-tune the model.
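The text does not say how alerts reach Argo Events. One plausible route, assumed here, is an AlertManager webhook receiver that posts a JSON payload. The sketch below shows the kind of check a consumer could apply to such a payload before starting retraining. The alert name `LLMPerformanceDegraded` is hypothetical.

```python
import json

# Hypothetical alert name; the real rule name is not given in the text.
RETRAIN_ALERT = "LLMPerformanceDegraded"


def should_retrain(payload, alert_name=RETRAIN_ALERT):
    """Return True if the webhook payload carries a firing alert with that name."""
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("status") == "firing" and labels.get("alertname") == alert_name:
            return True
    return False


# Trimmed payload in the shape AlertManager sends to webhook receivers.
example_payload = json.loads("""
{
  "version": "4",
  "status": "firing",
  "alerts": [
    {"status": "resolved", "labels": {"alertname": "LLMPerformanceDegraded"}},
    {"status": "firing", "labels": {"alertname": "LLMPerformanceDegraded", "severity": "warning"}}
  ]
}
""")
```

Filtering on `status` matters because a webhook receiver can also be notified when an alert resolves, and a resolved alert should not start another retraining run.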

3. Training, Evaluating, and Logging the Retrained LLM

🏋️ The retraining pipeline is orchestrated using Argo Workflows. This pipeline can be tailored to perform LLM-specific retraining, fine-tuning, and metrics tracking. MLflow is used for logging the retrained LLM.
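To make the shape of such a pipeline concrete, the following sketch builds an Argo Workflow manifest as a Python dictionary with three sequential DAG tasks. The stage names, the script names, and the container image are placeholders chosen for illustration; the actual pipeline definition in the repository may look quite different.

```python
import json


def build_retraining_workflow(image):
    """Build an Argo Workflow manifest with three sequential DAG tasks."""
    stages = ["fine-tune", "evaluate", "log-model"]
    tasks = []
    for index, stage in enumerate(stages):
        task = {"name": stage, "template": stage}
        if index > 0:
            # Each stage waits for the one before it.
            task["dependencies"] = [stages[index - 1]]
        tasks.append(task)

    # The entrypoint template holds the DAG; each stage gets a container template.
    templates = [{"name": "retrain", "dag": {"tasks": tasks}}]
    for stage in stages:
        templates.append({
            "name": stage,
            # Hypothetical script names, one per stage.
            "container": {"image": image, "command": ["python", f"{stage}.py"]},
        })

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "llm-retraining-"},
        "spec": {"entrypoint": "retrain", "templates": templates},
    }


# "example.com/llm-trainer:latest" is a placeholder image name.
workflow = build_retraining_workflow("example.com/llm-trainer:latest")
manifest_json = json.dumps(workflow, indent=2)
```

In this layout the `log-model` stage is where a call to MLflow would sit, after evaluation has produced the metrics to record.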

4. Triggering the Generation of New Vectors for Fresh Data

🔄 MinIO is used for unstructured data storage. Argo Events is set up to listen for upload events on MinIO, triggering a vector ingestion workflow when new data is uploaded.
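A workflow triggered this way needs to know which object was uploaded. The sketch below extracts bucket and key pairs from an S3-style notification, the format MinIO uses for bucket events. It assumes object keys arrive URL-encoded, as they do in S3 notifications. The bucket and key in the example are made up.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs for object-created records in an S3-style event."""
    uploads = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("s3:ObjectCreated:"):
            continue  # Ignore deletions and other event types.
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Assumption: keys are URL-encoded, as in S3 notifications.
            uploads.append((bucket, unquote_plus(key)))
    return uploads


# Trimmed example event; the bucket and key are made up.
example_event = {
    "Records": [
        {
            "eventName": "s3:ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "knowledge-base"},
                "object": {"key": "docs/release+notes%202024.pdf"},
            },
        },
        {
            "eventName": "s3:ObjectRemoved:Delete",
            "s3": {"bucket": {"name": "knowledge-base"}, "object": {"key": "old.txt"}},
        },
    ]
}
```

Each returned pair could then be passed to the ingestion workflow as a parameter, so that only the new object is processed.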

5. Ingesting New Vectors into the Knowledge Base

🔍 Argo Workflows is used to run a vector ingestion pipeline that utilizes LlamaIndex for generating and ingesting vectors. These vectors are stored in Milvus, which serves as the knowledge base for retrieval-augmented generation.
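The ingestion step boils down to chunking a document, embedding each chunk, and storing the vectors for later similarity search. The sketch below walks through that flow with standard-library stand-ins: `toy_embed` is a hashing trick rather than a real embedding model, and `InMemoryVectorStore` is a placeholder for Milvus. None of these names come from LlamaIndex or Milvus, which provide their own APIs for these steps.

```python
import hashlib
import math


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping character windows."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(chunk, dims=64):
    """Deterministic stand-in for an embedding model: hash words into buckets."""
    vector = [0.0] * dims
    for word in chunk.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


class InMemoryVectorStore:
    """Placeholder for a vector database such as Milvus."""

    def __init__(self):
        self.rows = []

    def insert(self, chunk):
        self.rows.append((toy_embed(chunk), chunk))

    def search(self, query, top_k=1):
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = toy_embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for vec, chunk in self.rows]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]


def ingest(document, store):
    """Chunk a document, embed each chunk, and store it. Returns the chunk count."""
    chunks = chunk_text(document)
    for chunk in chunks:
        store.insert(chunk)
    return len(chunks)
```

At query time, the chatbot would call something like `store.search(question, top_k=3)` and pass the returned chunks to the LLM as context.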

Stack Overview

BigBertha relies on several key components:

ArgoCD: A Kubernetes-native continuous delivery tool that manages all components in the BigBertha stack.
Argo Workflows: A Kubernetes-native workflow engine used for running vector ingestion and model retraining pipelines.
Argo Events: A Kubernetes-native event-based dependency manager that connects various applications and components, triggering workflows based on events.
Prometheus + AlertManager: Used for monitoring and alerting related to model performance.
LlamaIndex: A framework for connecting LLMs and data sources, used for data ingestion and indexing.
Milvus: A Kubernetes-native vector database for storing and querying vectors.
MinIO: An open-source object storage system used for storing unstructured data.
MLflow: An open-source platform for managing the machine learning lifecycle, including experiment tracking and model management.
Kubernetes: The container orchestration platform that automates the deployment, scaling, and management of containerized applications.
Docker Containers: Used for packaging and running applications in a consistent and reproducible manner.

Demo Chatbot

As a demonstration, BigBertha includes a Streamlit-based chatbot that serves a Llama2 7B quantized chatbot model. A simple Flask app is used to expose metrics, and Redis acts as an intermediary between Streamlit and Flask processes.
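The pattern described here, one process writing counters to a shared store and another reading them to expose metrics, can be sketched as follows. `FakeRedis` is an in-memory stand-in that mimics the `incr` and `get` commands of a Redis client so the example runs without a server. The key names and helper functions are hypothetical, not taken from the demo code.

```python
class FakeRedis:
    """In-memory stand-in exposing the two Redis commands used below."""

    def __init__(self):
        self._data = {}

    def incr(self, key, amount=1):
        self._data[key] = int(self._data.get(key, 0)) + amount
        return self._data[key]

    def get(self, key):
        # Redis clients return bytes by default, or None for a missing key.
        value = self._data.get(key)
        return None if value is None else str(value).encode()


def record_chat_turn(store, thumbs_down=False):
    """Called from the chat UI process after each response."""
    store.incr("chat_responses_total")
    if thumbs_down:
        store.incr("chat_thumbs_down_total")


def read_counter(store, key):
    """Called from the metrics process; missing keys read as zero."""
    raw = store.get(key)
    return int(raw) if raw is not None else 0
```

Because the two sides share only key names, the chat UI and the metrics endpoint can run as separate processes and restart independently without losing counts.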

License 📄

This project is open-source and is governed by the terms and conditions outlined in the LICENSE file included in this repository. 📜


aishwaryaprabhat/BigBertha
