Merge pull request #24 from togethercomputer/add-quickstart
Added quickstart MoA code + architecture diagram
Nutlope authored Jun 24, 2024
2 parents cd19906 + 0d0bd48 commit 9138bc7
Showing 3 changed files with 126 additions and 14 deletions.
90 changes: 76 additions & 14 deletions README.md
# Mixture-of-Agents (MoA)

[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)
[![arXiv](https://img.shields.io/badge/ArXiv-2406.04692-b31b1b.svg)](https://arxiv.org/abs/2406.04692)
[![Discord](https://img.shields.io/badge/Discord-Together%20AI-blue?logo=discord&logoColor=white)](https://discord.com/invite/9Rk6sSeWEG)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/togethercompute.svg?style=social&label=Follow%20%40togethercompute)](https://twitter.com/togethercompute)

<img alt="Mixture-of-Agents (MoA) explained" src="./assets/together-moa-explained.png">

<p align="center">
  <a href="#overview"><strong>Overview</strong></a> ·
  <a href="#quickstart-moa-in-50-loc"><strong>Quickstart</strong></a> ·
  <a href="#interactive-cli-demo"><strong>Demo</strong></a> ·
  <a href="#evaluation"><strong>Evaluation</strong></a> ·
  <a href="#results"><strong>Results</strong></a> ·
  <a href="#credits"><strong>Credits</strong></a>
</p>

<div align="center">
<img src="assets/moa.jpg" alt="moa" style="width: 100%; display: block; margin-left: auto; margin-right: auto;" />
<br>
</div>
## Overview

Mixture of Agents (MoA) is a novel approach that leverages the collective strengths of multiple LLMs to enhance performance, achieving state-of-the-art results. By employing a layered architecture where each layer comprises several LLM agents, **MoA significantly outperforms GPT-4 Omni’s 57.5% on AlpacaEval 2.0 with a score of 65.1%**, using only open-source models!
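
The quickstart below runs a single round of proposers plus an aggregator. To make the layered architecture concrete, here is a minimal sketch of chaining several proposer layers with the Together API, where each layer's agents see the previous layer's answers and refine their own. The layer count, prompt wording, and the `moa_layers` helper are illustrative assumptions, not this repo's code or the paper's exact configuration.

```py
# A sketch of multi-layer MoA; `moa_layers` is a hypothetical helper.
import asyncio
import os
from together import AsyncTogether

async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))
reference_models = [
    "Qwen/Qwen2-72B-Instruct",
    "Qwen/Qwen1.5-72B-Chat",
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
    "databricks/dbrx-instruct",
]


async def ask(model, messages):
    """One chat completion from a single reference model."""
    response = await async_client.chat.completions.create(
        model=model, messages=messages, temperature=0.7, max_tokens=512
    )
    return response.choices[0].message.content


async def moa_layers(user_prompt, num_layers=3):
    """In each layer, every agent answers with the previous layer's
    responses injected as context, refining the answers round by round."""
    messages = [{"role": "user", "content": user_prompt}]
    # Layer 1: independent answers from each agent.
    answers = await asyncio.gather(*(ask(m, messages) for m in reference_models))
    # Layers 2..num_layers: agents refine, conditioned on the prior layer.
    for _ in range(num_layers - 1):
        context = (
            "Responses from other models to the user query:\n\n"
            + "\n\n".join(answers)
            + "\n\nUse them to write an improved answer of your own."
        )
        layered = [{"role": "system", "content": context}] + messages
        answers = await asyncio.gather(*(ask(m, layered) for m in reference_models))
    return answers  # pass these to an aggregator, as in the quickstart


print(asyncio.run(moa_layers("What are some fun things to do in SF?")))
```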

## Quickstart: MoA in 50 LOC

To get started using MoA in your own apps, see `moa.py`. You'll need to:

1. Install the Together Python library: `pip install together`
2. Get your [Together API key](https://api.together.xyz/settings/api-keys) and export it: `export TOGETHER_API_KEY=`
3. Run the Python file: `python moa.py`

```py
# Mixture-of-Agents in 50 lines of code – see moa.py
import asyncio
import os
from together import AsyncTogether, Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))
reference_models = [
    "Qwen/Qwen2-72B-Instruct",
    "Qwen/Qwen1.5-72B-Chat",
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
    "databricks/dbrx-instruct",
]
aggregator_model = "mistralai/Mixtral-8x22B-Instruct-v0.1"
aggregator_system_prompt = "...synthesize these responses into a single, high-quality response... Responses from models:"


async def run_llm(model):
    """Run a single LLM call against one reference model."""
    response = await async_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "What are some fun things to do in SF?"}],
        temperature=0.7,
        max_tokens=100,
    )
    return response.choices[0].message.content


async def main():
    # Query every reference model concurrently.
    results = await asyncio.gather(*[run_llm(model) for model in reference_models])

    # Feed all reference answers to the aggregator and stream its synthesis.
    final_stream = client.chat.completions.create(
        model=aggregator_model,
        messages=[
            {"role": "system", "content": aggregator_system_prompt},
            {"role": "user", "content": ",".join(str(element) for element in results)},
        ],
        stream=True,
    )

    for chunk in final_stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)


asyncio.run(main())
```

## Interactive CLI Demo

This interactive CLI demo showcases a simple multi-turn chatbot where the final response is aggregated from various reference models.

### Setup

1. Export Your API Key:

Ensure you have your [Together API key](https://api.together.xyz/settings/api-keys) and export it as an environment variable:

```bash
export TOGETHER_API_KEY={your_key}
```

2. Install Requirements:

```bash
pip install -r requirements.txt
```
To run the interactive demo, execute the following script with Python:

```bash
python bot.py
```

The CLI will prompt you to input instructions interactively:

1. Start by entering your instruction at the ">>>" prompt.
2. The system will process your input using the predefined reference models.
3. It will generate a response based on the aggregated outputs from these models.
4. You can continue the conversation by inputting more instructions; the system maintains the context of the multi-turn interaction, as in the sketch after this list.
5. Enter `exit` to quit the chatbot.
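
For reference, here is a minimal sketch of such a multi-turn loop, assuming the same Together models and an aggregation prompt like the one in `moa.py`; the actual `bot.py` implementation may differ in its prompts, models, and options.

```py
# A sketch of a multi-turn MoA chat loop; not bot.py's actual code.
import asyncio
import os
from together import AsyncTogether, Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))
reference_models = [
    "Qwen/Qwen2-72B-Instruct",
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
]
aggregator_model = "mistralai/Mixtral-8x22B-Instruct-v0.1"
aggregator_prompt = "Synthesize the responses below into one high-quality reply:"


async def reference_answers(history):
    """Every reference model answers given the conversation so far."""
    async def one(model):
        response = await async_client.chat.completions.create(
            model=model, messages=history, temperature=0.7, max_tokens=512
        )
        return response.choices[0].message.content

    return await asyncio.gather(*(one(m) for m in reference_models))


history = []  # shared context carried across turns
while True:
    user_input = input(">>> ")
    if user_input.strip().lower() == "exit":
        break
    history.append({"role": "user", "content": user_input})
    answers = asyncio.run(reference_answers(history))
    final = client.chat.completions.create(
        model=aggregator_model,
        messages=[{"role": "system",
                   "content": aggregator_prompt + "\n\n" + "\n\n".join(answers)}]
        + history,
    )
    reply = final.choices[0].message.content
    print(reply)
    history.append({"role": "assistant", "content": reply})  # keep context
```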

### Configuration

You can configure the demo by specifying the following parameters:
## Evaluation

We provide scripts to quickly reproduce some of the results presented in our paper.
For convenience, we have included the code from [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval),
[MT-Bench](https://github.com/lm-sys/FastChat), and [FLASK](https://github.com/kaistAI/FLASK), with necessary modifications.
We extend our gratitude to these projects for creating the benchmarks.

Binary file added assets/together-moa-explained.png
50 changes: 50 additions & 0 deletions moa.py
```py
# Mixture-of-Agents in 50 lines of code
import asyncio
import os
from together import AsyncTogether, Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))

user_prompt = "What are some fun things to do in SF?"
reference_models = [
    "Qwen/Qwen2-72B-Instruct",
    "Qwen/Qwen1.5-72B-Chat",
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
    "databricks/dbrx-instruct",
]
aggregator_model = "mistralai/Mixtral-8x22B-Instruct-v0.1"
aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query. Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured, coherent, and adheres to the highest standards of accuracy and reliability.
Responses from models:"""


async def run_llm(model):
    """Run a single LLM call with a reference model."""
    response = await async_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_prompt}],
        temperature=0.7,
        max_tokens=512,
    )
    print(model)  # log which reference model just finished
    return response.choices[0].message.content


async def main():
    # Query all reference models concurrently.
    results = await asyncio.gather(*[run_llm(model) for model in reference_models])

    # Pass the collected answers to the aggregator and stream its synthesis.
    final_stream = client.chat.completions.create(
        model=aggregator_model,
        messages=[
            {"role": "system", "content": aggregator_system_prompt},
            {"role": "user", "content": ",".join(str(element) for element in results)},
        ],
        stream=True,
    )

    for chunk in final_stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)


asyncio.run(main())
```