U260924 #1586 (merged, 4 commits, Sep 26, 2024)
3 changes: 2 additions & 1 deletion open-ai-integration/overview/product-overview.mdx
---

<ProductOverview
title="Conversational AI powered by Agora and OpenAI"
img="/images/real-time-stt/real-time-stt.png"
quickStartLink="/open-ai-integration/get-started/quickstart"
productFeatures={[
Integrating Agora’s real-time audio communication with OpenAI’s Large Language Models (LLMs) unlocks the potential for powerful, interactive voice-based applications. By combining Agora’s robust real-time audio streaming capabilities with the conversational intelligence of OpenAI’s LLMs, you can create seamless voice-enabled experiences, such as voice-powered AI assistants or interactive dialogue systems. This integration enables dynamic, responsive audio interactions, enhancing user engagement across a broad range of use cases—from customer support bots to collaborative voice-driven applications.

Most importantly, by combining the strengths of Agora and OpenAI, this integration enables the most natural form of language interaction, lowering the barrier for users to harness the power of AI and making advanced technologies more accessible than ever before.

</ProductOverview>
141 changes: 70 additions & 71 deletions shared/open-ai-integration/quickstart.mdx
Follow these steps to set up your Python integration project:

1. Create a new folder for the project.

   ```bash
   mkdir realtime-agent
   cd realtime-agent/
   ```

1. Create the following structure for your project:

```
/realtime-agent
├── __init__.py
├── .env
├── agent.py
├── agora
│   ├── __init__.py
│   ├── requirements.txt
│   └── rtc.py
└── realtimeapi
├── __init__.py
├── client.py
├── messages.py
└── util.py
```
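   If you prefer to scaffold this layout from the terminal, something like the following creates the empty files (the `realtimeapi/` contents come from the downloaded package, so only the directory is created here):

   ```shell
   # Create the project folders
   mkdir -p realtime-agent/agora realtime-agent/realtimeapi
   cd realtime-agent
   # Create the empty top-level and agora/ files
   touch __init__.py .env agent.py
   touch agora/__init__.py agora/requirements.txt agora/rtc.py
   ```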

<Admonition type="info" title="Note">
This project uses the OpenAI [`realtimeapi-examples`](https://openai.com/api/) package. Download the project and unzip it into your `realtime-agent` folder.
</Admonition>

The following descriptions provide an overview of the key files in the project:

- `agent.py`: The primary script responsible for executing the `RealtimeKitAgent`. It integrates Agora's functionality from the `agora/rtc.py` module and OpenAI's capabilities from the `realtimeapi` package.
- `agora/rtc.py`: Contains an implementation of the server-side Agora Python Voice SDK.
- `realtimeapi/`: Contains the classes and methods that interact with OpenAI’s Realtime API.

The [Complete code](#complete-integration-code) for `agent.py` and `rtc.py` is provided at the bottom of this page.

1. Open your `.env` file and add the following keys:

   ```
   # ...

# OpenAI API key for authentication
OPENAI_API_KEY=your_openai_api_key_here

# API base URI for the Realtime API
REALTIME_API_BASE_URI=wss://api.openai.com
```
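   The full project reads these keys with `python-dotenv`. As a dependency-free illustration of what that loading step does, a minimal parser for the `.env` format above looks like this (the `load_env_file` helper and `.env.sample` filename are hypothetical, for demonstration only):

   ```python
   import os

   def load_env_file(path: str) -> dict[str, str]:
       """Parse simple KEY=value lines, skipping blanks and # comments."""
       values: dict[str, str] = {}
       with open(path) as f:
           for line in f:
               line = line.strip()
               if not line or line.startswith("#") or "=" not in line:
                   continue
               key, _, value = line.partition("=")
               values[key.strip()] = value.strip()
       return values

   # Example: write a sample .env file and read it back
   with open(".env.sample", "w") as f:
       f.write("# OpenAI API key for authentication\nOPENAI_API_KEY=sk-test\n")

   env = load_env_file(".env.sample")
   os.environ.update(env)  # make the keys visible via os.getenv()
   ```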

1. Install the dependencies:
The `RealtimeKitAgent` class integrates Agora's audio communication capabilities with OpenAI's AI services.
The `setup_and_run_agent` method sets up the `RealtimeKitAgent` by connecting to an Agora channel using the provided `RtcEngine` and initializing a session with the OpenAI Realtime API client. It sends configuration messages to set up the session and define conversation parameters, such as the system message and output audio format, before starting the agent's operations. The method uses asynchronous execution to handle both listening for the session start and sending conversation configuration updates concurrently. It ensures that the connection is properly managed and cleaned up after use, even in cases of exceptions, early exits, or shutdowns.

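The concurrency pattern described above, listening for the session start while sending configuration, with cleanup guaranteed to run, can be sketched with plain `asyncio` (the coroutine names here are hypothetical stand-ins, not the real method bodies):

```python
import asyncio

async def wait_for_session_start() -> str:
    # Stand-in for awaiting the Realtime API session-started event
    await asyncio.sleep(0.01)
    return "session-started"

async def send_conversation_config() -> str:
    # Stand-in for sending the session and conversation configuration messages
    await asyncio.sleep(0.01)
    return "config-sent"

async def setup() -> list[str]:
    try:
        # Run both steps concurrently; gather preserves argument order
        results = await asyncio.gather(
            wait_for_session_start(),
            send_conversation_config(),
        )
    finally:
        # Cleanup placed here runs even on exceptions or cancellation
        pass
    return results

results = asyncio.run(setup())
```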
<Admonition type="info" title="Note">
UIDs in the Python SDK are set using a string value. Agora recommends using only numerical values for UID strings to ensure compatibility with all Agora products and extensions.
</Admonition>

```python
# ...

logger = logging.getLogger(__name__)

@dataclass(frozen=True, kw_only=True)
class InferenceConfig:
    """Configuration for the inference process."""

    system_message: str | None = None
    turn_detection: messages.TurnDetectionTypes | None = None
    voice: messages.Voices | None = None

@dataclass(frozen=True, kw_only=True)
class LocalFunctionToolDeclaration:
    """Declaration of a tool that can be called by the model, and runs a function locally on the tool context."""

    name: str
    description: str
    # ...

@dataclass(frozen=True, kw_only=True)
class PassThroughFunctionToolDeclaration:
    """Declaration of a tool that can be called by the model, and is passed through the LiveKit client."""

    name: str
    description: str
# ...

ToolDeclaration = LocalFunctionToolDeclaration | PassThroughFunctionToolDeclaration

@dataclass(frozen=True, kw_only=True)
class LocalToolCallExecuted:
    json_encoded_output: str

@dataclass(frozen=True, kw_only=True)
class ShouldPassThroughToolCall:
    decoded_function_args: dict[str, Any]

# Type alias for tool execution results
ExecuteToolCallResult = LocalToolCallExecuted | ShouldPassThroughToolCall

class ToolContext(abc.ABC):
    """Abstract base class for managing tool declarations and executions."""

    _tool_declarations: dict[str, ToolDeclaration]

    def __init__(self) -> None:
        # TODO: This should be an ordered dict
    # ...
return [v.model_description() for v in self._tool_declarations.values()]

class ClientToolCallResponse(BaseModel):
    tool_call_id: str
    result: dict[str, Any] | str | float | int | bool | None = None

class RealtimeKitAgent:
    """Main agent class for handling real-time communication and processing."""

    engine: RtcEngine
    channel: Channel
    client: RealtimeApiClient
    audio_queue: asyncio.Queue[bytes] = asyncio.Queue()
    message_queue: asyncio.Queue[messages.ResonseAudioTranscriptionDelta] = asyncio.Queue()
    message_done_queue: asyncio.Queue[messages.ResonseAudioTranscriptionDone] = asyncio.Queue()
    tools: ToolContext | None = None

    _client_tool_futures: dict[str, asyncio.Future[ClientToolCallResponse]]

    # ...
logger.warning(f"Unhandled message type: {message=}")

async def shutdown(loop, signal=None):
    """Gracefully shut down the application."""
    if signal:
        print(f"Received exit signal {signal.name}...")

    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]

    print(f"Cancelling {len(tasks)} outstanding tasks")
    for task in tasks:
        task.cancel()

    await asyncio.gather(*tasks, return_exceptions=True)
    loop.stop()

if __name__ == "__main__":
    # Load environment variables and run the agent
    load_dotenv()
    asyncio.run(
        RealtimeKitAgent.entry_point(
            engine=RtcEngine(appid=os.getenv("AGORA_APP_ID")),
            inference_config=InferenceConfig(
                system_message="""\\
You are a helpful assistant. If asked about the weather, make sure to use the provided tool to get that information. \\
If you are asked a question that requires a tool, say something like "working on that" and don't provide a concrete response \\
until you have received the response to the tool call.\\
""",
                voice=messages.Voices.Alloy,
                turn_detection=messages.TurnDetectionTypes.SERVER_VAD,
            ),
        )
    )
`}

</CodeBlock>