update lints with prettier
Copybara authored and Copybara committed Jan 7, 2025
1 parent 11e12fe commit 096ef6c
Showing 63 changed files with 4,459 additions and 3,965 deletions.
@@ -39,4 +39,4 @@ In the future, we look forward to your patches and contributions to this project
All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.
@@ -7,6 +7,7 @@ This repository serves as a comprehensive developer guide for [Google's Gemini M
## What You'll Learn

By following this guide, you'll be able to:

- Build real-time audio chat applications with Gemini
- Implement live video interactions through webcam and screen sharing
- Create multimodal experiences combining audio and video
@@ -18,18 +19,21 @@ The guide progresses from basic concepts to advanced implementations, culminatin
## Key Concepts Covered

- **Real-time Communication:**

  - WebSocket-based streaming
  - Bidirectional audio chat
  - Live video processing
  - Turn-taking and interruption handling

- **Audio Processing:**

  - Microphone input capture
  - Audio chunking and streaming
  - Voice Activity Detection (VAD)
  - Real-time audio playback

- **Video Integration:**

  - Webcam and screen capture
  - Frame processing and encoding
  - Simultaneous audio-video streaming
@@ -45,20 +49,26 @@ The guide progresses from basic concepts to advanced implementations, culminatin
## Guide Structure

### [Part 1](part_1_intro): Introduction to Gemini's Multimodal Live API

Basic concepts and SDK usage:

- SDK setup and authentication
- Text and audio interactions
- Real-time audio chat implementation

### [Part 2](part_2_dev_api): WebSocket Development with [Gemini Developer API](https://ai.google.dev/api/multimodal-live)

Direct WebSocket implementation, building towards Project Pastra - a production-ready multimodal AI assistant inspired by Project Astra:

- Low-level WebSocket communication
- Audio and video streaming
- Function calling and system instructions
- Mobile-first deployment
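
As a rough illustration of what "low-level WebSocket communication" for audio and video tends to involve, the sketch below wraps raw media bytes in a base64-encoded JSON text frame and unwraps it again. The envelope keys (`mime_type`, `data`) and both function names are hypothetical and chosen only for this example; the actual wire format is defined by the API reference linked above, not by this snippet.

```python
import base64
import json


def encode_media_message(payload: bytes, mime_type: str) -> str:
    """Wrap raw media bytes in a JSON text frame.

    The envelope layout here is a hypothetical illustration, not the
    message schema of any particular API.
    """
    return json.dumps(
        {
            "mime_type": mime_type,
            # Base64 keeps binary data safe inside a JSON string.
            "data": base64.b64encode(payload).decode("ascii"),
        }
    )


def decode_media_message(message: str) -> tuple[str, bytes]:
    """Reverse encode_media_message: return (mime_type, raw bytes)."""
    obj = json.loads(message)
    return obj["mime_type"], base64.b64decode(obj["data"])
```

A sender would call `encode_media_message` once per audio chunk or video frame and pass the resulting string to whatever WebSocket client it uses. Base64 adds roughly a third to the payload size, which is the usual price of carrying binary data in JSON text frames.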

### [Part 3](part_3_vertex_api): WebSocket Development with [Vertex AI API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live)

Enterprise-grade implementation using Vertex AI, mirroring Part 2's journey with production-focused architecture:

- Proxy-based authentication
- Service account integration
- Cloud deployment architecture
@@ -68,14 +78,14 @@ Enterprise-grade implementation using Vertex AI, mirroring Part 2's journey with

Below is a comprehensive overview of where each feature is implemented across the Development API and Vertex AI versions:

| Feature | Part 1 - Intro Chapter | Part 2 - Dev API Chapter | Part 3 - Vertex AI Chapter |
| ------------------------------- | ------------------------------------ | -------------------------------------- | ------------------------------------------ |
| SDK setup and authentication | [Chapter 1](part_1_intro/chapter_01) | - | - |
| Text and audio interactions | [Chapter 1](part_1_intro/chapter_01) | - | - |
| Real-time Audio Chat | [Chapter 2](part_1_intro/chapter_02) | [Chapter 5](part_2_dev_api/chapter_05) | [Chapter 9](part_3_vertex_api/chapter_09) |
| Multimodal (Audio + Video) | - | [Chapter 6](part_2_dev_api/chapter_06) | [Chapter 10](part_3_vertex_api/chapter_10) |
| Function Calling & Instructions | - | [Chapter 7](part_2_dev_api/chapter_07) | [Chapter 11](part_3_vertex_api/chapter_11) |
| Production Deployment | - | [Chapter 8](part_2_dev_api/chapter_08) | [Chapter 12](part_3_vertex_api/chapter_12) |

Note: Vertex AI implementation starts directly with advanced features, skipping basic WebSocket and text-to-speech examples.

@@ -94,13 +104,15 @@ Note: Vertex AI implementation starts directly with advanced features, skipping
## Key Differences Between Dev API and Vertex AI

### Development API (Part 2)

- Simple API key authentication
- Direct WebSocket connection
- All tools available simultaneously
- Single-service deployment
- Ideal for prototyping and learning

### Vertex AI (Part 3)

- Service account authentication
- Proxy-based architecture
- Single tool limitation
@@ -117,4 +129,3 @@ Note: Vertex AI implementation starts directly with advanced features, skipping
## License

This project is licensed under the Apache License.

@@ -5,31 +5,36 @@ This section provides a foundational introduction to working with Google's Gemin
## Contents

### Chapter 1: SDK Basics

- Introduction to the Google Gemini AI SDK
- Setting up the development environment
- Basic text interactions with Gemini
- Audio response generation examples
- Examples using both direct API key authentication and Vertex AI authentication

### Chapter 2: Multimodal Interactions

- Real-time audio conversations with Gemini
- Streaming audio input and output
- Voice activity detection and turn-taking
- Example implementation of an interactive voice chat
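
To make "streaming audio" and "voice activity detection" a little more concrete, here is a minimal sketch that splits mono 16-bit PCM into fixed-duration chunks and flags each chunk as speech or silence with a simple energy threshold. This is not the chapter's implementation: the sample rate, chunk length, threshold, and all function names are assumptions for illustration, and energy thresholding is a deliberately simple stand-in for whatever VAD the guide or the service actually uses.

```python
import math
import struct

SAMPLE_RATE = 16_000  # Hz; assumed for this sketch
SAMPLE_WIDTH = 2      # bytes per sample (16-bit little-endian PCM)
CHUNK_MS = 20         # chunk duration in milliseconds; assumed


def chunk_pcm(pcm: bytes, sample_rate: int = SAMPLE_RATE,
              chunk_ms: int = CHUNK_MS) -> list[bytes]:
    """Split mono 16-bit PCM into fixed-duration chunks.

    The final chunk may be shorter than the others.
    """
    chunk_bytes = sample_rate * chunk_ms // 1000 * SAMPLE_WIDTH
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit PCM chunk."""
    count = len(chunk) // SAMPLE_WIDTH
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[:count * SAMPLE_WIDTH])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(chunk: bytes, threshold: float = 500.0) -> bool:
    """Naive energy-based VAD: loud chunks count as speech."""
    return rms(chunk) > threshold


def make_tone(duration_ms: int, amplitude: int, freq: float = 440.0) -> bytes:
    """Generate a sine tone as 16-bit PCM, used here as stand-in 'speech'."""
    n = SAMPLE_RATE * duration_ms // 1000
    return b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n)
    )
```

In a live chat loop, the chunks would come from the microphone rather than `make_tone`, and a run of non-speech chunks could be treated as the end of the user's turn. A fixed threshold is fragile in noisy rooms, so treat the value above as a starting point rather than a recommendation.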

## Key Features Covered

- Text generation and conversations
- Audio output generation
- Real-time streaming interactions
- Different authentication methods (API key and Vertex AI)
- Multimodal capabilities (text-to-audio, audio-to-audio)

## Prerequisites

- Python environment
- Google Gemini API access
- Required packages:
  - `google-genai`
  - `pyaudio` (for audio examples)
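
Before opening the notebooks, it can help to confirm that the packages listed above are importable. The following is a small sketch using only the standard library; the mapping from package name to module name (`google.genai` and `pyaudio`) and the helper names are assumptions made for this example.

```python
import importlib.util

# Assumed mapping: pip package name -> importable module name.
REQUIRED = {"google-genai": "google.genai", "pyaudio": "pyaudio"}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package
        # (for example `google`) is not installed at all.
        return False


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """List the pip package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if not is_importable(module)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

Note that `pyaudio` typically also needs the PortAudio system library, which this check does not verify.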

## Getting Started

Each chapter contains Jupyter notebooks and Python scripts that demonstrate different aspects of the Gemini AI capabilities. Start with Chapter 1's notebooks for basic SDK usage, and then move on to the more advanced multimodal examples in Chapter 2.