added Gemini Multimodal Live API Developer Guide #160
base: main
Conversation
Hi team,
I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:
This pull request adds a Gemini Multimodal Live API Developer Guide to the applied-ai-engineering-samples
repository. The guide is intended to provide a comprehensive, hands-on learning experience for developers looking to build real-time applications using Gemini's multimodal capabilities. The guide includes multiple parts:
- Part 1: Introduction to Gemini's Multimodal Live API (basic concepts and SDK usage)
- Part 2: WebSocket Development with Gemini Developer API (building a production-ready multimodal AI assistant)
- Part 3: WebSocket Development with Vertex AI API (enterprise-grade implementation)
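To give a flavor of what Part 1's SDK-based examples involve, here is a minimal sketch of a text interaction over the Live API. This is not taken from the guide's code: it assumes the google-genai Python SDK and a `GOOGLE_API_KEY` environment variable, and the model name and the `send`/`receive` calls reflect an early version of that SDK, which may have changed since.

```python
# Sketch of a basic Live API text session (assumed SDK surface, not the
# guide's code). Requires: pip install google-genai
import asyncio
import os

from google import genai

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])


async def main() -> None:
    # Ask for text responses only; audio is covered later in Part 1.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")


asyncio.run(main())
```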
The changes include:
- Addition of a new directory: `genai-on-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide`
- Creation of several Markdown files explaining the project and its different parts
- Creation of a `.gitignore` file to manage ignored files
- Addition of a `CONTRIBUTING.md` file
- Addition of a `LICENSE` file
- Addition of several Jupyter notebooks and Python scripts to demonstrate different aspects of the Gemini AI capabilities
I haven't been able to fully assess the correctness of the code examples yet, but the overall structure and organization of the guide seem well-planned. I'll provide more detailed feedback in my full review.
Here's a little haiku to lighten the mood:
Code flows like a stream,
Reviews guide it to the sea,
Perfect, polished code.
This pull request adds a Gemini Multimodal Live API Developer Guide, which is a great addition. The documentation is comprehensive and well-structured, covering various aspects of building real-time multimodal applications with Gemini. The README files provide clear explanations and the patches include necessary files like .gitignore, CONTRIBUTING.md, and LICENSE. I have a few suggestions based on Google style guides and general best practices that would enhance clarity and maintainability.
Summary of Google style guide elements referenced:
- Filenames should be lowercase with hyphens as word separators.
- Use descriptive names for functions and variables.
- Comments should be clear and concise.
- Use sentence-style capitalization for commit messages.
Resolved review threads (outdated):
- genai-on-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/README.md
- ...-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/part_1_intro/chapter_02/README.md
- ...ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/part_1_intro/chapter_02/audio-to-audio.py
- genai-on-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/CONTRIBUTING.md
Hi team,
I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:
This pull request adds a Gemini Multimodal Live API Developer Guide to the applied-ai-engineering-samples
repository. The guide is designed to help developers build real-time applications that can see, hear, and interact naturally using Gemini's multimodal capabilities.
Here's a breakdown of the changes:
- New Directory: A new directory, `genai-on-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide`, has been created to house the guide's content.
- .gitignore: A comprehensive `.gitignore` file has been added to exclude common build artifacts and temporary files.
- CONTRIBUTING.md: A `CONTRIBUTING.md` file has been added, outlining the contribution guidelines (currently indicating that contributions are not being accepted at this time).
- LICENSE: An Apache 2.0 License file has been included.
- README.md: A `README.md` file provides an overview of the guide, its structure, key concepts covered, and prerequisites.
- Part 1 (Introduction): This section introduces basic concepts and SDK usage, covering text and audio interactions.
- Part 2 (Gemini Developer API): This section focuses on WebSocket development using the Gemini Developer API, building towards a production-ready multimodal AI assistant.
- Part 3 (Vertex AI API): This section demonstrates the same functionalities as Part 2, but using the Vertex AI API.
- Individual Chapters: Each part contains several chapters with detailed explanations, code examples, and Jupyter notebooks.
The guide covers a wide range of topics, including real-time communication, audio processing, video integration, and production deployment considerations.
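For readers skimming this summary, here is a rough sketch of the raw WebSocket flow that Parts 2 and 3 build on. The endpoint, model name, and message shapes follow the publicly documented BidiGenerateContent protocol and are assumptions on my part, not code taken from the guide.

```python
# Minimal raw-WebSocket sketch of the Live API handshake (assumed message
# shapes, not the guide's code). Requires: pip install websockets
import asyncio
import json
import os

import websockets

HOST = "generativelanguage.googleapis.com"
URI = (
    f"wss://{HOST}/ws/google.ai.generativelanguage."
    f"v1alpha.GenerativeService.BidiGenerateContent"
    f"?key={os.environ['GOOGLE_API_KEY']}"
)


async def main() -> None:
    async with websockets.connect(URI) as ws:
        # One-time setup message selecting the model.
        await ws.send(json.dumps(
            {"setup": {"model": "models/gemini-2.0-flash-exp"}}
        ))
        await ws.recv()  # wait for the setupComplete acknowledgement

        # Send a single user turn, then stream response chunks.
        await ws.send(json.dumps({
            "client_content": {
                "turns": [{"role": "user", "parts": [{"text": "Hello!"}]}],
                "turn_complete": True,
            }
        }))
        async for raw in ws:
            msg = json.loads(raw)
            print(msg)
            if msg.get("serverContent", {}).get("turnComplete"):
                break


asyncio.run(main())
```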
I hope this summary is helpful. Let me know if you have any questions.
And now, a little haiku to brighten your day:
Code flows like a stream,
Reviews guide the path to clear,
Gems of code emerge.
This pull request adds a Gemini Multimodal Live API Developer Guide, which is a valuable contribution. The documentation is comprehensive and well-structured, covering various aspects of building real-time multimodal applications with Gemini. The README files provide clear explanations, helpful guidance, and a good overview of the project. The inclusion of a .gitignore file is also a good practice.
I've reviewed the changes with a focus on correctness, efficiency, maintainability, and security, considering common best practices and language idioms. I also checked for adherence to standard style guides, leveraging Markdown for better readability.
Here's a summary of the style guide elements I referenced, which are common across multiple style guides:
- File naming: Descriptive filenames in lowercase with hyphens.
- Headings: Sentence-case for headings.
- Lists: Hyphens for unordered lists.
- Links: Descriptive link text.
- Code blocks: Backticks for inline code and triple backticks for code blocks.
- Images: Descriptive alt text for images.
A few suggestions are provided below to further enhance clarity, consistency, and address potential areas for improvement.
| Feature | Part 2 - Dev API Chapter | Part 3 - Vertex AI Chapter |
|---------|----------------|-------------------|
| Basic WebSocket Setup | [Chapter 3](part_2_dev_api/chapter_03) | - |
| Text-to-Speech | [Chapter 4](part_2_dev_api/chapter_04) | - |
| Real-time Audio Chat | [Chapter 5](part_2_dev_api/chapter_05) | [Chapter 9](part_3_vertex_api/chapter_09) |
| Multimodal (Audio + Video) | [Chapter 6](part_2_dev_api/chapter_06) | [Chapter 10](part_3_vertex_api/chapter_10) |
| Function Calling & Instructions | [Chapter 7](part_2_dev_api/chapter_07) | [Chapter 11](part_3_vertex_api/chapter_11) |
| Production Deployment (Project Pastra)| [Chapter 8](part_2_dev_api/chapter_08) | [Chapter 12](part_3_vertex_api/chapter_12) |
The table would be more readable with aligned columns. Most Markdown style guides recommend aligning columns in tables for better readability.
Suggested change:

| Feature | Part 2 - Dev API Chapter | Part 3 - Vertex AI Chapter |
|---------|--------------------------|--------------------------|
| Basic WebSocket Setup | [Chapter 3](part_2_dev_api/chapter_03) | - |
| Text-to-Speech | [Chapter 4](part_2_dev_api/chapter_04) | - |
| Real-time Audio Chat | [Chapter 5](part_2_dev_api/chapter_05) | [Chapter 9](part_3_vertex_api/chapter_09) |
| Multimodal (Audio + Video) | [Chapter 6](part_2_dev_api/chapter_06) | [Chapter 10](part_3_vertex_api/chapter_10) |
| Function Calling & Instructions | [Chapter 7](part_2_dev_api/chapter_07) | [Chapter 11](part_3_vertex_api/chapter_11) |
| Production Deployment (Project Pastra)| [Chapter 8](part_2_dev_api/chapter_08) | [Chapter 12](part_3_vertex_api/chapter_12) |
Resolved review thread (outdated): genai-on-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/part_1_intro/README.md
- Chapter 4: http://localhost:8000/chapter_04/
- And so on...

3. Begin with Chapter 3 to understand the fundamentals of WebSocket communication with Gemini. Each subsequent chapter builds upon previous concepts, gradually introducing more complex features and capabilities. By Chapter 8, you'll have transformed the development prototype into Project Pastra - a production-ready AI assistant that demonstrates the future of human-AI interaction.
This sentence could be improved for clarity and conciseness. Consider splitting it into two shorter sentences, which is generally recommended for better readability.
Suggested change:

Begin with Chapter 3 to understand the fundamentals of WebSocket communication with Gemini. Each subsequent chapter builds upon previous concepts, gradually introducing more complex features and capabilities. By Chapter 8, you'll have built Project Pastra, a production-ready AI assistant that demonstrates the future of human-AI interaction.
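As context for the localhost URLs quoted above: one minimal way to serve the chapter folders during development is Python's built-in `http.server`, assuming the chapters are static files under the current working directory. This is just a sketch equivalent to running `python -m http.server 8000`; the guide may use a different server.

```python
# Serve the chapter directories at http://localhost:8000/chapter_XX/
# (assumed development setup, not necessarily the guide's own tooling).
from http.server import HTTPServer, SimpleHTTPRequestHandler

server = HTTPServer(("localhost", 8000), SimpleHTTPRequestHandler)
print("Serving chapters at http://localhost:8000/ (Ctrl+C to stop)")
server.serve_forever()
```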
@@ -0,0 +1,36 @@
# How to Contribute

At this time, we're not looking to take contributions to this project.
While the project isn't accepting contributions right now, it's good to keep the CONTRIBUTING.md welcoming. Perhaps suggest how people can still engage, like reporting issues or discussing ideas.
Suggested change:

While we are not actively accepting code contributions at this time, we welcome your feedback, bug reports, and feature requests. Please open an issue to share your thoughts or report any problems you encounter.
Hi team,
I'm currently reviewing this pull request and will post my full review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:
This pull request adds a Gemini Multimodal Live API Developer Guide to the applied-ai-engineering-samples
repository. The guide is designed to help developers build real-time applications that can see, hear, and interact naturally using Gemini's multimodal capabilities.
Here's a breakdown of the changes:
- New Directory: A new directory, `genai-on-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide`, has been created to house the guide's content.
- .gitignore: A comprehensive `.gitignore` file has been added to manage ignored files and directories within the guide's directory.
- CONTRIBUTING.md: A `CONTRIBUTING.md` file has been added, outlining the contribution process (currently contributions are not being accepted).
- LICENSE: An Apache 2.0 License file has been added.
- README.md: A `README.md` file provides an overview of the guide, including what you'll learn, key concepts covered, guide structure, feature implementation, prerequisites, and key differences between the Development API and Vertex AI implementations.
- Part 1 (Introduction): This section provides a foundational introduction to working with Google's Gemini AI model.
- Part 2 (WebSocket Development with Gemini API): This section demonstrates how to work directly with the Gemini API using WebSockets, progressively building towards Project Pastra.
- Part 3 (WebSocket Development with Vertex AI API): This section demonstrates how to work with Vertex AI using WebSockets, progressively building towards Project Pastra.
- Chapters: Each part contains multiple chapters with detailed instructions and code examples.
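As an illustration of what the function-calling chapters (Chapter 7 and Chapter 11) involve, here is a sketch of a tool declaration carried in the Live API setup message. The `get_weather` function and the model name are hypothetical examples of mine, and the message shape follows the public Live API documentation rather than the guide's code.

```python
# Sketch of a function declaration in the Live API setup message
# (assumed message shape; get_weather is a hypothetical example).
import json

setup_message = {
    "setup": {
        "model": "models/gemini-2.0-flash-exp",  # illustrative model name
        "tools": [{
            "function_declarations": [{
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "OBJECT",
                    "properties": {
                        "city": {
                            "type": "STRING",
                            "description": "City name.",
                        },
                    },
                    "required": ["city"],
                },
            }]
        }],
    }
}

# This would be sent as the first message over the WebSocket connection.
print(json.dumps(setup_message, indent=2))
```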
I haven't fully reviewed the code yet, but this summary should give you a good starting point. I'll provide more detailed feedback shortly.
Here's a little poem to brighten your day:
Code flows like a river,
Tests like rapids, strong and swift,
Reviews guide the way.
This pull request adds a Gemini Multimodal Live API Developer Guide, which is a great addition. The documentation is well-written and comprehensive. The structure of the guide, progressing from basic concepts to advanced implementations, including both Development API and Vertex AI versions, is logical and helpful for developers. The Feature Implementation Guide and the Key Differences Between Dev API and Vertex AI sections are particularly useful for navigating the different implementations. A few minor suggestions below.
Summary of Google Python Style Guide elements referenced:
- Docstrings should follow the Google Python Style Guide.
- Use descriptive names for variables, functions, and classes.
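For reference, here is a small example of the Google-style docstring format mentioned above. The `resample_audio` function is a made-up illustration of mine, not code from the PR.

```python
# Illustration of a Google Python Style Guide docstring
# (hypothetical function, not from the guide).
def resample_audio(
    samples: list[float], source_rate: int, target_rate: int
) -> list[float]:
    """Resamples raw audio samples to a new sample rate.

    Args:
        samples: Raw PCM samples in the range [-1.0, 1.0].
        source_rate: Sample rate of the input, in Hz.
        target_rate: Desired output sample rate, in Hz.

    Returns:
        The resampled audio as a list of floats.

    Raises:
        ValueError: If either sample rate is not positive.
    """
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("Sample rates must be positive.")
    ratio = source_rate / target_rate
    length = int(len(samples) / ratio)
    # Nearest-neighbour resampling: simple, not production quality.
    return [
        samples[min(int(i * ratio), len(samples) - 1)] for i in range(length)
    ]
```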
@@ -0,0 +1,36 @@
# How to Contribute

At this time, we're not looking to take contributions to this project.
While contributions may not be actively sought at this time, it's generally good practice to keep contribution guidelines open-ended. This encourages future contributions and provides a clear process if someone does want to contribute. Consider rephrasing to something like, "We welcome contributions! At this time, we are prioritizing X, Y, and Z, so contributions in those areas may be reviewed more quickly. Please ensure all contributions adhere to the following guidelines:"
Suggested change:

We welcome contributions! At this time, we are prioritizing improvements to the documentation and examples, so contributions in those areas may be reviewed more quickly. Please ensure all contributions adhere to the following guidelines:
| Feature | Part 2 - Dev API Chapter | Part 3 - Vertex AI Chapter |
|---------|----------------|-------------------|
| Basic WebSocket Setup | [Chapter 3](part_2_dev_api/chapter_03) | - |
| Text-to-Speech | [Chapter 4](part_2_dev_api/chapter_04) | - |
| Real-time Audio Chat | [Chapter 5](part_2_dev_api/chapter_05) | [Chapter 9](part_3_vertex_api/chapter_09) |
| Multimodal (Audio + Video) | [Chapter 6](part_2_dev_api/chapter_06) | [Chapter 10](part_3_vertex_api/chapter_10) |
| Function Calling & Instructions | [Chapter 7](part_2_dev_api/chapter_07) | [Chapter 11](part_3_vertex_api/chapter_11) |
| Production Deployment (Project Pastra)| [Chapter 8](part_2_dev_api/chapter_08) | [Chapter 12](part_3_vertex_api/chapter_12) |
The Feature Implementation Guide is helpful. Consider adding a column for "Part 1 - Intro Chapter" to show where basic concepts are introduced. Also, consider adding rows for other features like "Interruption Handling", "Error Handling", "Production Deployment", "Security Considerations", etc., to make the guide even more comprehensive.
Resolved review threads:
- ...-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/part_1_intro/chapter_02/README.md (outdated)
- ...ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/part_1_intro/chapter_02/audio-to-audio.py
- ...ertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/part_2_dev_api/chapter_03/README.md (outdated)
- ...ertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/part_2_dev_api/chapter_04/README.md (outdated)
- genai-on-vertex-ai/gemini_2_0/gemini-multimodal-live-api-dev-guide/part_3_vertex_api/README.md (outdated)
…uide/CONTRIBUTING.md Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/part_1_intro/chapter_02/audio-to-audio.py Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/part_1_intro/chapter_02/README.md Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/README.md Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/part_1_intro/README.md Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/part_1_intro/chapter_02/README.md Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/part_3_vertex_api/README.md Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/part_2_dev_api/chapter_04/README.md Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/part_2_dev_api/chapter_03/README.md Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
…uide/part_1_intro/chapter_02/audio-to-audio.py Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
Pull-Request Template
Thank you for your contribution! Please provide a brief description of your changes and ensure you've completed the checklist below.
Description
What does this PR do? Why is it necessary?
Fixes # (if applicable)
Checklist
- Run `nox -s format` to format the code.
- Use `aaie_notebook_template.ipynb` if submitting a new Jupyter notebook.