Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement wait_for_complete() for Speaker class #471

Merged

Conversation

davidvonthenen
Copy link
Contributor

@davidvonthenen davidvonthenen commented Sep 29, 2024

Proposed changes

Addresses: #469

Please see issue for details...

Types of changes

What types of changes does your code introduce to the community Python SDK?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update or tests (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have lint'ed all of my code using repo standards
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

NA

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced PLAYBACK_DELTA constant for enhanced audio playback management.
    • Added wait_for_complete method to the Speaker and WebSocket client classes for improved control over audio playback completion.
  • Improvements

    • Updated configuration options for speaker playback timing.
    • Enhanced the handling of asynchronous operations in text-to-speech examples for a more responsive experience.

Copy link
Contributor

coderabbitai bot commented Sep 29, 2024

Walkthrough

The pull request introduces several changes across the Deepgram audio and client modules, focusing on the addition of a new constant, PLAYBACK_DELTA, and enhancements to audio playback functionality. Key modifications include updates to the Speaker class to track audio playback completion, the introduction of new methods in client classes to handle playback synchronization, and adjustments in example scripts to utilize these new features.

Changes

File Path Change Summary
deepgram/audio/__init__.py Added PLAYBACK_DELTA as OUTPUT_PLAYBACK_DELTA to exported entities.
deepgram/audio/speaker/__init__.py Added PLAYBACK_DELTA to import statements from .constants.
deepgram/audio/speaker/constants.py Updated header comment to "Constants for speaker" and added new constant PLAYBACK_DELTA = 2000.
deepgram/audio/speaker/speaker.py Introduced wait_for_complete method for tracking audio playback; updated constructor to initialize new attributes.
deepgram/clients/speak/v1/websocket/async_client.py Added PLAYBACK_DELTA import and updated constructor to include speaker_playback_delta_in_ms; introduced wait_for_complete method.
deepgram/clients/speak/v1/websocket/client.py Added PLAYBACK_DELTA import; updated constructor to handle speaker_playback_delta_in_ms and included wait_for_complete method.
examples/text-to-speech/websocket/async_complete/main.py Modified configuration settings and replaced fixed sleep with await dg_connection.wait_for_complete().
examples/text-to-speech/websocket/complete/main.py Replaced fixed sleep with dg_connection.wait_for_complete() for dynamic control flow.

Possibly related PRs

  • Reintroduce TTS WS #434: This PR involves the addition of WebSocket-related imports and functionality, which may interact with the newly exported PLAYBACK_DELTA constant in the main PR, enhancing the overall WebSocket capabilities of the SDK.

Suggested reviewers

  • lukeocodes
  • jpvajda
  • SandraRodgers
  • naomi-lgbt

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between feb2680 and c6c22e1.

📒 Files selected for processing (1)
  • deepgram/clients/speak/v1/websocket/client.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepgram/clients/speak/v1/websocket/client.py

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (7)
deepgram/audio/speaker/__init__.py (1)

7-7: LGTM. Consider adding documentation for PLAYBACK_DELTA.

The addition of PLAYBACK_DELTA to the imports is correct and aligns with the PR objectives. This new constant is now available at the package level, which is good for accessibility.

Consider adding a brief comment or docstring in this file or in the constants.py file to explain the purpose and usage of PLAYBACK_DELTA. This would improve code readability and make it easier for other developers to understand its role in the audio playback functionality.

deepgram/audio/speaker/constants.py (1)

14-15: Please clarify the purpose and unit of PLAYBACK_DELTA.

The new constant PLAYBACK_DELTA has been added, but its purpose and unit are not immediately clear. To improve code readability and maintainability:

  1. Add a comment explaining the purpose of this constant.
  2. Specify the unit (e.g., milliseconds, samples) in the comment.
  3. Consider using a more descriptive name if applicable, e.g., PLAYBACK_DELAY_MS if it represents a delay in milliseconds.

Example:

# Delay in milliseconds to ensure playback completion
PLAYBACK_DELAY_MS = 2000
examples/text-to-speech/websocket/complete/main.py (1)

101-102: Consider adding an explanatory comment

The addition of wait_for_complete() significantly improves the script. To further enhance its educational value for users new to the Deepgram SDK, consider adding a brief comment explaining the purpose of this method. For example:

# Wait for the text-to-speech processing to complete before proceeding
dg_connection.wait_for_complete()

This would help users understand the importance of this step in the text-to-speech process.

examples/text-to-speech/websocket/async_complete/main.py (1)

Line range hint 1-138: Well-structured script with robust error handling.

The overall structure of the script is well-organized and follows good practices for asynchronous programming. The changes integrate seamlessly with the existing code. Here are some positive observations:

  1. Proper use of asyncio for asynchronous operations.
  2. Signal handlers set up for graceful shutdown.
  3. Comprehensive error handling with try-except blocks.
  4. Clear separation of concerns between different parts of the script.

One minor suggestion for improvement:

Consider adding more specific exception handling for network-related errors, as they are likely in a websocket-based application. For example:

import aiohttp

try:
    # ... existing code ...
except ValueError as e:
    print(f"Invalid value encountered: {e}")
except aiohttp.ClientError as e:
    print(f"Network error occurred: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

This would provide more detailed error information for debugging network-related issues.

deepgram/audio/speaker/speaker.py (2)

210-211: Correct typo in the docstring of wait_for_complete method

The docstring contains a typo: "speak" should be "speaker".

Apply this diff to correct the typo:

     """
-    This method will block until the speak is done playing sound.
+    This method will block until the speaker is done playing sound.
     """

235-236: Clarify logging messages in wait_for_complete method

The logging message "LastPlay delta is less than threshold" may be unclear to someone reviewing logs. Consider rephrasing it for better clarity.

Update the logging messages for clarity:

     if diff_in_ms < delta_in_ms:
-        self._logger.debug("LastPlay delta is less than threshold")
+        self._logger.debug("Time since last audio data (%f ms) is less than threshold (%f ms). Continuing to wait.", diff_in_ms, delta_in_ms)
         continue

...

     self._logger.debug("LastPlay delta is greater than threshold. Exit wait!")

Similarly, update the exit message:

     self._logger.debug("LastPlay delta is greater than threshold. Exit wait!")
+    self._logger.debug("Time since last audio data (%f ms) exceeds threshold (%f ms). Exiting wait.", diff_in_ms, delta_in_ms)
deepgram/clients/speak/v1/websocket/client.py (1)

601-602: Correct grammatical error in docstring

The docstring has a grammatical error. It should be rephrased for clarity.

Apply this diff to correct the docstring:

            """
-            This method will block until the speak is done playing sound.
+            This method will block until the speech is done playing sound.
            """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b233b60 and feb2680.

📒 Files selected for processing (8)
  • deepgram/audio/init.py (1 hunks)
  • deepgram/audio/speaker/init.py (1 hunks)
  • deepgram/audio/speaker/constants.py (1 hunks)
  • deepgram/audio/speaker/speaker.py (6 hunks)
  • deepgram/clients/speak/v1/websocket/async_client.py (4 hunks)
  • deepgram/clients/speak/v1/websocket/client.py (4 hunks)
  • examples/text-to-speech/websocket/async_complete/main.py (2 hunks)
  • examples/text-to-speech/websocket/complete/main.py (1 hunks)
🔇 Additional comments (14)
deepgram/audio/speaker/constants.py (1)

7-7: LGTM: Comment update reflects the new focus on speaker functionality.

The change from "Constants for microphone" to "Constants for speaker" accurately represents the shift in focus and aligns with the PR objective of implementing speaker-related functionality.

deepgram/audio/__init__.py (1)

21-21: LGTM! Verify usage in Speaker class.

The addition of PLAYBACK_DELTA as OUTPUT_PLAYBACK_DELTA to the import list is consistent with the existing naming convention and aligns with the PR objective of implementing wait_for_complete() for the Speaker class.

To ensure this constant is being utilized as intended, let's verify its usage in the Speaker class implementation:

✅ Verification successful

Action Required: Corrected shell scripts to verify usage of OUTPUT_PLAYBACK_DELTA


Action Required: Locate Speaker class and verify OUTPUT_PLAYBACK_DELTA usage


Verified: OUTPUT_PLAYBACK_DELTA correctly added to the public interface and not used internally.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the usage of PLAYBACK_DELTA in the Speaker class

# Test: Search for PLAYBACK_DELTA in the speaker.py file
rg --type python 'PLAYBACK_DELTA' deepgram/audio/speaker.py

# Test: Search for OUTPUT_PLAYBACK_DELTA in other files that might use it
rg --type python 'OUTPUT_PLAYBACK_DELTA' -g '!deepgram/audio/__init__.py'

Length of output: 195


Script:

#!/bin/bash
# Description: Check the usage of PLAYBACK_DELTA and OUTPUT_PLAYBACK_DELTA in the Speaker class

# Test: Search for PLAYBACK_DELTA in deepgram/audio/speaker.py
rg -t py 'PLAYBACK_DELTA' deepgram/audio/speaker.py

# Test: Search for OUTPUT_PLAYBACK_DELTA in other Python files excluding __init__.py
rg -t py 'OUTPUT_PLAYBACK_DELTA' --glob '!deepgram/audio/__init__.py'

Length of output: 187


Script:

#!/bin/bash
# Description: Locate the Speaker class definition in the codebase

# Search for the Speaker class definition across all Python files
rg -t py 'class Speaker' --context 0

# Description: Check the usage of OUTPUT_PLAYBACK_DELTA in the Speaker class definition

# Replace <path_to_speaker_file> with the actual path found from the previous command
# Example placeholder path used here
OUTPUT_PLAYBACK_DELTA_PATH=$(rg -t py 'class Speaker' --files-with-matches)

if [ -n "$OUTPUT_PLAYBACK_DELTA_PATH" ]; then
    rg -t py 'OUTPUT_PLAYBACK_DELTA' "$OUTPUT_PLAYBACK_DELTA_PATH"
else
    echo "Speaker class definition not found."
fi

Length of output: 361

examples/text-to-speech/websocket/complete/main.py (3)

96-96: Improved code readability

The addition of a blank line after sending text to Deepgram enhances code readability by clearly separating different logical blocks of the script.


101-102: Improved reliability with dynamic wait

The replacement of a fixed time.sleep(5) with dg_connection.wait_for_complete() is a significant improvement:

  1. It eliminates the risk of premature script termination if text-to-speech processing takes longer than 5 seconds.
  2. It optimizes execution time by avoiding unnecessary waiting if processing completes faster than 5 seconds.

This change effectively demonstrates the practical application of the new wait_for_complete() feature in the Deepgram SDK.


Line range hint 1-124: Summary of changes

The modifications to this example script effectively demonstrate the new wait_for_complete() feature of the Deepgram SDK. These changes provide a more robust and efficient approach to handling text-to-speech completion, making the example more valuable for users learning to work with the SDK. The script now better reflects best practices for using Deepgram's text-to-speech functionality.

examples/text-to-speech/websocket/async_complete/main.py (3)

105-105: Improved handling of text-to-speech completion.

The changes in this section significantly improve the code:

  1. The added blank line (105) enhances readability by clearly separating the text sending and completion waiting steps.

  2. Replacing the previous sleep call with await dg_connection.wait_for_complete() (110) is an excellent improvement:

    • It eliminates the hard-coded wait time, making the code more flexible and efficient.
    • It aligns perfectly with the PR objective of implementing the wait_for_complete() method.
    • It likely provides a more accurate way to determine when the text-to-speech operation is complete.

These changes make the code more robust and easier to maintain.

Also applies to: 110-110


Line range hint 1-138: Summary: Excellent implementation of wait_for_complete() with well-integrated changes.

This PR successfully implements the wait_for_complete() method for the Speaker class, as intended. The changes in this example script effectively demonstrate the usage of the new feature and integrate well with the existing code structure. Key improvements include:

  1. Proper configuration setup for speaker playback.
  2. Replacement of the hard-coded sleep with the new wait_for_complete() method.
  3. Maintained overall script structure with good error handling and asynchronous programming practices.

The changes align perfectly with the PR objectives and provide a clear example for users of the Deepgram Python SDK. The script now offers a more efficient and accurate way to handle text-to-speech operations.


35-39: Configuration changes impact behavior and logging verbosity.

The changes to the DeepgramClientOptions configuration have the following effects:

  1. Commenting out "auto_flush_speak_delta": "500" disables automatic flushing. Ensure this aligns with the intended behavior, as manual flushing is now required (which is done on line 107).

  2. Setting "speaker_playback": "true" enables speaker playback, which is consistent with the PR objective of implementing wait_for_complete().

  3. Changing verbose to verboselogs.SPAM significantly increases logging verbosity. While this is useful for debugging, consider using a less verbose level in production to avoid performance impacts.

To ensure these changes are consistent across the codebase, run the following script:

✅ Verification successful

Configuration changes are consistently applied and verified across the codebase.

The adjustments to the DeepgramClientOptions in examples/text-to-speech/websocket/async_complete/main.py are consistent and do not introduce any inconsistencies in other parts of the codebase. The commented-out configurations in other files indicate deprecated or inactive settings, ensuring that only the intended examples are affected by the changes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other occurrences of DeepgramClientOptions configuration
# Test: Search for DeepgramClientOptions usage. Expect: Consistent configuration across examples.
rg --type python -A 10 'DeepgramClientOptions\(' examples/

Length of output: 91


Script:

#!/bin/bash
# Description: Re-run the DeepgramClientOptions search with the correct file type
# Test: Search for DeepgramClientOptions usage. Expect: Consistent configuration across examples.
rg --type py -A 10 'DeepgramClientOptions\(' examples/

Length of output: 25590

deepgram/audio/speaker/speaker.py (2)

367-368: Confirm minimal lock usage in _play method

Updating _last_datagram within the _play method is correctly protected by _lock_wait. Ensure that the lock is held only for the minimal necessary duration to prevent any performance impact.


208-243: Ensure thread safety in wait_for_complete method

The wait_for_complete method relies on proper synchronization using _lock_wait. Please verify that all accesses to _last_datagram are adequately protected and that there are no potential race conditions.

Run the following script to check for any unprotected accesses to _last_datagram:

✅ Verification successful

Thread safety in wait_for_complete method is confirmed.

All accesses to _last_datagram within the wait_for_complete method are properly protected using self._lock_wait, ensuring thread safety and preventing race conditions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all accesses to _last_datagram are within a lock.

# Search for all occurrences of _last_datagram in the codebase
rg --type py "_last_datagram"

# Search for occurrences where _last_datagram is accessed outside of a lock
# Assuming that accesses within 'with self._lock_wait:' blocks are safe
rg --type py "_last_datagram" --context 2 | rg -B 2 -v "with self._lock_wait"

Length of output: 16424

deepgram/clients/speak/v1/websocket/client.py (2)

30-30: 🛠️ Refactor suggestion

Consider using absolute imports for clarity

Using deep relative imports like from .....audio.speaker can make the code harder to read and maintain. Consider using absolute imports to improve clarity and prevent potential import errors.

⛔ Skipped due to learnings
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/speak/v1/websocket/__init__.py:11-11
Timestamp: 2024-07-01T19:14:53.172Z
Learning: User dvonthenen prefers to retain unused imports in the codebase to maintain backward compatibility.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/speak/v1/websocket/__init__.py:14-14
Timestamp: 2024-07-01T19:14:41.918Z
Learning: User dvonthenen prefers to retain unused imports in the codebase to maintain backward compatibility.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/speak/v1/websocket/__init__.py:10-10
Timestamp: 2024-07-01T19:14:59.869Z
Learning: User dvonthenen prefers to retain unused imports in the codebase to maintain backward compatibility, even if they are flagged as unused.

114-114: Pass the correct type for last_play_delta_in_ms to Speaker

Ensure that playback_delta_in_ms passed to Speaker is of the correct numeric type to prevent potential runtime errors during playback.

Run the following script to check the type of playback_delta_in_ms:

Also applies to: 122-122

deepgram/clients/speak/v1/websocket/async_client.py (2)

30-30: Importing PLAYBACK_DELTA is appropriate.

The addition of PLAYBACK_DELTA to the import statement ensures that the default playback delta is available for use.


111-111: Passing last_play_delta_in_ms to Speaker instance is appropriate.

Including last_play_delta_in_ms=playback_delta_in_ms when initializing the Speaker ensures that the playback delay is correctly configured according to the provided or default settings.

Also applies to: 119-119

deepgram/audio/speaker/constants.py Show resolved Hide resolved
deepgram/audio/speaker/speaker.py Show resolved Hide resolved
deepgram/clients/speak/v1/websocket/client.py Outdated Show resolved Hide resolved
deepgram/clients/speak/v1/websocket/client.py Show resolved Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@davidvonthenen davidvonthenen merged commit 84fc18b into deepgram:main Sep 30, 2024
5 checks passed
@davidvonthenen davidvonthenen deleted the speaker-wait-for-complete branch September 30, 2024 16:03
@coderabbitai coderabbitai bot mentioned this pull request Oct 18, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants