Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: async sync and pinecone methods #487

Merged
merged 7 commits into from
Dec 23, 2024
Merged

feat: async sync and pinecone methods #487

merged 7 commits into from
Dec 23, 2024

Conversation

jamescalam
Copy link
Member

@jamescalam jamescalam commented Dec 13, 2024

PR Type

Enhancement, Tests


Description

  • Introduced asynchronous methods across BaseIndex, PineconeIndex, and SemanticRouter for operations like adding, deleting, syncing, and configuration management.
  • Enhanced the BaseIndex class with async methods for locking and unlocking mechanisms, ensuring better concurrency handling.
  • Implemented async-specific methods in PineconeIndex for batch upserts, deletions, and route ID retrieval.
  • Added async_sync and related methods in SemanticRouter to enable asynchronous synchronization of routes.
  • Developed comprehensive unit tests in TestAsyncSemanticRouter to validate async functionality, including initialization, synchronization, and locking mechanisms.
  • Improved configurability with constants like RETRY_WAIT_TIME for retry intervals.

Changes walkthrough 📝

Relevant files
Enhancement
base.py
Add asynchronous methods for index operations in BaseIndex.

semantic_router/index/base.py

  • Introduced asynchronous methods for adding, deleting, and syncing data
    in the BaseIndex class.
  • Added async implementations for configuration reading and writing.
  • Introduced async locking mechanisms for index operations.
  • Added constants like RETRY_WAIT_TIME for better configurability.
  • +149/-21
    pinecone.py
    Implement asynchronous operations for Pinecone index management.

    semantic_router/index/pinecone.py

  • Added asynchronous methods for batch upserts, deletions, and route ID
    retrieval.
  • Implemented async methods for Pinecone-specific operations like
    _async_upsert and _async_delete.
  • Enhanced the aadd method to handle asynchronous batch operations.
  • +130/-1 
    base.py
    Add asynchronous synchronization methods in SemanticRouter.

    semantic_router/routers/base.py

  • Introduced async_sync method for asynchronous synchronization of
    routes.
  • Added async methods for executing sync strategies and checking
    synchronization status.
  • Implemented async methods for hash writing and utterance difference
    retrieval.
  • +141/-0 
    Tests
    test_sync.py
    Add unit tests for asynchronous synchronization and locking.

    tests/unit/test_sync.py

  • Added a new test class TestAsyncSemanticRouter for testing
    asynchronous methods.
  • Included tests for async initialization, synchronization, and locking
    mechanisms.
  • Added parameterized tests for various async sync scenarios.
  • +338/-0 

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Code Smell
    The aadd method in BaseIndex logs a warning and falls back to the synchronous add method. This fallback might lead to unexpected behavior in asynchronous contexts and should be reviewed for potential improvements.

    Possible Bug
    The _ais_locked method in BaseIndex raises a ValueError for invalid lock values. Consider handling unexpected values more gracefully or providing additional context in the error message.

    Performance Issue
    The _async_batch_upsert method in PineconeIndex directly calls self.index.upsert. If the upsert operation is not optimized for large batches, this could lead to performance bottlenecks. Consider adding batch size validation or logging for large operations.

    Code Smell
    The async_sync method in SemanticRouter has a complex flow with multiple asynchronous calls and conditionals. Consider refactoring to improve readability and maintainability.

    Test Coverage
    The TestAsyncSemanticRouter class includes comprehensive tests for async methods, but the test_sync_lock_prevents_concurrent_sync test could benefit from additional assertions to verify lock state explicitly.

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    General
    Add error handling to ensure proper resource cleanup during asynchronous operations

    Add error handling to the async_sync method to ensure proper cleanup and unlocking
    of the index in case of exceptions during the synchronization process.

    semantic_router/routers/base.py [618-631]

    -_ = await self.index.alock(value=True, wait=wait)
    -# first creating a diff
    -local_utterances = self.to_config().to_utterances()
    -remote_utterances = await self.index.aget_utterances()
    -diff = UtteranceDiff.from_utterances(
    -    local_utterances=local_utterances,
    -    remote_utterances=remote_utterances,
    -)
    -# generate sync strategy
    -sync_strategy = diff.get_sync_strategy(sync_mode=sync_mode)
    -# and execute
    -await self._async_execute_sync_strategy(sync_strategy)
    -# unlock index after sync
    -_ = await self.index.alock(value=False)
    +try:
    +    _ = await self.index.alock(value=True, wait=wait)
    +    # first creating a diff
    +    local_utterances = self.to_config().to_utterances()
    +    remote_utterances = await self.index.aget_utterances()
    +    diff = UtteranceDiff.from_utterances(
    +        local_utterances=local_utterances,
    +        remote_utterances=remote_utterances,
    +    )
    +    # generate sync strategy
    +    sync_strategy = diff.get_sync_strategy(sync_mode=sync_mode)
    +    # and execute
    +    await self._async_execute_sync_strategy(sync_strategy)
    +finally:
    +    # unlock index after sync
    +    _ = await self.index.alock(value=False)
    Suggestion importance[1-10]: 10

    Why: The suggestion significantly improves the robustness of the async_sync method by adding a try-finally block to ensure the index is unlocked even if an exception occurs during synchronization. This prevents resource locking issues and ensures proper cleanup.

    10
    Possible issue
    Avoid calling synchronous methods within asynchronous methods to prevent blocking the event loop

    Ensure that the aadd method does not call the synchronous add method directly, as
    this can block the event loop and defeat the purpose of asynchronous execution.

    semantic_router/index/base.py [59-64]

    -return self.add(
    -    embeddings=embeddings,
    -    routes=routes,
    -    utterances=utterances,
    -    function_schemas=function_schemas,
    -    metadata_list=metadata_list,
    -)
    +raise NotImplementedError("Async method not implemented.")
    Suggestion importance[1-10]: 9

    Why: The suggestion addresses a critical issue where calling a synchronous method (add) within an asynchronous method (aadd) can block the event loop, defeating the purpose of asynchronous execution. This change ensures non-blocking behavior and aligns with the purpose of async methods.

    9
    Verify the existence of asynchronous methods before using them to prevent runtime errors

    Ensure that the await self.index.upsert call in _async_batch_upsert is valid and
    supported by the self.index object, as it may not have an asynchronous upsert
    method.

    semantic_router/index/pinecone.py [246]

    -await self.index.upsert(vectors=batch, namespace=self.namespace)
    +raise NotImplementedError("Async upsert method not implemented.")
    Suggestion importance[1-10]: 8

    Why: The suggestion highlights a valid concern about ensuring that the self.index object supports the asynchronous upsert method before calling it. This prevents potential runtime errors and ensures robustness in the code.

    8

    Copy link

    codecov bot commented Dec 23, 2024

    Codecov Report

    Attention: Patch coverage is 74.28571% with 45 lines in your changes missing coverage. Please review.

    Project coverage is 74.84%. Comparing base (462b300) to head (9915053).
    Report is 10 commits behind head on main.

    Files with missing lines Patch % Lines
    semantic_router/index/pinecone.py 55.10% 22 Missing ⚠️
    semantic_router/index/base.py 76.47% 12 Missing ⚠️
    semantic_router/routers/base.py 85.33% 11 Missing ⚠️
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##             main     #487      +/-   ##
    ==========================================
    + Coverage   73.74%   74.84%   +1.09%     
    ==========================================
      Files          43       43              
      Lines        3596     3753     +157     
    ==========================================
    + Hits         2652     2809     +157     
      Misses        944      944              

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    @jamescalam jamescalam merged commit 69b4932 into main Dec 23, 2024
    9 checks passed
    @jamescalam jamescalam deleted the james/async-sync branch December 23, 2024 18:32
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    1 participant