Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: abstract layers #465

Merged
merged 33 commits into from
Nov 29, 2024
Merged

feat: abstract layers #465

merged 33 commits into from
Nov 29, 2024

Conversation

jamescalam
Copy link
Member

@jamescalam jamescalam commented Nov 23, 2024

PR Type

Enhancement, Tests, Documentation


Description

  • Introduced new HybridRouteLayer and BaseRouteLayer classes to enhance routing capabilities with hybrid and base functionalities.
  • Refactored existing RouteLayer to inherit from BaseRouteLayer, improving code structure and maintainability.
  • Added HybridLocalIndex class for managing local hybrid indexing, supporting both dense and sparse embeddings.
  • Updated documentation and tests to align with new API changes, including version updates and import path adjustments.
  • Added tornado as a new dependency and updated project version to 0.1.0.dev2.

Changes walkthrough 📝

Relevant files
Enhancement
6 files
__init__.py
Update imports and version in init file                                   

semantic_router/init.py

  • Updated import paths for HybridRouteLayer.
  • Updated version to 0.1.0.dev2.
  • +2/-3     
    __init__.py
    Add new router layer imports                                                         

    semantic_router/routers/init.py

  • Added new imports for BaseRouteLayer, LayerConfig, RouteLayer, and
    HybridRouteLayer.
  • +10/-0   
    base.py
    Implement BaseRouteLayer with route management                     

    semantic_router/routers/base.py

  • Introduced BaseRouteLayer class with extensive functionality.
  • Added methods for route management and synchronization.
  • Implemented configuration and encoding utilities.
  • +1252/-0
    hybrid.py
    Add HybridRouteLayer for hybrid routing                                   

    semantic_router/routers/hybrid.py

  • Introduced HybridRouteLayer class for hybrid routing.
  • Added support for dense and sparse embeddings.
  • Implemented route classification and threshold checking.
  • +213/-0 
    semantic.py
    Refactor RouteLayer to extend BaseRouteLayer                         

    semantic_router/routers/semantic.py

  • Refactored RouteLayer to inherit from BaseRouteLayer.
  • Adjusted initialization and configuration methods.
  • +20/-26 
    hybrid_local.py
    Introduce HybridLocalIndex for local hybrid indexing         

    semantic_router/index/hybrid_local.py

  • Added HybridLocalIndex class for local hybrid indexing.
  • Implemented methods for adding and querying embeddings.
  • +190/-0 
    Documentation
    2 files
    conf.py
    Update documentation version                                                         

    docs/source/conf.py

    • Updated project release version to 0.1.0.dev2.
    +1/-1     
    pinecone-sync-routes.ipynb
    Update Pinecone sync notebook for new API                               

    docs/indexes/pinecone-sync-routes.ipynb

  • Updated semantic-router version in installation command.
  • Modified code cells to reflect new API changes.
  • +348/-76
    Configuration changes
    1 files
    pyproject.toml
    Update project version and dependencies                                   

    pyproject.toml

  • Updated project version to 0.1.0.dev2.
  • Added tornado as a new dependency.
  • +3/-2     
    Tests
    3 files
    test_hybrid_layer.py
    Update test imports for HybridRouteLayer                                 

    tests/unit/test_hybrid_layer.py

    • Updated import path for HybridRouteLayer.
    +1/-1     
    test_router.py
    Update test imports for router layers                                       

    tests/unit/test_router.py

    • Updated import paths for LayerConfig and RouteLayer.
    +1/-1     
    test_sync.py
    Update test imports for RouteLayer                                             

    tests/unit/test_sync.py

    • Updated import path for RouteLayer.
    +1/-1     

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Code Complexity
    The BaseRouteLayer class and its methods are extremely large and complex, which could lead to difficulties in maintenance and further development. Consider refactoring to reduce complexity and improve readability.

    Code Duplication
    There is significant code duplication in methods across different classes, especially in encoding and syncing functionalities. This could be streamlined by abstracting common functionalities into utility functions or base classes.

    Error Handling
    The error handling in HybridLocalIndex might not cover all edge cases, particularly in methods that manipulate the index state. More robust error handling and validation are needed to ensure stability.

    Inheritance Issues
    The RouteLayer class has been refactored to inherit from BaseRouteLayer, but there are still remnants of old code that do not leverage the inherited properties and methods effectively. This needs cleanup to fully utilize the benefits of inheritance.

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Prevent division by zero errors during similarity calculations

    Ensure that the division by zero is handled when calculating similarities in the
    query method.

    semantic_router/index/hybrid_local.py [102-107]

    -sim_d = np.squeeze(np.dot(self.index, xq_d.T)) / (index_norm * xq_d_norm)
    -sim_s = np.squeeze(np.dot(self.sparse_index, xq_s.T)) / (sparse_norm * xq_s_norm)
    +sim_d = np.squeeze(np.dot(self.index, xq_d.T)) / (index_norm * xq_d_norm) if xq_d_norm != 0 else 0
    +sim_s = np.squeeze(np.dot(self.sparse_index, xq_s.T)) / (sparse_norm * xq_s_norm) if xq_s_norm != 0 else 0
    Suggestion importance[1-10]: 8

    Why: This suggestion is crucial as it prevents potential runtime errors due to division by zero, which can occur if the norms of the vectors are zero. This enhances the robustness of the code.

    8
    Validate the alpha parameter to ensure it is within a valid range

    Ensure that the alpha parameter is validated to be within the range [0, 1] to
    prevent invalid scaling factors.

    semantic_router/routers/hybrid.py [25]

    -alpha: float = 0.3
    +alpha: float = Field(0.3, ge=0, le=1)
    Suggestion importance[1-10]: 8

    Why: Validating the alpha parameter to ensure it falls within the range [0, 1] is crucial for preventing logical errors in scaling calculations, enhancing the robustness of the feature scaling process.

    8
    Ensure input arrays have matching dimensions to prevent runtime errors

    Validate the shape of the input arrays to ensure they match expected dimensions
    before operations in the add method.

    semantic_router/index/hybrid_local.py [37-38]

     embeds = np.array(embeddings)
     sparse_embeds = np.array(sparse_embeddings)
    +if embeds.shape[1] != sparse_embeds.shape[1]:
    +    raise ValueError("Embeddings and sparse embeddings must have the same number of features.")
    Suggestion importance[1-10]: 7

    Why: This suggestion is important as it ensures that the embeddings and sparse embeddings have matching dimensions before operations, preventing runtime errors due to dimension mismatch.

    7
    Implement error handling for asynchronous encoding to enhance robustness

    Add error handling for the asynchronous encoding methods to manage potential
    failures in encoding processes.

    semantic_router/routers/hybrid.py [107]

    -dense_vec, sparse_vec = await asyncio.gather(dense_coro, sparse_coro)
    +try:
    +    dense_vec, sparse_vec = await asyncio.gather(dense_coro, sparse_coro)
    +except Exception as e:
    +    logger.error(f"Failed to encode: {e}")
    +    raise
    Suggestion importance[1-10]: 7

    Why: Adding error handling for asynchronous encoding methods is important to manage potential failures, which improves the reliability and fault tolerance of the encoding process.

    7
    Validate input text to prevent errors during encoding

    Implement checks to ensure that text is not None or empty before proceeding with
    encoding in _encode and _async_encode methods.

    semantic_router/routers/hybrid.py [89-92]

    +if not text:
    +    raise ValueError("Input text cannot be None or empty")
     xq_d = np.array(self.encoder(text))
     xq_s = np.array(self.sparse_encoder(text))
    Suggestion importance[1-10]: 7

    Why: Implementing input validation for the text parameter in encoding methods is essential to avoid errors and ensure that the encoding process is only attempted with valid input, enhancing error handling and robustness.

    7
    Ensure safe type conversions in encoding processes

    Replace the direct use of np.array for type conversion with more explicit checks and
    conversions to handle potential type mismatches or errors.

    semantic_router/routers/hybrid.py [89-92]

    -xq_d = np.array(self.encoder(text))
    -xq_s = np.array(self.sparse_encoder(text))
    +xq_d = np.array(self.encoder(text), dtype=float)
    +xq_s = np.array(self.sparse_encoder(text), dtype=float)
    Suggestion importance[1-10]: 6

    Why: Ensuring explicit type conversions using dtype=float when creating numpy arrays can prevent type mismatch errors and ensure data consistency, which is beneficial for the stability of the encoding operations.

    6
    General
    Optimize array concatenation for better performance with large datasets

    Replace the direct concatenation of numpy arrays with a more efficient method to
    handle large data sizes in the add method.

    semantic_router/index/hybrid_local.py [51-52]

    -self.index = np.concatenate([self.index, embeds])
    -self.sparse_index = np.concatenate([self.sparse_index, sparse_embeds])
    +self.index = np.vstack([self.index, embeds])
    +self.sparse_index = np.vstack([self.sparse_index, sparse_embeds])
    Suggestion importance[1-10]: 5

    Why: The suggestion to use np.vstack instead of np.concatenate for performance optimization is valid, but the impact is moderate as it mainly affects performance rather than functionality.

    5
    Improve error handling clarity for unsupported features

    Implement error handling for unsupported operations like route filtering in the
    query method to provide clearer error messages.

    semantic_router/index/hybrid_local.py [91-92]

     if route_filter:
    -    raise ValueError("Route filter is not supported for HybridLocalIndex.")
    +    raise NotImplementedError("Route filtering is currently not supported for HybridLocalIndex.")
    Suggestion importance[1-10]: 3

    Why: Changing the error message to 'NotImplementedError' provides clearer communication about the unsupported feature, but this change has a minor impact on the overall code functionality.

    3

    Copy link

    codecov bot commented Nov 29, 2024

    Codecov Report

    Attention: Patch coverage is 75.57118% with 139 lines in your changes missing coverage. Please review.

    Project coverage is 73.61%. Comparing base (a5103d6) to head (ff2c3ca).
    Report is 35 commits behind head on main.

    Files with missing lines Patch % Lines
    semantic_router/index/hybrid_local.py 72.16% 27 Missing ⚠️
    semantic_router/routers/hybrid.py 79.09% 23 Missing ⚠️
    semantic_router/index/pinecone.py 37.14% 22 Missing ⚠️
    semantic_router/routers/base.py 83.33% 17 Missing ⚠️
    semantic_router/encoders/aurelio.py 48.38% 16 Missing ⚠️
    semantic_router/index/local.py 31.25% 11 Missing ⚠️
    semantic_router/schema.py 77.14% 8 Missing ⚠️
    semantic_router/index/postgres.py 54.54% 5 Missing ⚠️
    semantic_router/routers/semantic.py 89.18% 4 Missing ⚠️
    semantic_router/encoders/base.py 85.00% 3 Missing ⚠️
    ... and 2 more
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##             main     #465      +/-   ##
    ==========================================
    + Coverage   72.29%   73.61%   +1.32%     
    ==========================================
      Files          39       43       +4     
      Lines        3292     3548     +256     
    ==========================================
    + Hits         2380     2612     +232     
    - Misses        912      936      +24     

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    @jamescalam jamescalam merged commit 0228eaf into main Nov 29, 2024
    8 checks passed
    @jamescalam jamescalam deleted the james/abstract-layers branch November 29, 2024 15:52
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    Create aurelio sparse encoder Work on new Route Layer abstractions
    1 participant