Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: add fast sync check #460

Merged
merged 60 commits into from
Nov 16, 2024
Merged

feat!: add fast sync check #460

merged 60 commits into from
Nov 16, 2024

Conversation

jamescalam
Copy link
Member

@jamescalam jamescalam commented Nov 8, 2024

PR Type

enhancement


Description

  • Introduced new methods _read_hash and _write_config in BaseIndex and PineconeIndex to handle configuration parameters.
  • Enhanced synchronization logic in semantic_router/layer.py by adding hash comparison to check if local and remote instances are synchronized.
  • Added ConfigParameter class in semantic_router/schema.py to manage configuration parameters and convert them to Pinecone format.
  • Implemented validation to prevent the use of reserved namespace 'sr_config' in PineconeIndex.

Changes walkthrough 📝

Relevant files
Enhancement
base.py
Add methods for reading and writing index configuration   

semantic_router/index/base.py

  • Added _read_hash method to read the hash of the index.
  • Added _write_config method to write a config parameter to the index.
  • +18/-0   
    pinecone.py
    Implement configuration methods and namespace validation for Pinecone
    index

    semantic_router/index/pinecone.py

  • Added _read_hash and _write_config methods for Pinecone index.
  • Modified PineconeRecord to have a default function schema.
  • Added validation for reserved namespace.
  • +31/-2   
    layer.py
    Enhance synchronization logic with hash comparison             

    semantic_router/layer.py

  • Added get_hash method to generate a hash for the layer.
  • Updated is_synced method to use hash comparison for synchronization
    check.
  • +29/-4   
    schema.py
    Add ConfigParameter class for configuration management     

    semantic_router/schema.py

  • Introduced ConfigParameter class for configuration management.
  • Added to_pinecone method to convert config to Pinecone format.
  • +19/-1   

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    github-actions bot commented Nov 8, 2024

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Possible Bug
    The method _write_config in PineconeIndex uses a to_pinecone method which seems to hardcode the vector values. This could lead to incorrect data being written to the index.

    Redundant Logic
    The method is_synced in RouteLayer contains commented-out code and a TODO comment indicating uncertainty about the necessity of existing logic. This needs clarification or cleanup.

    Copy link

    github-actions bot commented Nov 8, 2024

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Best practice
    Add exception handling for data fetching and upserting operations

    Handle potential exceptions when fetching or upserting data in _read_hash and
    _write_config methods to improve robustness and error handling.

    semantic_router/index/pinecone.py [681-684]

    -hash_record = self.index.fetch(
    -    ids=[f"sr_hash#{self.namespace}"],
    -    namespace="sr_config",
    -)
    +try:
    +    hash_record = self.index.fetch(
    +        ids=[f"sr_hash#{self.namespace}"],
    +        namespace="sr_config",
    +    )
    +except Exception as e:
    +    logger.error(f"Failed to fetch hash record: {e}")
    +    raise
    Suggestion importance[1-10]: 8

    Why: Adding exception handling for data operations significantly improves the robustness and reliability of the system by ensuring that errors are properly managed and logged.

    8
    Validate config parameters before upserting to ensure data integrity

    Validate the config parameter in _write_config method to ensure it contains all
    necessary fields before attempting to upsert to the index.

    semantic_router/index/pinecone.py [701-704]

    +if not all([hasattr(config, attr) for attr in ['field', 'value', 'namespace']]):
    +    raise ValueError("Config parameter is missing required fields")
     self.index.upsert(
         vectors=[config.to_pinecone(dimensions=self.dimensions)],
         namespace="sr_config",
     )
    Suggestion importance[1-10]: 7

    Why: Validating the 'config' parameter before upserting is essential to ensure that all necessary fields are present, which prevents data integrity issues in the system.

    7
    Enhancement
    Replace hardcoded namespace with a configurable namespace attribute

    Ensure that the namespace parameter in _read_hash and _write_config methods is not
    hardcoded to "sr_config" to maintain flexibility and avoid potential conflicts with
    other namespaces.

    semantic_router/index/pinecone.py [682]

    -namespace="sr_config",
    +namespace=self.config_namespace,
    Suggestion importance[1-10]: 7

    Why: Using a configurable namespace attribute instead of a hardcoded value enhances flexibility and reduces the risk of namespace conflicts, which is crucial for a scalable system.

    7
    Performance
    Optimize data access in _read_hash by directly using metadata

    Optimize the _read_hash method by directly accessing the required metadata fields
    instead of fetching the entire vector data, which can be more efficient and clear.

    semantic_router/index/pinecone.py [685-691]

    -if hash_record["vectors"]:
    +if hash_record["metadata"]:
         return ConfigParameter(
             field="sr_hash",
    -        value=hash_record["vectors"]["sr_hash"]["metadata"]["value"],
    -        created_at=hash_record["vectors"]["sr_hash"]["metadata"]["created_at"],
    +        value=hash_record["metadata"]["value"],
    +        created_at=hash_record["metadata"]["created_at"],
             namespace=self.namespace,
         )
    Suggestion importance[1-10]: 5

    Why: The suggestion to optimize data access by directly using metadata could improve performance, but the existing code does not show that 'metadata' is directly accessible without 'vectors', which may limit the feasibility of this suggestion.

    5

    Copy link

    codecov bot commented Nov 8, 2024

    Codecov Report

    Attention: Patch coverage is 78.37838% with 80 lines in your changes missing coverage. Please review.

    Project coverage is 72.09%. Comparing base (5603bac) to head (d3c2c10).
    Report is 61 commits behind head on main.

    Files with missing lines Patch % Lines
    semantic_router/layer.py 76.15% 36 Missing ⚠️
    semantic_router/index/base.py 56.14% 25 Missing ⚠️
    semantic_router/schema.py 93.45% 7 Missing ⚠️
    semantic_router/index/pinecone.py 76.92% 6 Missing ⚠️
    semantic_router/index/postgres.py 0.00% 4 Missing ⚠️
    semantic_router/index/local.py 88.88% 1 Missing ⚠️
    semantic_router/index/qdrant.py 93.33% 1 Missing ⚠️
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##             main     #460      +/-   ##
    ==========================================
    + Coverage   68.04%   72.09%   +4.05%     
    ==========================================
      Files          46       39       -7     
      Lines        3505     3290     -215     
    ==========================================
    - Hits         2385     2372      -13     
    + Misses       1120      918     -202     

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    @jamescalam jamescalam changed the title feat: add fast sync check feat!: add fast sync check Nov 14, 2024
    @jamescalam jamescalam merged commit 342d343 into main Nov 16, 2024
    8 checks passed
    @jamescalam jamescalam deleted the james/pinecone-hash-sync branch November 16, 2024 16:28
    @jamescalam jamescalam linked an issue Nov 16, 2024 that may be closed by this pull request
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    Add is_synced method
    1 participant