Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Batch encoding for TEI encoder #423

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Conversation

Vits-99
Copy link
Collaborator

@Vits-99 Vits-99 commented Sep 20, 2024

User description

Related to PR#414.

From user:

Hi everyone.

I wanted to use the Text Embeddings Inference with the encoder but I noticed two small bugs in the code. I believe that the HFEndpointEncoder was intentionally created to be used with TEI (right?)

  1. The loop for max_retries in attempts, that is inside of the function query, has no break or any system to return the result when we have a success response. I added a break, similar to the OpenAI encoder.

  2. The response from TEI is [[[array]]]. The array is inside of a list of a list. I remove one list when receiving the response. Without this it will throw a dimension error when comparing all the vectors.

These are the main bugs, but I would also take some time to purpose a future update. With TEI we can send a batch of texts

curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":["Today is a nice day", "I like you"]}' \
    -H 'Content-Type: application/json'

To save time, we could batch the different sentences to the endpoint. This would be great for longer document. If it sounds interesting I can try to help to develop it.

By the way, should I use semantic router for splitting text, or the semantic chunkers?


PR Type

enhancement, bug fix


Description

  • Implemented batch processing for the Text Embeddings Inference, allowing multiple documents to be processed in a single query with a batch size of 50.
  • Fixed the handling of the TEI response to correctly process nested lists, preventing dimension errors.
  • Added a break statement in the retry loop within the query method to exit upon a successful response, improving efficiency.
  • Enhanced error handling to provide clearer error messages when no embeddings are returned for a batch.

Changes walkthrough 📝

Relevant files
Enhancement
huggingface.py
Implement batch processing and fix response handling in TEI encoder

semantic_router/encoders/huggingface.py

  • Implemented batch processing for document embeddings with a batch size
    of 50.
  • Fixed the response handling to correctly process nested lists in the
    output.
  • Added a break statement in the retry loop to stop on successful
    response.
  • Improved error handling for batch processing.
  • +13/-8   

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    @Vits-99 Vits-99 added the feature New feature request label Sep 20, 2024
    @Vits-99 Vits-99 requested a review from jamescalam September 20, 2024 20:19
    @Vits-99 Vits-99 self-assigned this Sep 20, 2024
    @github-actions github-actions bot added enhancement Enhancement to existing features Bug fix Review effort [1-5]: 2 labels Sep 20, 2024
    Copy link

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Error Handling
    The error handling in the batch processing might suppress specific errors which could be useful for debugging. Consider logging the error before raising a new one to maintain the error context.

    List Processing
    The condition to check if the output is a list might not correctly handle nested lists as expected from the PR description. This could lead to incorrect embeddings structure.

    Copy link

    github-actions bot commented Sep 20, 2024

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Enhancement
    Make batch size configurable by adding it as a method parameter

    Instead of using a hardcoded batch size, consider making batch_size a parameter of
    the method with a default value. This will make the method more flexible and allow
    users to specify a different batch size if needed.

    semantic_router/encoders/huggingface.py [215]

    -batch_size = 50
    +def __call__(self, docs: List[str], batch_size: int = 50) -> List[List[float]]:
     
    Suggestion importance[1-10]: 8

    Why: Making the batch size configurable enhances the flexibility of the method, allowing users to adjust it based on their specific needs and constraints, which is a significant improvement.

    8
    Improve error handling by specifying exception types in the query method

    Consider adding a specific exception type or custom exception message for different
    error scenarios in the query method to improve error traceability and handling.

    semantic_router/encoders/huggingface.py [228]

    +except requests.exceptions.HTTPError as e:
    +    raise ValueError(f"HTTP error occurred: {e}") from e
    +except requests.exceptions.ConnectionError as e:
    +    raise ValueError(f"Connection error occurred: {e}") from e
     except Exception as e:
    -    raise ValueError(f"No embeddings returned for batch. Error: {e}") from e
    +    raise ValueError(f"An unexpected error occurred: {e}") from e
     
    Suggestion importance[1-10]: 6

    Why: Specifying exception types can improve error traceability and handling, making the code more maintainable and easier to debug, but the improvement is not critical unless specific exceptions are expected frequently.

    6
    Possible bug
    Add error handling for non-list outputs to prevent runtime errors

    Add error handling for the case when outputs is not a list, as currently, the code
    assumes outputs will always be a list. This could lead to unexpected errors if the
    structure of outputs changes.

    semantic_router/encoders/huggingface.py [223-226]

    -if isinstance(outputs[0], list):
    -    embeddings.extend(outputs)
    +if isinstance(outputs, list):
    +    if all(isinstance(item, list) for item in outputs):
    +        embeddings.extend(outputs)
    +    else:
    +        raise ValueError("Expected a list of lists as output.")
     else:
    -    embeddings.append(outputs)
    +    raise ValueError("Expected a list as output.")
     
    Suggestion importance[1-10]: 7

    Why: Adding error handling for unexpected output types improves the robustness of the code by preventing potential runtime errors, though it may not be crucial if the output format is well-defined.

    7
    Best practice
    Use Pythonic way to check for empty lists

    Instead of checking if outputs is empty with len(outputs) == 0, use the more
    Pythonic not outputs.

    semantic_router/encoders/huggingface.py [221-222]

    -if not outputs or len(outputs) == 0:
    +if not outputs:
         raise ValueError("No embeddings returned from the query.")
     
    Suggestion importance[1-10]: 5

    Why: While using not outputs is more Pythonic and slightly improves readability, the existing code is functionally correct, so the improvement is minor.

    5

    Copy link

    codecov bot commented Sep 20, 2024

    Codecov Report

    All modified and coverable lines are covered by tests ✅

    Project coverage is 68.37%. Comparing base (5603bac) to head (1db96e6).

    Additional details and impacted files
    @@            Coverage Diff             @@
    ##             main     #423      +/-   ##
    ==========================================
    + Coverage   68.04%   68.37%   +0.33%     
    ==========================================
      Files          46       46              
      Lines        3505     3510       +5     
    ==========================================
    + Hits         2385     2400      +15     
    + Misses       1120     1110      -10     

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    @@ -212,19 +212,17 @@ def __call__(self, docs: List[str]) -> List[List[float]]:
    ValueError: If no embeddings are returned for a document.
    """

    batch_size=50
    batch_size = 50

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ValueError: No embeddings returned for batch. Error: Query failed with status 413: {"error":"batch size 50 > maximum allowed batch size 32","error_type":"Validation"}

    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Hi @joaomsimoes what HuggingFace TEI model were you using when you encountered this error?

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Sorry for the late answer @Siraj-Aizlewood

    I was using Alibaba-NLP/gte-large-en-v1.5

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    enhancement Enhancement to existing features feature New feature request Review effort [1-5]: 2
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    4 participants