Commit

Merge pull request #249 from aurelio-labs/simonas/splitter
fix: Hard split for max token size
jamescalam authored Apr 17, 2024
2 parents 6cf5522 + 86bea98 commit 7f0909e
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion semantic_router/splitters/rolling_window.py
@@ -215,7 +215,11 @@ def _split_documents(
             logger.debug(f"Document token count: {doc_token_count} tokens")
             # Check if current index is a split point based on similarity
             if doc_idx + 1 in split_indices:
-                if current_tokens_count + doc_token_count >= self.min_split_tokens:
+                if (
+                    self.min_split_tokens
+                    <= current_tokens_count + doc_token_count
+                    < self.max_split_tokens
+                ):
                     # Include the current document before splitting
                     # if it doesn't exceed the max limit
                     current_split.append(doc)
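The fix turns the split-point check from a plain lower bound into a half-open range, so a document at a similarity split point is appended to the current split only when the combined token count stays below `max_split_tokens`. The sketch below illustrates the before/after difference on plain token counts. It is a simplified model under stated assumptions: `old_should_split`, `new_should_split` and `split_by_tokens` are hypothetical names, the fallback "hard split" branch is an assumption about the surrounding code in `rolling_window.py` (the diff does not show it), and the real method's `DocumentSplit` objects and statistics are omitted.

```python
from typing import List


def old_should_split(current_tokens: int, doc_tokens: int, min_tokens: int) -> bool:
    # Pre-fix condition: only a lower bound on the combined token count.
    return current_tokens + doc_tokens >= min_tokens


def new_should_split(
    current_tokens: int, doc_tokens: int, min_tokens: int, max_tokens: int
) -> bool:
    # Post-fix condition: combined count must lie in [min_tokens, max_tokens).
    return min_tokens <= current_tokens + doc_tokens < max_tokens


def split_by_tokens(
    token_counts: List[int],
    split_indices: List[int],
    min_tokens: int,
    max_tokens: int,
    use_fix: bool = True,
) -> List[List[int]]:
    """Hypothetical simplified splitter; returns chunks as lists of doc indices."""
    splits: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for idx, count in enumerate(token_counts):
        # Mirrors the diff's `doc_idx + 1 in split_indices` check.
        if idx + 1 in split_indices:
            ok = (
                new_should_split(current_tokens, count, min_tokens, max_tokens)
                if use_fix
                else old_should_split(current_tokens, count, min_tokens)
            )
            if ok:
                # Include the document, then close the split.
                current.append(idx)
                splits.append(current)
                current, current_tokens = [], 0
                continue
        # Assumed fallback: hard split when adding this document exceeds the max.
        if current_tokens + count > max_tokens:
            if current:
                splits.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += count
    # Flush whatever remains as the final split.
    if current:
        splits.append(current)
    return splits


counts = [200, 200, 50]
for use_fix in (False, True):
    chunks = split_by_tokens(counts, [2], min_tokens=100, max_tokens=300, use_fix=use_fix)
    sizes = [sum(counts[i] for i in chunk) for chunk in chunks]
    print(use_fix, chunks, sizes)
# False [[0, 1], [2]] [400, 50]
# True [[0], [1, 2]] [200, 250]
```

With the old check, the first chunk reaches 400 tokens despite a 300-token maximum; with the fix, the split point is rejected and, in this sketch, the max-token branch starts a new chunk instead. Because the upper bound is exclusive, a combined count exactly equal to `max_split_tokens` also fails the new check.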
