Commit

fix: Ignore some empty page_content when appending to split_documents (#…
listeng authored Mar 19, 2024
1 parent 4419d35 commit 696efe4
Showing 1 changed file with 4 additions and 3 deletions.
@@ -45,11 +45,12 @@ def transform(self, documents: list[Document], **kwargs) -> list[Document]:
                     # delete Spliter character
                     page_content = document_node.page_content
                     if page_content.startswith(".") or page_content.startswith("。"):
-                        page_content = page_content[1:]
+                        page_content = page_content[1:].strip()
                     else:
                         page_content = page_content
-                    document_node.page_content = page_content
-                    split_documents.append(document_node)
+                    if len(page_content) > 0:
+                        document_node.page_content = page_content
+                        split_documents.append(document_node)
             all_documents.extend(split_documents)
         return all_documents
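
The behavior after this change can be sketched as a standalone function. This is a minimal illustration, not the project's code: `Doc` and `clean_split_nodes` are hypothetical names introduced here, and `Doc` only stands in for the `Document` class referenced in the hunk header.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for the Document class used in the diff."""
    page_content: str


def clean_split_nodes(document_nodes: list[Doc]) -> list[Doc]:
    """Mirror the patched loop body: strip a leading splitter, skip empty nodes."""
    split_documents = []
    for document_node in document_nodes:
        page_content = document_node.page_content
        # Drop a leading splitter character, then trim surrounding whitespace.
        if page_content.startswith(".") or page_content.startswith("。"):
            page_content = page_content[1:].strip()
        # Only keep nodes that still have content after the splitter is removed.
        if len(page_content) > 0:
            document_node.page_content = page_content
            split_documents.append(document_node)
    return split_documents


nodes = [Doc("。First sentence"), Doc("."), Doc(". "), Doc("Second sentence")]
cleaned = clean_split_nodes(nodes)
print([d.page_content for d in cleaned])  # ['First sentence', 'Second sentence']
```

Note that `.strip()` is applied only in the branch that removes a splitter character, so within this snippet a whitespace-only node that does not start with "." or "。" would still be appended.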
