How to check if a document exists before inserting it into the index? #15878

Answered by dosubot bot
johnhu33 asked this question in Q&A
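To check whether a document already exists before inserting it into the index, you can use the _handle_duplicates or _handle_upserts methods of the IngestionPipeline class. These methods check incoming documents against the docstore based on their hashes or IDs. Note that the leading underscore marks both as private methods, so their behavior may change between library versions.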

Here is an example of how you can use _handle_duplicates to check for document existence:

def check_document_exists(pipeline: IngestionPipeline, nodes: List[BaseNode]) -> List[BaseNode]:
    """Check if documents exist in the docstore before inserting.

    Args:
        pipeline: The ingestion pipeline instance.
        nodes: The nodes (documents) to check.

    Returns:
        List of nodes that do not exist in the docstore.
    """
    if pipeline.docs…
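
The general idea behind a hash-based existence check can be sketched without the library itself. The example below is a minimal illustration, not the IngestionPipeline implementation: `Node`, `filter_new_nodes`, and the `seen_hashes` dictionary are hypothetical stand-ins for a real node type and docstore. It keeps a node only when its ID is unknown or its content hash has changed.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a document node (not the LlamaIndex BaseNode)."""

    doc_id: str
    text: str

    @property
    def hash(self) -> str:
        # Content hash used to detect whether the text has changed.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()


def filter_new_nodes(seen_hashes: Dict[str, str], nodes: List[Node]) -> List[Node]:
    """Return nodes that are new or changed, and record their hashes.

    `seen_hashes` maps doc_id -> content hash and plays the role of a
    docstore in this sketch (an assumption, not a real docstore API).
    """
    new_nodes: List[Node] = []
    for node in nodes:
        if seen_hashes.get(node.doc_id) == node.hash:
            # Same ID and same content: already stored, skip it.
            continue
        seen_hashes[node.doc_id] = node.hash
        new_nodes.append(node)
    return new_nodes
```

Calling `filter_new_nodes` a second time with the same nodes returns an empty list, while a node whose text changed under an existing ID is returned again so it can be re-inserted.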

Answer selected by johnhu33