
@neo4j/neo4j-graphrag-python

Next

Added

  • Support for conversations with message history, including a new message_history parameter for LLM interactions.
  • Ability to include system instructions and override them for specific invocations.
  • Summarization of chat history to enhance query embedding and context handling.

Changed

  • Updated LLM implementations to handle message history consistently across providers.
  • The id_prefix parameter in the LexicalGraphConfig is deprecated.

Fixed

  • IDs for the Document and Chunk nodes in the lexical graph are now randomly generated and unique across multiple runs. This fixes an issue where relationships were created between chunks that belonged to different pipeline runs.
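
The kind of collision this fix addresses can be illustrated with a minimal sketch. The helper names (`sequential_chunk_ids`, `random_chunk_ids`) are hypothetical and are not the library's implementation; the sketch only assumes that position-based IDs repeat across runs while random UUIDs do not.

```python
import uuid


def sequential_chunk_ids(n_chunks: int, prefix: str = "") -> list[str]:
    """Hypothetical position-based scheme: IDs depend only on the chunk index."""
    return [f"{prefix}:{i}" for i in range(n_chunks)]


def random_chunk_ids(n_chunks: int) -> list[str]:
    """Hypothetical random scheme: a fresh UUID4 per chunk, per run."""
    return [str(uuid.uuid4()) for _ in range(n_chunks)]


# Two independent "pipeline runs", each producing three chunks.
run_a, run_b = sequential_chunk_ids(3), sequential_chunk_ids(3)
# Position-based IDs collide, so merging on ID would link chunks across runs.
collisions = set(run_a) & set(run_b)

rand_a, rand_b = random_chunk_ids(3), random_chunk_ids(3)
# Random IDs do not overlap between runs (up to a negligible probability).
random_collisions = set(rand_a) & set(rand_b)
```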

1.3.0

Added

  • Integrated json-repair package to handle and repair invalid JSON generated by LLMs.
  • Introduced InvalidJSONError exception for handling cases where JSON repair fails.
  • Ability to create a Pipeline or SimpleKGPipeline from a config file. See the example.
  • Added OllamaLLM and OllamaEmbeddings classes to make Ollama support more explicit. Implementations using the OpenAILLM and OpenAIEmbeddings classes will still work.
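
The JSON repair behaviour listed above follows a parse, repair, then fail pattern, which can be sketched with the standard library alone. This is not the json-repair package and not the library's code: `naive_repair` and `parse_llm_json` are hypothetical helpers, and the `InvalidJSONError` class below is a local stand-in defined only for this illustration.

```python
import json
import re
from typing import Any


class InvalidJSONError(Exception):
    """Local stand-in for illustration; not imported from the library."""


def naive_repair(text: str) -> str:
    """Hypothetical, deliberately small repair: drop code fences and trailing commas.

    It is naive: a ',}' sequence inside a string value would also be rewritten.
    """
    text = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", text.strip())
    return re.sub(r",\s*([}\]])", r"\1", text)


def parse_llm_json(text: str) -> Any:
    """Parse LLM output, attempting one repair pass before giving up."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        return json.loads(naive_repair(text))
    except json.JSONDecodeError as exc:
        raise InvalidJSONError(f"Could not repair JSON: {text!r}") from exc
```

A caller would catch `InvalidJSONError` only when the repair pass also fails, which keeps the common, almost-valid case silent.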

Changed

  • Updated LLM prompt for Entity and Relation extraction to include stricter instructions for generating valid JSON.

Fixed

  • Added schema functions to the documentation.

1.2.1

Added

  • Introduced optional lexical graph configuration for SimpleKGPipeline, enhancing flexibility in customizing node labels and relationship types in the lexical graph.
  • Introduced optional neo4j_database parameter for SimpleKGPipeline, Neo4jChunkReader and Text2CypherRetriever.
  • Ability to provide description and list of properties for entities and relations in the SimpleKGPipeline constructor.

Fixed

  • neo4j_database parameter is now used for all queries in the Neo4jWriter.

Changed

  • Updated all examples to use neo4j_database parameter instead of an undocumented neo4j driver constructor.
  • All READ queries are now routed to a reader replica (for clusters). This impacts all retrievers, the Neo4jChunkReader and SinglePropertyExactMatchResolver components.

1.2.0

Added

  • Made relations and potential_schema optional in SchemaBuilder.
  • Added a check to prevent the use of deprecated Cypher syntax for Neo4j versions 5.23.0 and above.
  • Added a LexicalGraphBuilder component to enable the import of the lexical graph (document, chunks) without performing entity and relation extraction.
  • Added a Neo4jChunkReader component to be able to read chunk text from the database.
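
A version gate such as the 5.23.0 check listed above is usually done on parsed integers rather than on strings. The sketch below is an assumption about how such a check could look; `parse_version` and `is_5_23_or_above` are hypothetical names, and the handling of suffixes like `-aura` is also an assumption.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '5.23.0' into (5, 23, 0); a suffix after '-' is ignored."""
    core = version.split("-", 1)[0]
    parts = [int(part) for part in core.split(".")]
    # Pad short versions such as '5.23' so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def is_5_23_or_above(version: str) -> bool:
    """Return True when the server version is at least 5.23.0."""
    return parse_version(version) >= (5, 23, 0)
```

Comparing tuples avoids the string-comparison trap where "5.9.0" would sort after "5.23.0".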

Changed

  • Vector and Hybrid retrievers used with return_properties now also return the node labels (nodeLabels) and the node's element ID (id).
  • HybridRetriever now filters out the embedding property indexed by self.vector_index_name from the retriever result by default.
  • Removed support for neo4j.AsyncDriver in the KG creation pipeline, affecting Neo4jWriter and related components.
  • Updated examples and unit tests to reflect the removal of async driver support.

Fixed

  • Resolved issue with AzureOpenAIEmbeddings incorrectly inheriting from OpenAIEmbeddings, now inherits from BaseOpenAIEmbeddings.

1.1.0

Added

  • Introduced a fail_if_exist option to index creation functions to control behavior when an index already exists.
  • Added Qdrant retriever in neo4j_graphrag.retrievers.

Changed

  • Comprehensive rewrite of the README to improve clarity and provide detailed usage examples.

1.0.0

Fixed

  • Fixed a bug where the openai Python client and numpy were required to import any embedder or LLM.

Changed

  • The value associated with the enum field OnError.IGNORE has been changed from "CONTINUE" to "IGNORE" to follow the convention and match the field name.
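
The effect of this change on value-based lookups can be shown with a small re-declaration of the enum. This is an illustrative copy, not the library's class: only IGNORE is named in this changelog, and the RAISE member is an assumption added to make the example less trivial.

```python
from enum import Enum


class OnError(Enum):
    """Illustrative re-declaration for this example only."""

    RAISE = "RAISE"  # assumed member, not mentioned in the changelog
    IGNORE = "IGNORE"  # value now matches the field name


# Lookup by the new value resolves to the member.
member = OnError("IGNORE")

# Lookup by the old value no longer resolves in this sketch.
try:
    OnError("CONTINUE")
    old_value_accepted = True
except ValueError:
    old_value_accepted = False
```

Code or configuration that stored the old string value "CONTINUE" may therefore need updating.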

Added

  • Added the SinglePropertyExactMatchResolver component, which merges entities that share the exact same property value (e.g. name).
  • Added the SimpleKGPipeline class, a simplified abstraction layer to streamline knowledge graph building processes from text documents.

1.0.0a1

1.0.0a0

Added

  • Added the SinglePropertyExactMatchResolver component, which merges entities that share the exact same property value (e.g. name).

0.7.0

Added

  • Added AzureOpenAILLM and AzureOpenAIEmbeddings to support Azure-served OpenAI models.
  • Added template validation in PromptTemplate class upon construction.
  • Examples demonstrating the use of Mistral embeddings and LLM in RAG pipelines.
  • Added feature to include kwargs in Text2CypherRetriever.search() that will be injected into a custom prompt, if provided.
  • Added validation to custom_prompt parameter of Text2CypherRetriever to ensure that query_text placeholder exists in prompt.
  • Introduced a fixed size text splitter component for splitting text into specified fixed size chunks with overlap. Updated examples and tests to utilize this new component.
  • Introduced Vertex AI LLM class for integrating Vertex AI models.
  • Added unit tests for the Vertex AI LLM class.
  • Added support for Cohere LLM and embeddings - added optional dependency to cohere.
  • Added support for Anthropic LLM - added optional dependency to anthropic.
  • Added support for MistralAI LLM - added optional dependency to mistralai.
  • Added support for Qdrant - added optional dependency to qdrant-client.
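
The fixed size text splitter listed above cuts text into chunks of a given size that share some characters with their neighbours. The following is a minimal sketch of that idea, not the component itself: the function name `split_fixed_size` and its parameters `chunk_size` and `chunk_overlap` are assumptions chosen for this example.

```python
def split_fixed_size(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text, so no
        # trailing chunk made only of already-seen characters is emitted.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

For example, `split_fixed_size("abcdefghij", chunk_size=4, chunk_overlap=1)` yields three chunks, each starting with the last character of the previous one.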

Fixed

  • Resolved import issue with the Vertex AI Embeddings class.
  • Fixed bug in Text2CypherRetriever using custom_prompt arg where the search method would not inject the query_text content.
  • custom_prompt arg is now converted to Text2CypherTemplate class within the Text2CypherRetriever.get_search_results method.
  • Text2CypherTemplate and RAGTemplate prompt templates now require query_text arg and will error if it is not present. Previous query_text aliases may be used, but will warn of deprecation.
  • Resolved issue where Neo4jWriter component would raise an error if the start or end node ID was not defined properly in the input.
  • Resolved issue where relationship types were not escaped in the insert Cypher query.
  • Improved query performance in Neo4jWriter: created nodes now have a generic __KGBuilder__ label and an index is created on the __KGBuilder__.id property. Moreover, insertion queries are now batched. Batch size can be controlled using the batch_size parameter in the Neo4jWriter component.
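
Batching, as mentioned in the last item, means grouping rows so that each database round trip carries several of them. The helper below is a generic sketch under that assumption; `batched` is a hypothetical function and does not reproduce how Neo4jWriter builds or sends its queries.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Five node rows written in batches of two give three round trips instead of five.
rows = [{"id": str(i)} for i in range(5)]
batch_sizes = [len(batch) for batch in batched(rows, batch_size=2)]
```

Each batch could then be passed as a single list parameter to one insertion query, which is the usual reason batching reduces overhead.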

Changed

  • Moved the Embedder class to the neo4j_graphrag.embeddings directory for better organization alongside other custom embedders.
  • Removed query argument from the GraphRAG class' .search method; users must now use query_text.
  • Neo4jWriter component now runs a single query to merge node and set its embeddings if any.
  • Nodes created by the Neo4jWriter now have an extra __KGBuilder__ label. Nodes from the entity graph also have an __Entity__ label.
  • Dropped support for Python 3.8 (end of life).

0.6.3

Changed

  • Updated documentation links in README.
  • Renamed deprecated package references in documentation.

Added

  • Introduction page to the documentation content tree.
  • Introduced a new Vertex AI embeddings class for generating text embeddings using Vertex AI.
  • Updated documentation to include OpenAI and Vertex AI embeddings classes.
  • Added google-cloud-aiplatform as an optional dependency for Vertex AI embeddings.

Fixed

  • Make pygraphviz an optional dependency - it is now only required when calling pipeline.draw.
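
Making a dependency optional typically means importing it lazily, inside the one function that needs it. The sketch below shows that pattern with a hypothetical helper `require_optional` and a hypothetical `draw` function; it is not the library's code.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a module on demand, or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. {install_hint}"
        ) from exc


def draw(path: str) -> None:
    """Hypothetical drawing entry point: the dependency is only needed here."""
    require_optional("pygraphviz", "Install it to enable drawing.")
    # Rendering to `path` would happen here; omitted in this sketch.
```

Importing the package itself never touches pygraphviz; only a call to `draw` does.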

0.6.2

Fixed

  • Moved pygraphviz to optional dependencies under [tool.poetry.extras] in pyproject.toml to resolve an issue where pip install neo4j-graphrag incorrectly required pygraphviz as a mandatory dependency.

0.6.1

Changed

0.6.0

IMPORTANT NOTICE

  • The neo4j-genai package is now deprecated. Users are advised to switch to the new package neo4j-graphrag.

Added

  • Ability to visualise a pipeline with my_pipeline.draw("pipeline.png").
  • LexicalGraphBuilder component to create the lexical graph without entity-relation extraction.

Fixed

  • Pipelines now return correct results when the same pipeline is run in parallel.

Changed

  • Pipeline run method now returns a PipelineResult object.
  • Improved parameter validation for pipelines (#124). Pipeline now raises an error before a run starts if:
    • the same parameter is mapped twice
    • or a parameter is defined in the mapping but is not a valid component input
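
The two checks above can be sketched as a small validation function. The data model is an assumption made for this example: a mapping is represented as a list of (parameter, source) pairs, and `validate_mapping` is a hypothetical helper rather than the Pipeline's actual validation code.

```python
def validate_mapping(
    pairs: list[tuple[str, str]], component_inputs: set[str]
) -> dict[str, str]:
    """Check a parameter mapping before any component runs.

    Raises ValueError if a parameter is mapped twice or is not a valid input.
    """
    mapping: dict[str, str] = {}
    for param, source in pairs:
        if param in mapping:
            raise ValueError(f"Parameter '{param}' is mapped twice")
        if param not in component_inputs:
            raise ValueError(f"'{param}' is not a valid component input")
        mapping[param] = source
    return mapping
```

Failing before the run starts means no component has produced side effects by the time the configuration error is reported.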

0.5.0

Added

  • PDF-to-graph pipeline for knowledge graph construction in experimental mode
  • Introduced support for Component/Pipeline flexible architecture.
  • Added new components for knowledge graph construction, including text splitters, schema builders, entity-relation extractors, and Neo4j writers.
  • Implemented end-to-end tests for the new knowledge graph builder pipeline.

Changed

  • When saving the lexical graph in a KG creation pipeline, the document is also saved as a specific node, together with relationships between each chunk and the document they were created from.

Fixed

  • Corrected the hybrid retriever query to ensure proper normalization of scores in vector search results.
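
Score normalization matters in hybrid search because vector and full-text scores live on different scales. The sketch below assumes one common approach (divide each result set by its own maximum, then keep the best score per node); the changelog does not state which formula the corrected query uses, and both function names are hypothetical.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores so that the best result in the set has score 1.0."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {node_id: 0.0 for node_id in scores}
    return {node_id: score / top for node_id, score in scores.items()}


def merge_hybrid(
    vector_scores: dict[str, float], fulltext_scores: dict[str, float]
) -> dict[str, float]:
    """Combine two normalized result sets, keeping the best score per node."""
    merged: dict[str, float] = {}
    for part in (normalize(vector_scores), normalize(fulltext_scores)):
        for node_id, score in part.items():
            merged[node_id] = max(merged.get(node_id, 0.0), score)
    return merged
```

Without the per-set normalization, unbounded full-text scores would dominate cosine similarities that never exceed 1.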

0.4.0

Added

  • Add optional custom_prompt arg to the Text2CypherRetriever class.

Changed

  • GraphRAG.search method first parameter has been renamed query_text (was query) for consistency with the retrievers interface.
  • Made GraphRAG.search method backwards compatible with the query parameter, raising warnings to encourage using query_text instead.
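
A backwards-compatible rename of this kind is commonly implemented by accepting both names and warning on the old one. The function below is a standalone stand-in, not GraphRAG.search: its return value and the precedence rule when both arguments are given are assumptions made for this sketch.

```python
import warnings
from typing import Optional


def search(query_text: str = "", query: Optional[str] = None) -> str:
    """Stand-in for a search method that still accepts the old 'query' name."""
    if query is not None:
        warnings.warn(
            "'query' is deprecated; use 'query_text' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumed precedence: the new name wins when both are provided.
        if not query_text:
            query_text = query
    return query_text
```

Using `stacklevel=2` points the warning at the caller's line, which is where the deprecated argument actually needs changing.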

0.3.1

Fixed

  • Corrected initialization to allow specifying the embedding model name.
  • Removed sentence_transformers from embeddings/__init__.py to avoid ImportError when the package is not installed.

0.3.0

Added

  • Stopped embeddings from being returned when searching with VectorRetriever. Added nodeLabels and id to the metadata of VectorRetriever results.
  • Added upsert_vector utility function for attaching vectors to node properties.
  • Introduced Neo4jInsertionError for handling insertion failures in Neo4j.
  • Included Pinecone and Weaviate retrievers in neo4j_graphrag.retrievers.
  • Introduced the GraphRAG object, enabling a full RAG (Retrieval-Augmented Generation) pipeline with context retrieval, prompt formatting, and answer generation.
  • Added PromptTemplate and RagTemplate for customizable prompt generation.
  • Added LLMInterface with implementation for OpenAI LLM.
  • Updated project configuration to support multiple Python versions (3.8 to 3.12) in CI workflows.
  • Improved developer experience by copying the docstring from the Retriever.get_search_results method to the Retriever.search method
  • Support for specifying database names in index handling methods and retrievers.
  • User Guide in documentation.
  • Introduced result_formatter argument to all retrievers, allowing custom formatting of retriever results.

Changed

  • Refactored import paths for retrievers to neo4j_graphrag.retrievers.
  • Implemented exception chaining for all re-raised exceptions to improve stack trace readability.
  • Made error messages in index.py more consistent.
  • Renamed Retriever._get_search_results to Retriever.get_search_results
  • Updated retrievers and index handling methods to accept optional database names.
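
Exception chaining, mentioned in the list above, keeps the original error attached to the one that is re-raised. Here is a minimal sketch using `raise ... from ...`; the `Neo4jIndexError` class is a local stand-in and `failing_driver_call` is a hypothetical placeholder for a database call.

```python
class Neo4jIndexError(Exception):
    """Local stand-in for illustration; not imported from the library."""


def failing_driver_call() -> None:
    """Hypothetical low-level call that fails."""
    raise RuntimeError("connection refused")


def create_index() -> None:
    """Re-raise a low-level failure as a domain error, keeping the cause."""
    try:
        failing_driver_call()
    except RuntimeError as exc:
        # 'from exc' stores the original error in __cause__, so the traceback
        # shows both the domain error and what triggered it.
        raise Neo4jIndexError("Index creation failed") from exc
```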

0.2.0

Fixed

  • Removed Pinecone and Weaviate retrievers from __init__.py to prevent ImportError when optional dependencies are not installed.
  • Moved few-shot examples in Text2CypherRetriever to the constructor for better initialization and usage. Updated unit tests and example script accordingly.
  • Fixed regex warnings in E2E tests for Weaviate and Pinecone retrievers.
  • Corrected HuggingFaceEmbeddings import in E2E tests.

0.2.0a5

0.2.0a3

Added

  • Introduced custom exceptions for improved error handling, including RetrieverInitializationError, SearchValidationError, FilterValidationError, EmbeddingRequiredError, RecordCreationError, Neo4jIndexError, and Neo4jVersionError.
  • Retriever that integrates with a Weaviate vector database: WeaviateNeo4jRetriever.
  • New return types that help with getting retriever results: RetrieverResult and RetrieverResultItem.
  • Supported wrapper embedder object for sentence-transformers embeddings: SentenceTransformerEmbeddings.
  • Text2CypherRetriever object which allows for the retrieval of records from a Neo4j database using natural language.

Changed

  • Replaced ValueError with custom exceptions across various modules for clearer and more specific error messages.

Fixed

  • Updated documentation to include new custom exceptions.
  • Improved the use of Pydantic for input data validation for retriever objects.