-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[8.11] [Security Solution] [Elastic AI Assistant] Hybrid (vector + terms) search for improved ES|QL query generation (#168995) #169054
Merged
kibanamachine
merged 1 commit into
elastic:8.11
from
kibanamachine:backport/8.11/pr-168995
Oct 17, 2023
Merged
[8.11] [Security Solution] [Elastic AI Assistant] Hybrid (vector + terms) search for improved ES|QL query generation (#168995) #169054
kibanamachine
merged 1 commit into
elastic:8.11
from
kibanamachine:backport/8.11/pr-168995
Oct 17, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…arch for improved ES|QL query generation (elastic#168995) ## [Security Solution] [Elastic AI Assistant] Hybrid (vector + terms) search for improved ES|QL query generation This PR implements hybrid (vector + terms) search to improve the quality of `ES|QL` queries generated by the Elastic AI Assistant. The hybrid search combines (from a single request to Elasticsearch): - Vector search results from ELSER that vary depending on the query specified by the user - Terms search results that return a set of Knowledge Base (KB) documents marked as "required" for a topic The hybrid search results, when provided as context to an LLM, improve the quality of generated `ES|QL` queries by combining `ES|QL` parser grammar and documentation specific to the question asked by a user with additional examples of valid `ES|QL` queries that aren't specific to the query. ## Details ### Indexing additional `metadata` The `loadESQL` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/content_loaders/esql_loader.ts` loads a directory containing 13 valid, and one invalid example of `ES|QL` queries: ```typescript const rawExampleQueries = await exampleQueriesLoader.load(); // Add additional metadata to the example queries that indicates they are required KB documents: const requiredExampleQueries = addRequiredKbResourceMetadata({ docs: rawExampleQueries, kbResource: ESQL_RESOURCE, }); ``` The `addRequiredKbResourceMetadata` function adds two additional fields to the `metadata` property of the document: - `kbResource` - a `keyword` field that specifies the category of knowledge, e.g. `esql` - `required` - a `boolean` field that when `true`, indicates the document should be returned in all searches for the `kbResource` The additional metadata fields are shown below in the following abridged sample document: ``` { "_index": ".kibana-elastic-ai-assistant-kb", "_id": "e297e2d9-fb0e-4638-b4be-af31d1b31b9f", "_version": 1, "_seq_no": 129, "_primary_term": 1, "found": true, "_source": { "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0001.asciidoc", "required": true, "kbResource": "esql" }, "vector": { "tokens": { "serial": 0.5612584, "syntax": 0.006727545, "user": 1.1184403, // ...additional tokens }, "model_id": ".elser_model_2" }, "text": """[[esql-example-queries]] The following is an example ES|QL query: \`\`\` FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) by user.name, host.name | ENRICH ldap_lookup_new ON user.name | WHERE group.name IS NOT NULL | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount desc | KEEP destcount, host.name, user.name, group.name, follow_up \`\`\` """ } } ``` ### Hybrid search The `ElasticsearchStore.similaritySearch` function is invoked by LangChain's `VectorStoreRetriever.getRelevantDocuments` function when the `RetrievalQAChain` searches for documents. A single request to Elasticsearch performs a hybrid search that combines the vector and terms searches into a single request with an [msearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html): ```typescript // requiredDocs is an array of filters that can be used in a `bool` Elasticsearch DSL query to filter in/out required KB documents: const requiredDocs = getRequiredKbDocsTermsQueryDsl(this.kbResource); // The `k` parameter is typically provided by LangChain's `VectorStoreRetriever._getRelevantDocuments`, which calls this function: const vectorSearchQuerySize = k ?? FALLBACK_SIMILARITY_SEARCH_SIZE; // build a vector search query: const vectorSearchQuery = getVectorSearchQuery({ filter, modelId: this.model, mustNotTerms: requiredDocs, query, }); // build a (separate) terms search query: const termsSearchQuery = getTermsSearchQuery(requiredDocs); // combine the vector search query and the terms search queries into a single multi-search query: const mSearchQueryBody = getMsearchQueryBody({ index: this.index, termsSearchQuery, termsSearchQuerySize: TERMS_QUERY_SIZE, vectorSearchQuery, vectorSearchQuerySize, }); try { // execute both queries via a single multi-search request: const result = await this.esClient.msearch<MsearchResponse>(mSearchQueryBody); // flatten the results of the combined queries into a single array of hits: const results: FlattenedHit[] = result.responses.flatMap((response) => // ... ``` ## Desk testing 1. Delete any previous instances of the Knowledge Base by executing the following query in Kibana's `Dev Tools`: ``` DELETE .kibana-elastic-ai-assistant-kb ``` 2. In the Security Solution, open the Elastic AI Assistant 3. In the assistant, click the `Settings` gear 4. Click the `Knowledge Base` icon to view the KB settings 5. Toggle the `Knowledge Base` setting `off` if it's already on 6. Toggle the `Knowledge Base` setting `on` to load the KB documents 7. Click the `Save` button to close settings 8. Enter the following prompt, then press Enter: ``` Generate an ES|QL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called "follow_up" that contains a value of "true", otherwise, it should contain "false". The user names should also be enriched with their respective group names. ``` **Expected result** A response similar to the following is returned: ``` FROM logs-* | WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16") | STATS destcount = COUNT(destination.ip) BY user.name | ENRICH ldap_lookup ON user.name | EVAL follow_up = CASE( destcount >= 100, "true", "false") | SORT destcount DESC | KEEP destcount, user.name, group.name, follow_up ``` (cherry picked from commit d0e9925)
Documentation preview: |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport
This will backport the following commits from
main
to8.11
:Questions ?
Please refer to the Backport tool documentation