[Security Solution] [Elastic AI Assistant] Hybrid (vector + terms) search for improved ES|QL query generation (#168995)

## [Security Solution] [Elastic AI Assistant] Hybrid (vector + terms) search for improved ES|QL query generation

This PR implements hybrid (vector + terms) search to improve the quality of `ES|QL` queries generated by the Elastic AI Assistant.

The hybrid search combines (from a single request to Elasticsearch):

- Vector search results from ELSER that vary depending on the query specified by the user
- Terms search results that return a set of Knowledge Base (KB) documents marked as "required" for a topic

The hybrid search results, when provided as context to an LLM, improve the quality of generated `ES|QL` queries by combining `ES|QL` parser grammar and documentation specific to the question asked by a user with additional examples of valid `ES|QL` queries that aren't specific to the query.

## Details

### Indexing additional `metadata`

The `loadESQL` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/content_loaders/esql_loader.ts` loads a directory containing 13 valid examples and one invalid example of `ES|QL` queries:

```typescript
const rawExampleQueries = await exampleQueriesLoader.load();

// Add additional metadata to the example queries that indicates they are required KB documents:
const requiredExampleQueries = addRequiredKbResourceMetadata({
  docs: rawExampleQueries,
  kbResource: ESQL_RESOURCE,
});
```

The `addRequiredKbResourceMetadata` function adds two additional fields to the `metadata` property of the document:

- `kbResource` - a `keyword` field that specifies the category of knowledge, e.g. `esql`
- `required` - a `boolean` field that, when `true`, indicates the document should be returned in all searches for the `kbResource`

The additional metadata fields are shown in the following abridged sample document:

```
{
  "_index": ".kibana-elastic-ai-assistant-kb",
  "_id": "e297e2d9-fb0e-4638-b4be-af31d1b31b9f",
  "_version": 1,
  "_seq_no": 129,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "metadata": {
      "source": "/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0001.asciidoc",
      "required": true,
      "kbResource": "esql"
    },
    "vector": {
      "tokens": {
        "serial": 0.5612584,
        "syntax": 0.006727545,
        "user": 1.1184403,
        // ...additional tokens
      },
      "model_id": ".elser_model_2"
    },
    "text": """[[esql-example-queries]]

The following is an example ES|QL query:

\`\`\`
FROM logs-*
| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
| STATS destcount = COUNT(destination.ip) by user.name, host.name
| ENRICH ldap_lookup_new ON user.name
| WHERE group.name IS NOT NULL
| EVAL follow_up = CASE(
    destcount >= 100, "true",
     "false")
| SORT destcount desc
| KEEP destcount, host.name, user.name, group.name, follow_up
\`\`\`
"""
  }
}
```

### Hybrid search

The `ElasticsearchStore.similaritySearch` function is invoked by LangChain's `VectorStoreRetriever.getRelevantDocuments` function when the `RetrievalQAChain` searches for documents.

A single [msearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html) request to Elasticsearch performs the hybrid search, combining the vector and terms searches:

```typescript
// requiredDocs is an array of filters that can be used in a `bool` Elasticsearch DSL query to filter in/out required KB documents:
const requiredDocs = getRequiredKbDocsTermsQueryDsl(this.kbResource);

// The `k` parameter is typically provided by LangChain's `VectorStoreRetriever._getRelevantDocuments`, which calls this function:
const vectorSearchQuerySize = k ?? FALLBACK_SIMILARITY_SEARCH_SIZE;

// build a vector search query:
const vectorSearchQuery = getVectorSearchQuery({
  filter,
  modelId: this.model,
  mustNotTerms: requiredDocs,
  query,
});

// build a (separate) terms search query:
const termsSearchQuery = getTermsSearchQuery(requiredDocs);

// combine the vector search query and the terms search queries into a single multi-search query:
const mSearchQueryBody = getMsearchQueryBody({
  index: this.index,
  termsSearchQuery,
  termsSearchQuerySize: TERMS_QUERY_SIZE,
  vectorSearchQuery,
  vectorSearchQuerySize,
});

try {
  // execute both queries via a single multi-search request:
  const result = await this.esClient.msearch<MsearchResponse>(mSearchQueryBody);

  // flatten the results of the combined queries into a single array of hits:
  const results: FlattenedHit[] = result.responses.flatMap((response) =>
  // ...
```

## Desk testing

1. Delete any previous instances of the Knowledge Base by executing the following query in Kibana's `Dev Tools`:

```
DELETE .kibana-elastic-ai-assistant-kb
```

2. In the Security Solution, open the Elastic AI Assistant
3. In the assistant, click the `Settings` gear
4. Click the `Knowledge Base` icon to view the KB settings
5. Toggle the `Knowledge Base` setting `off` if it's already on
6. Toggle the `Knowledge Base` setting `on` to load the KB documents
7. Click the `Save` button to close settings
8. Enter the following prompt, then press Enter:

```
Generate an ES|QL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called "follow_up" that contains a value of "true", otherwise, it should contain "false". The user names should also be enriched with their respective group names.
```

**Expected result**

A response similar to the following is returned:

```
FROM logs-*
| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
| STATS destcount = COUNT(destination.ip) BY user.name
| ENRICH ldap_lookup ON user.name
| EVAL follow_up = CASE(
    destcount >= 100, "true",
    "false")
| SORT destcount DESC
| KEEP destcount, user.name, group.name, follow_up
```

(cherry picked from commit d0e9925)
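
The metadata step described under "Indexing additional `metadata`" can be illustrated with a minimal sketch. It assumes a simplified `KbDocument` type standing in for LangChain's `Document`, and the function body is a guess at the behavior described in the commit message, not the implementation added by this commit:

```typescript
// Hypothetical, simplified stand-in for LangChain's `Document` type.
interface KbDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Returns copies of `docs` whose metadata marks them as required for `kbResource`.
// Assumption: the real helper may differ; this mirrors only the two fields
// (`kbResource`, `required`) named in the commit message.
function addRequiredKbResourceMetadata({
  docs,
  kbResource,
}: {
  docs: KbDocument[];
  kbResource: string;
}): KbDocument[] {
  return docs.map((doc) => ({
    ...doc,
    // Existing metadata (e.g. `source`) is preserved; the two new fields are added.
    metadata: { ...doc.metadata, kbResource, required: true },
  }));
}

// Illustrative input only; the file name is made up.
const tagged = addRequiredKbResourceMetadata({
  docs: [{ pageContent: 'FROM logs-* | LIMIT 10', metadata: { source: 'example.asciidoc' } }],
  kbResource: 'esql',
});
```

Returning copies instead of mutating the loaded documents keeps the raw loader output reusable, which is one reasonable design choice among several.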
elastic · Oct 17, 2023 · 3cde407
1 parent 5047d71
commit 3cde407
Showing 49 changed files with 1,673 additions and 114 deletions.
diff --git a/x-pack/plugins/elastic_assistant/server/__mocks__/docs_from_directory_loader.ts b/x-pack/plugins/elastic_assistant/server/__mocks__/docs_from_directory_loader.ts
@@ -0,0 +1,74 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+import { Document } from 'langchain/document';
+
+/**
+ * Mock LangChain `Document`s from `knowledge_base/esql/docs`, loaded from a LangChain `DirectoryLoader`
+ */
+export const mockEsqlDocsFromDirectoryLoader: Document[] = [
+  {
+    pageContent:
+      '[[esql-agg-avg]]\n=== `AVG`\nThe average of a numeric field.\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/stats.csv-spec[tag=avg]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/stats.csv-spec[tag=avg-result]\n|===\n\nThe result is always a `double` not matter the input type.\n',
+    metadata: {
+      source:
+        '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/aggregation_functions/avg.asciidoc',
+    },
+  },
+];
+
+/**
+ * Mock LangChain `Document`s from `knowledge_base/esql/language_definition`, loaded from a LangChain `DirectoryLoader`
+ */
+export const mockEsqlLanguageDocsFromDirectoryLoader: Document[] = [
+  {
+    pageContent:
+      "lexer grammar EsqlBaseLexer;\n\nDISSECT : 'dissect' -> pushMode(EXPRESSION);\nDROP : 'drop' -> pushMode(SOURCE_IDENTIFIERS);\nENRICH : 'enrich' -> pushMode(SOURCE_IDENTIFIERS);\nEVAL : 'eval' -> pushMode(EXPRESSION);\nEXPLAIN : 'explain' -> pushMode(EXPLAIN_MODE);\nFROM : 'from' -> pushMode(SOURCE_IDENTIFIERS);\nGROK : 'grok' -> pushMode(EXPRESSION);\nINLINESTATS : 'inlinestats' -> pushMode(EXPRESSION);\nKEEP : 'keep' -> pushMode(SOURCE_IDENTIFIERS);\nLIMIT : 'limit' -> pushMode(EXPRESSION);\nMV_EXPAND : 'mv_expand' -> pushMode(SOURCE_IDENTIFIERS);\nPROJECT : 'project' -> pushMode(SOURCE_IDENTIFIERS);\nRENAME : 'rename' -> pushMode(SOURCE_IDENTIFIERS);\nROW : 'row' -> pushMode(EXPRESSION);\nSHOW : 'show' -> pushMode(EXPRESSION);\nSORT : 'sort' -> pushMode(EXPRESSION);\nSTATS : 'stats' -> pushMode(EXPRESSION);\nWHERE : 'where' -> pushMode(EXPRESSION);\nUNKNOWN_CMD : ~[ \\r\\n\\t[\\]/]+ -> pushMode(EXPRESSION);\n\nLINE_COMMENT\n    : '//' ~[\\r\\n]* '\\r'? '\\n'? -> channel(HIDDEN)\n    ;\n\nMULTILINE_COMMENT\n    : '/*' (MULTILINE_COMMENT|.)*? '*/' -> channel(HIDDEN)\n    ;\n\nWS\n    : [ \\r\\n\\t]+ -> channel(HIDDEN)\n    ;\n\n\nmode EXPLAIN_MODE;\nEXPLAIN_OPENING_BRACKET : '[' -> type(OPENING_BRACKET), pushMode(DEFAULT_MODE);\nEXPLAIN_PIPE : '|' -> type(PIPE), popMode;\nEXPLAIN_WS : WS -> channel(HIDDEN);\nEXPLAIN_LINE_COMMENT : LINE_COMMENT -> channel(HIDDEN);\nEXPLAIN_MULTILINE_COMMENT : MULTILINE_COMMENT -> channel(HIDDEN);\n\nmode EXPRESSION;\n\nPIPE : '|' -> popMode;\n\nfragment DIGIT\n    : [0-9]\n    ;\n\nfragment LETTER\n    : [A-Za-z]\n    ;\n\nfragment ESCAPE_SEQUENCE\n    : '\\\\' [tnr\"\\\\]\n    ;\n\nfragment UNESCAPED_CHARS\n    : ~[\\r\\n\"\\\\]\n    ;\n\nfragment EXPONENT\n    : [Ee] [+-]? DIGIT+\n    ;\n\nSTRING\n    : '\"' (ESCAPE_SEQUENCE | UNESCAPED_CHARS)* '\"'\n    | '\"\"\"' (~[\\r\\n])*? '\"\"\"' '\"'? 
'\"'?\n    ;\n\nINTEGER_LITERAL\n    : DIGIT+\n    ;\n\nDECIMAL_LITERAL\n    : DIGIT+ DOT DIGIT*\n    | DOT DIGIT+\n    | DIGIT+ (DOT DIGIT*)? EXPONENT\n    | DOT DIGIT+ EXPONENT\n    ;\n\nBY : 'by';\n\nAND : 'and';\nASC : 'asc';\nASSIGN : '=';\nCOMMA : ',';\nDESC : 'desc';\nDOT : '.';\nFALSE : 'false';\nFIRST : 'first';\nLAST : 'last';\nLP : '(';\nIN: 'in';\nIS: 'is';\nLIKE: 'like';\nNOT : 'not';\nNULL : 'null';\nNULLS : 'nulls';\nOR : 'or';\nPARAM: '?';\nRLIKE: 'rlike';\nRP : ')';\nTRUE : 'true';\nINFO : 'info';\nFUNCTIONS : 'functions';\n\nEQ  : '==';\nNEQ : '!=';\nLT  : '<';\nLTE : '<=';\nGT  : '>';\nGTE : '>=';\n\nPLUS : '+';\nMINUS : '-';\nASTERISK : '*';\nSLASH : '/';\nPERCENT : '%';\n\n// Brackets are funny. We can happen upon a CLOSING_BRACKET in two ways - one\n// way is to start in an explain command which then shifts us to expression\n// mode. Thus, the two popModes on CLOSING_BRACKET. The other way could as\n// the start of a multivalued field constant. To line up with the double pop\n// the explain mode needs, we double push when we see that.\nOPENING_BRACKET : '[' -> pushMode(EXPRESSION), pushMode(EXPRESSION);\nCLOSING_BRACKET : ']' -> popMode, popMode;\n\n\nUNQUOTED_IDENTIFIER\n    : LETTER (LETTER | DIGIT | '_')*\n    // only allow @ at beginning of identifier to keep the option to allow @ as infix operator in the future\n    // also, single `_` and `@` characters are not valid identifiers\n    | ('_' | '@') (LETTER | DIGIT | '_')+\n    ;\n\nQUOTED_IDENTIFIER\n    : '`' ( ~'`' | '``' )* '`'\n    ;\n\nEXPR_LINE_COMMENT\n    : LINE_COMMENT -> channel(HIDDEN)\n    ;\n\nEXPR_MULTILINE_COMMENT\n    : MULTILINE_COMMENT -> channel(HIDDEN)\n    ;\n\nEXPR_WS\n    : WS -> channel(HIDDEN)\n    ;\n\n\n\nmode SOURCE_IDENTIFIERS;\n\nSRC_PIPE : '|' -> type(PIPE), popMode;\nSRC_OPENING_BRACKET : '[' -> type(OPENING_BRACKET), pushMode(SOURCE_IDENTIFIERS), pushMode(SOURCE_IDENTIFIERS);\nSRC_CLOSING_BRACKET : ']' -> popMode, popMode, type(CLOSING_BRACKET);\nSRC_COMMA 
: ',' -> type(COMMA);\nSRC_ASSIGN : '=' -> type(ASSIGN);\nAS : 'as';\nMETADATA: 'metadata';\nON : 'on';\nWITH : 'with';\n\nSRC_UNQUOTED_IDENTIFIER\n    : SRC_UNQUOTED_IDENTIFIER_PART+\n    ;\n\nfragment SRC_UNQUOTED_IDENTIFIER_PART\n    : ~[=`|,[\\]/ \\t\\r\\n]+\n    | '/' ~[*/] // allow single / but not followed by another / or * which would start a comment\n    ;\n\nSRC_QUOTED_IDENTIFIER\n    : QUOTED_IDENTIFIER\n    ;\n\nSRC_LINE_COMMENT\n    : LINE_COMMENT -> channel(HIDDEN)\n    ;\n\nSRC_MULTILINE_COMMENT\n    : MULTILINE_COMMENT -> channel(HIDDEN)\n    ;\n\nSRC_WS\n    : WS -> channel(HIDDEN)\n    ;\n",
+    metadata: {
+      source:
+        '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/language_definition/esql_base_lexer.g4',
+    },
+  },
+  {
+    pageContent:
+      "DISSECT=1\nDROP=2\nENRICH=3\nEVAL=4\nEXPLAIN=5\nFROM=6\nGROK=7\nINLINESTATS=8\nKEEP=9\nLIMIT=10\nMV_EXPAND=11\nPROJECT=12\nRENAME=13\nROW=14\nSHOW=15\nSORT=16\nSTATS=17\nWHERE=18\nUNKNOWN_CMD=19\nLINE_COMMENT=20\nMULTILINE_COMMENT=21\nWS=22\nEXPLAIN_WS=23\nEXPLAIN_LINE_COMMENT=24\nEXPLAIN_MULTILINE_COMMENT=25\nPIPE=26\nSTRING=27\nINTEGER_LITERAL=28\nDECIMAL_LITERAL=29\nBY=30\nAND=31\nASC=32\nASSIGN=33\nCOMMA=34\nDESC=35\nDOT=36\nFALSE=37\nFIRST=38\nLAST=39\nLP=40\nIN=41\nIS=42\nLIKE=43\nNOT=44\nNULL=45\nNULLS=46\nOR=47\nPARAM=48\nRLIKE=49\nRP=50\nTRUE=51\nINFO=52\nFUNCTIONS=53\nEQ=54\nNEQ=55\nLT=56\nLTE=57\nGT=58\nGTE=59\nPLUS=60\nMINUS=61\nASTERISK=62\nSLASH=63\nPERCENT=64\nOPENING_BRACKET=65\nCLOSING_BRACKET=66\nUNQUOTED_IDENTIFIER=67\nQUOTED_IDENTIFIER=68\nEXPR_LINE_COMMENT=69\nEXPR_MULTILINE_COMMENT=70\nEXPR_WS=71\nAS=72\nMETADATA=73\nON=74\nWITH=75\nSRC_UNQUOTED_IDENTIFIER=76\nSRC_QUOTED_IDENTIFIER=77\nSRC_LINE_COMMENT=78\nSRC_MULTILINE_COMMENT=79\nSRC_WS=80\nEXPLAIN_PIPE=81\n'dissect'=1\n'drop'=2\n'enrich'=3\n'eval'=4\n'explain'=5\n'from'=6\n'grok'=7\n'inlinestats'=8\n'keep'=9\n'limit'=10\n'mv_expand'=11\n'project'=12\n'rename'=13\n'row'=14\n'show'=15\n'sort'=16\n'stats'=17\n'where'=18\n'by'=30\n'and'=31\n'asc'=32\n'desc'=35\n'.'=36\n'false'=37\n'first'=38\n'last'=39\n'('=40\n'in'=41\n'is'=42\n'like'=43\n'not'=44\n'null'=45\n'nulls'=46\n'or'=47\n'?'=48\n'rlike'=49\n')'=50\n'true'=51\n'info'=52\n'functions'=53\n'=='=54\n'!='=55\n'<'=56\n'<='=57\n'>'=58\n'>='=59\n'+'=60\n'-'=61\n'*'=62\n'/'=63\n'%'=64\n']'=66\n'as'=72\n'metadata'=73\n'on'=74\n'with'=75\n",
+    metadata: {
+      source:
+        '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/language_definition/esql_base_lexer.tokens',
+    },
+  },
+];
+
+/**
+ * Mock LangChain `Document`s from `knowledge_base/esql/example_queries`, loaded from a LangChain `DirectoryLoader`
+ */
+export const mockExampleQueryDocsFromDirectoryLoader: Document[] = [
+  {
+    pageContent:
+      '[[esql-example-queries]]\n\nThe following is an example an ES|QL query:\n\n```\nFROM logs-*\n| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")\n| STATS destcount = COUNT(destination.ip) by user.name, host.name\n| ENRICH ldap_lookup_new ON user.name\n| WHERE group.name IS NOT NULL\n| EVAL follow_up = CASE(\n    destcount >= 100, "true",\n     "false")\n| SORT destcount desc\n| KEEP destcount, host.name, user.name, group.name, follow_up\n```\n',
+    metadata: {
+      source:
+        '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0001.asciidoc',
+    },
+  },
+  {
+    pageContent:
+      '[[esql-example-queries]]\n\nThe following is an example an ES|QL query:\n\n```\nfrom logs-*\n| grok dns.question.name "%{DATA}\\\\.%{GREEDYDATA:dns.question.registered_domain:string}"\n| stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name\n| where unique_queries > 5\n| sort unique_queries desc\n```\n',
+    metadata: {
+      source:
+        '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0002.asciidoc',
+    },
+  },
+  {
+    pageContent:
+      '[[esql-example-queries]]\n\nThe following is an example an ES|QL query:\n\n```\nfrom logs-*\n| where event.code is not null\n| stats event_code_count = count(event.code) by event.code,host.name\n| enrich win_events on event.code with EVENT_DESCRIPTION\n| where EVENT_DESCRIPTION is not null and host.name is not null\n| rename EVENT_DESCRIPTION as event.description\n| sort event_code_count desc\n| keep event_code_count,event.code,host.name,event.description\n```\n',
+    metadata: {
+      source:
+        '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0003.asciidoc',
+    },
+  },
+];
diff --git a/x-pack/plugins/elastic_assistant/server/__mocks__/msearch_query.ts b/x-pack/plugins/elastic_assistant/server/__mocks__/msearch_query.ts
@@ -0,0 +1,75 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+import type { QueryDslTextExpansionQuery } from '@elastic/elasticsearch/lib/api/types';
+
+import type { MsearchQueryBody } from '../lib/langchain/elasticsearch_store/helpers/get_msearch_query_body';
+
+/**
+ * This mock Elasticsearch msearch request body contains two queries:
+ * - The first query is a similarity (vector) search
+ * - The second query is a required KB document (terms) search
+ */
+export const mSearchQueryBody: MsearchQueryBody = {
+  body: [
+    {
+      index: '.kibana-elastic-ai-assistant-kb',
+    },
+    {
+      query: {
+        bool: {
+          must_not: [
+            {
+              term: {
+                'metadata.kbResource': 'esql',
+              },
+            },
+            {
+              term: {
+                'metadata.required': true,
+              },
+            },
+          ],
+          must: [
+            {
+              text_expansion: {
+                'vector.tokens': {
+                  model_id: '.elser_model_2',
+                  model_text:
+                    'Generate an ESQL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called "follow_up" that contains a value of "true", otherwise, it should contain "false". The user names should also be enriched with their respective group names.',
+                },
+              } as unknown as QueryDslTextExpansionQuery,
+            },
+          ],
+        },
+      },
+      size: 1,
+    },
+    {
+      index: '.kibana-elastic-ai-assistant-kb',
+    },
+    {
+      query: {
+        bool: {
+          must: [
+            {
+              term: {
+                'metadata.kbResource': 'esql',
+              },
+            },
+            {
+              term: {
+                'metadata.required': true,
+              },
+            },
+          ],
+        },
+      },
+      size: 1,
+    },
+  ],
+};
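
The mock request body above alternates a header line (`index`) with a search body, once for the vector search and once for the terms search. A minimal sketch of how such a body could be assembled follows. `buildHybridMsearchBody`, `TermFilter`, and the parameter names are hypothetical and are not the helpers added by this commit:

```typescript
// Hypothetical types for illustration; the commit's own types live in its helpers.
type TermFilter = { term: Record<string, string | boolean> };
type MsearchItem = { index: string } | { query: object; size: number };

function buildHybridMsearchBody(params: {
  index: string;
  modelId: string;
  queryText: string;
  requiredDocs: TermFilter[];
  vectorSize: number;
  termsSize: number;
}): { body: MsearchItem[] } {
  const { index, modelId, queryText, requiredDocs, vectorSize, termsSize } = params;

  // Vector (ELSER text_expansion) search that excludes the required KB docs.
  const vectorQuery = {
    bool: {
      must_not: requiredDocs,
      must: [
        { text_expansion: { 'vector.tokens': { model_id: modelId, model_text: queryText } } },
      ],
    },
  };

  // Terms search that returns only the required KB docs.
  const termsQuery = { bool: { must: requiredDocs } };

  // msearch bodies alternate header and search body, one pair per query.
  return {
    body: [
      { index },
      { query: vectorQuery, size: vectorSize },
      { index },
      { query: termsQuery, size: termsSize },
    ],
  };
}

const hybrid = buildHybridMsearchBody({
  index: '.kibana-elastic-ai-assistant-kb',
  modelId: '.elser_model_2',
  queryText: 'Generate an ES|QL query',
  requiredDocs: [
    { term: { 'metadata.kbResource': 'esql' } },
    { term: { 'metadata.required': true } },
  ],
  vectorSize: 1,
  termsSize: 1,
});
```

A `bool` query's `must_not` excludes a document when any one of its clauses matches, so this shape relies on non-required documents not carrying `metadata.kbResource`. That appears consistent with the mocks in this commit, where only required documents include these fields.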
diff --git a/x-pack/plugins/elastic_assistant/server/__mocks__/msearch_response.ts b/x-pack/plugins/elastic_assistant/server/__mocks__/msearch_response.ts
@@ -0,0 +1,101 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+import type { MsearchResponse } from '@elastic/elasticsearch/lib/api/types';
+
+/**
+ * This mock response from an Elasticsearch msearch contains two hits, where
+ * the first hit is from a similarity (vector) search, and the second hit is
+ * from a required KB document (terms) search.
+ */
+export const mockMsearchResponse: MsearchResponse = {
+  took: 142,
+  responses: [
+    {
+      took: 142,
+      timed_out: false,
+      _shards: {
+        total: 1,
+        successful: 1,
+        skipped: 0,
+        failed: 0,
+      },
+      hits: {
+        total: {
+          value: 129,
+          relation: 'eq',
+        },
+        max_score: 21.658352,
+        hits: [
+          {
+            _index: '.kibana-elastic-ai-assistant-kb',
+            _id: 'fa1c8ba1-25c9-4404-9736-09b7eb7124f8',
+            _score: 21.658352,
+            _ignored: ['text.keyword'],
+            _source: {
+              metadata: {
+                source:
+                  '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc',
+              },
+              vector: {
+                tokens: {
+                  wild: 1.2001507,
+                  // truncated for mock
+                },
+                model_id: '.elser_model_2',
+              },
+              text: "[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n",
+            },
+          },
+        ],
+      },
+      status: 200,
+    },
+    {
+      took: 3,
+      timed_out: false,
+      _shards: {
+        total: 1,
+        successful: 1,
+        skipped: 0,
+        failed: 0,
+      },
+      hits: {
+        total: {
+          value: 14,
+          relation: 'eq',
+        },
+        max_score: 0.034783483,
+        hits: [
+          {
+            _index: '.kibana-elastic-ai-assistant-kb',
+            _id: '280d4882-0f64-4471-a268-669a3f8c958f',
+            _score: 0.034783483,
+            _ignored: ['text.keyword'],
+            _source: {
+              metadata: {
+                source:
+                  '/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0001.asciidoc',
+                required: true,
+                kbResource: 'esql',
+              },
+              vector: {
+                tokens: {
+                  user: 1.1084619,
+                  // truncated for mock
+                },
+                model_id: '.elser_model_2',
+              },
+              text: '[[esql-example-queries]]\n\nThe following is an example an ES|QL query:\n\n```\nFROM logs-*\n| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")\n| STATS destcount = COUNT(destination.ip) by user.name, host.name\n| ENRICH ldap_lookup_new ON user.name\n| WHERE group.name IS NOT NULL\n| EVAL follow_up = CASE(\n    destcount >= 100, "true",\n     "false")\n| SORT destcount desc\n| KEEP destcount, host.name, user.name, group.name, follow_up\n```\n',
+            },
+          },
+        ],
+      },
+      status: 200,
+    },
+  ],
+};
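
The mock response above carries one entry in `responses` per query. A minimal sketch of flattening those entries into a single list follows. It assumes simplified hit types and a hypothetical `FlattenedHit` shape (`pageContent` plus `metadata`); the shape used by the commit is not shown in this excerpt:

```typescript
// Simplified, hypothetical types; the real ones come from '@elastic/elasticsearch'.
interface Hit {
  _id: string;
  _source?: { text?: string; metadata?: Record<string, unknown> };
}
type MsearchResponseItem = { hits: { hits: Hit[] } } | { error: unknown; status?: number };

// Assumed output shape, chosen to resemble a LangChain-style document.
interface FlattenedHit {
  pageContent: string;
  metadata: Record<string, unknown>;
}

function flattenMsearchHits(responses: MsearchResponseItem[]): FlattenedHit[] {
  const flattened: FlattenedHit[] = [];
  for (const response of responses) {
    // An msearch item is either a search result or a per-search error; skip errors.
    if (!('hits' in response)) continue;
    for (const hit of response.hits.hits) {
      flattened.push({
        pageContent: hit._source?.text ?? '',
        metadata: hit._source?.metadata ?? {},
      });
    }
  }
  return flattened;
}

const flattened = flattenMsearchHits([
  { hits: { hits: [{ _id: 'a', _source: { text: 'FROM docs', metadata: { source: 'from.asciidoc' } } }] } },
  { error: { type: 'index_not_found_exception' }, status: 404 },
  { hits: { hits: [{ _id: 'b', _source: { text: 'example query', metadata: { required: true, kbResource: 'esql' } } }] } },
]);
```

Results keep the order of the `responses` array, so with the request ordering used in this commit the vector hits come first and the required documents follow.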
diff --git a/x-pack/plugins/elastic_assistant/server/__mocks__/query_text.ts b/x-pack/plugins/elastic_assistant/server/__mocks__/query_text.ts
@@ -0,0 +1,28 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+/**
+ * This mock query text is an example of a prompt that might be passed to
+ * the `ElasticsearchStore`'s `similaritySearch` function, as the `query`
+ * parameter.
+ *
+ * In the real world, an LLM extracted the `mockQueryText` from the
+ * following prompt, which includes a system prompt:
+ *
+ * ```
+ * You are a helpful, expert assistant who answers questions about Elastic Security. Do not answer questions unrelated to Elastic Security.
+ * If you answer a question related to KQL, EQL, or ES|QL, it should be immediately usable within an Elastic Security timeline; please always format the output correctly with back ticks. Any answer provided for Query DSL should also be usable in a security timeline. This means you should only ever include the "filter" portion of the query.
+ *
+ * Use the following context to answer questions:
+ *
+ * Generate an ES|QL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called "follow_up" that contains a value of "true", otherwise, it should contain "false". The user names should also be enriched with their respective group names.
+ * ```
+ *
+ * In the example above, the LLM omitted the system prompt, such that only `mockQueryText` is passed to the `similaritySearch` function.
+ */
+export const mockQueryText =
+  'Generate an ES|QL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called follow_up that contains a value of true, otherwise, it should contain false. The user names should also be enriched with their respective group names.';
diff --git a/x-pack/plugins/elastic_assistant/server/__mocks__/terms.ts b/x-pack/plugins/elastic_assistant/server/__mocks__/terms.ts
@@ -0,0 +1,28 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+import type { Field, FieldValue, QueryDslTermQuery } from '@elastic/elasticsearch/lib/api/types';
+
+/**
+ * These (mock) terms may be used in multiple queries.
+ *
+ * For example, they may be used in a vector search to exclude the required `esql` KB docs.
+ *
+ * They may also be used in a terms search to find all of the required `esql` KB docs.
+ */
+export const mockTerms: Array<Partial<Record<Field, QueryDslTermQuery | FieldValue>>> = [
+  {
+    term: {
+      'metadata.kbResource': 'esql',
+    },
+  },
+  {
+    term: {
+      'metadata.required': true,
+    },
+  },
+];
diff --git a/x-pack/plugins/elastic_assistant/server/__mocks__/terms_search_query.ts b/x-pack/plugins/elastic_assistant/server/__mocks__/terms_search_query.ts
@@ -0,0 +1,28 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+
+import type { QueryDslQueryContainer } from '@elastic/elasticsearch/lib/api/types';
+
+/**
+ * This Elasticsearch query DSL is a terms search for required `esql` KB docs
+ */
+export const mockTermsSearchQuery: QueryDslQueryContainer = {
+  bool: {
+    must: [
+      {
+        term: {
+          'metadata.kbResource': 'esql',
+        },
+      },
+      {
+        term: {
+          'metadata.required': true,
+        },
+      },
+    ],
+  },
+};