Skip to content

Commit

Permalink
[Security Solution] [Elastic AI Assistant] Hybrid (vector + terms) se…
Browse files Browse the repository at this point in the history
…arch for improved ES|QL query generation (#168995)

## [Security Solution] [Elastic AI Assistant] Hybrid (vector + terms) search for improved ES|QL query generation

This PR implements hybrid (vector + terms) search to improve the quality of `ES|QL` queries generated by the Elastic AI Assistant.

The hybrid search combines (from a single request to Elasticsearch):

- Vector search results from ELSER that vary depending on the query specified by the user
- Terms search results that return a set of Knowledge Base (KB) documents marked as "required" for a topic

The hybrid search results, when provided as context to an LLM, improve the quality of generated `ES|QL` queries by combining `ES|QL` parser grammar and documentation specific to the question asked by a user with additional examples of valid `ES|QL` queries that aren't specific to the query.

## Details

### Indexing additional `metadata`

The `loadESQL` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/content_loaders/esql_loader.ts` loads a directory containing 13 valid, and one invalid example of `ES|QL` queries:

```typescript
    const rawExampleQueries = await exampleQueriesLoader.load();

    // Add additional metadata to the example queries that indicates they are required KB documents:
    const requiredExampleQueries = addRequiredKbResourceMetadata({
      docs: rawExampleQueries,
      kbResource: ESQL_RESOURCE,
    });
```

The `addRequiredKbResourceMetadata` function adds two additional fields to the `metadata` property of the document:

- `kbResource` - a `keyword` field that specifies the category of knowledge, e.g. `esql`
- `required` - a `boolean` field that when `true`, indicates the document should be returned in all searches for the `kbResource`

The additional metadata fields are shown below in the following abridged sample document:

```
{
  "_index": ".kibana-elastic-ai-assistant-kb",
  "_id": "e297e2d9-fb0e-4638-b4be-af31d1b31b9f",
  "_version": 1,
  "_seq_no": 129,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "metadata": {
      "source": "/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0001.asciidoc",
      "required": true,
      "kbResource": "esql"
    },
    "vector": {
      "tokens": {
        "serial": 0.5612584,
        "syntax": 0.006727545,
        "user": 1.1184403,
        // ...additional tokens
      },
      "model_id": ".elser_model_2"
    },
    "text": """[[esql-example-queries]]

The following is an example ES|QL query:

\`\`\`
FROM logs-*
| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
| STATS destcount = COUNT(destination.ip) by user.name, host.name
| ENRICH ldap_lookup_new ON user.name
| WHERE group.name IS NOT NULL
| EVAL follow_up = CASE(
    destcount >= 100, "true",
     "false")
| SORT destcount desc
| KEEP destcount, host.name, user.name, group.name, follow_up
\`\`\`
"""
  }
}
```

### Hybrid search

The `ElasticsearchStore.similaritySearch` function is invoked by LangChain's `VectorStoreRetriever.getRelevantDocuments` function when the `RetrievalQAChain` searches for documents.

A single request to Elasticsearch performs a hybrid search that combines the vector and terms searches into a single request with an [msearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html):

```typescript
    // requiredDocs is an array of filters that can be used in a `bool` Elasticsearch DSL query to filter in/out required KB documents:
    const requiredDocs = getRequiredKbDocsTermsQueryDsl(this.kbResource);

    // The `k` parameter is typically provided by LangChain's `VectorStoreRetriever._getRelevantDocuments`, which calls this function:
    const vectorSearchQuerySize = k ?? FALLBACK_SIMILARITY_SEARCH_SIZE;

    // build a vector search query:
    const vectorSearchQuery = getVectorSearchQuery({
      filter,
      modelId: this.model,
      mustNotTerms: requiredDocs,
      query,
    });

    // build a (separate) terms search query:
    const termsSearchQuery = getTermsSearchQuery(requiredDocs);

    // combine the vector search query and the terms search queries into a single multi-search query:
    const mSearchQueryBody = getMsearchQueryBody({
      index: this.index,
      termsSearchQuery,
      termsSearchQuerySize: TERMS_QUERY_SIZE,
      vectorSearchQuery,
      vectorSearchQuerySize,
    });

    try {
      // execute both queries via a single multi-search request:
      const result = await this.esClient.msearch<MsearchResponse>(mSearchQueryBody);

      // flatten the results of the combined queries into a single array of hits:
      const results: FlattenedHit[] = result.responses.flatMap((response) =>
      // ...
```

## Desk testing

1. Delete any previous instances of the Knowledge Base by executing the following query in Kibana's `Dev Tools`:

```

DELETE .kibana-elastic-ai-assistant-kb

```

2. In the Security Solution, open the Elastic AI Assistant

3. In the assistant, click the `Settings` gear

4. Click the `Knowledge Base` icon to view the KB settings

5. Toggle the `Knowledge Base` setting `off` if it's already on

6. Toggle the `Knowledge Base` setting `on` to load the KB documents

7. Click the `Save` button to close settings

8. Enter the following prompt, then press Enter:

```
Generate an ES|QL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called "follow_up" that contains a value of "true", otherwise, it should contain "false". The user names should also be enriched with their respective group names.
```

**Expected result**

A response similar to the following is returned:

```
FROM logs-*
| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
| STATS destcount = COUNT(destination.ip) BY user.name
| ENRICH ldap_lookup ON user.name
| EVAL follow_up = CASE(
    destcount >= 100, "true",
    "false")
| SORT destcount DESC
| KEEP destcount, user.name, group.name, follow_up
```

(cherry picked from commit d0e9925)
  • Loading branch information
andrew-goldstein committed Oct 17, 2023
1 parent 5047d71 commit 3cde407
Show file tree
Hide file tree
Showing 49 changed files with 1,673 additions and 114 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import { Document } from 'langchain/document';

/**
* Mock LangChain `Document`s from `knowledge_base/esql/docs`, loaded from a LangChain `DirectoryLoader`
*/
export const mockEsqlDocsFromDirectoryLoader: Document[] = [
{
pageContent:
'[[esql-agg-avg]]\n=== `AVG`\nThe average of a numeric field.\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/stats.csv-spec[tag=avg]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/stats.csv-spec[tag=avg-result]\n|===\n\nThe result is always a `double` not matter the input type.\n',
metadata: {
source:
'/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/aggregation_functions/avg.asciidoc',
},
},
];

/**
* Mock LangChain `Document`s from `knowledge_base/esql/language_definition`, loaded from a LangChain `DirectoryLoader`
*/
export const mockEsqlLanguageDocsFromDirectoryLoader: Document[] = [
{
pageContent:
"lexer grammar EsqlBaseLexer;\n\nDISSECT : 'dissect' -> pushMode(EXPRESSION);\nDROP : 'drop' -> pushMode(SOURCE_IDENTIFIERS);\nENRICH : 'enrich' -> pushMode(SOURCE_IDENTIFIERS);\nEVAL : 'eval' -> pushMode(EXPRESSION);\nEXPLAIN : 'explain' -> pushMode(EXPLAIN_MODE);\nFROM : 'from' -> pushMode(SOURCE_IDENTIFIERS);\nGROK : 'grok' -> pushMode(EXPRESSION);\nINLINESTATS : 'inlinestats' -> pushMode(EXPRESSION);\nKEEP : 'keep' -> pushMode(SOURCE_IDENTIFIERS);\nLIMIT : 'limit' -> pushMode(EXPRESSION);\nMV_EXPAND : 'mv_expand' -> pushMode(SOURCE_IDENTIFIERS);\nPROJECT : 'project' -> pushMode(SOURCE_IDENTIFIERS);\nRENAME : 'rename' -> pushMode(SOURCE_IDENTIFIERS);\nROW : 'row' -> pushMode(EXPRESSION);\nSHOW : 'show' -> pushMode(EXPRESSION);\nSORT : 'sort' -> pushMode(EXPRESSION);\nSTATS : 'stats' -> pushMode(EXPRESSION);\nWHERE : 'where' -> pushMode(EXPRESSION);\nUNKNOWN_CMD : ~[ \\r\\n\\t[\\]/]+ -> pushMode(EXPRESSION);\n\nLINE_COMMENT\n : '//' ~[\\r\\n]* '\\r'? '\\n'? -> channel(HIDDEN)\n ;\n\nMULTILINE_COMMENT\n : '/*' (MULTILINE_COMMENT|.)*? '*/' -> channel(HIDDEN)\n ;\n\nWS\n : [ \\r\\n\\t]+ -> channel(HIDDEN)\n ;\n\n\nmode EXPLAIN_MODE;\nEXPLAIN_OPENING_BRACKET : '[' -> type(OPENING_BRACKET), pushMode(DEFAULT_MODE);\nEXPLAIN_PIPE : '|' -> type(PIPE), popMode;\nEXPLAIN_WS : WS -> channel(HIDDEN);\nEXPLAIN_LINE_COMMENT : LINE_COMMENT -> channel(HIDDEN);\nEXPLAIN_MULTILINE_COMMENT : MULTILINE_COMMENT -> channel(HIDDEN);\n\nmode EXPRESSION;\n\nPIPE : '|' -> popMode;\n\nfragment DIGIT\n : [0-9]\n ;\n\nfragment LETTER\n : [A-Za-z]\n ;\n\nfragment ESCAPE_SEQUENCE\n : '\\\\' [tnr\"\\\\]\n ;\n\nfragment UNESCAPED_CHARS\n : ~[\\r\\n\"\\\\]\n ;\n\nfragment EXPONENT\n : [Ee] [+-]? DIGIT+\n ;\n\nSTRING\n : '\"' (ESCAPE_SEQUENCE | UNESCAPED_CHARS)* '\"'\n | '\"\"\"' (~[\\r\\n])*? '\"\"\"' '\"'? '\"'?\n ;\n\nINTEGER_LITERAL\n : DIGIT+\n ;\n\nDECIMAL_LITERAL\n : DIGIT+ DOT DIGIT*\n | DOT DIGIT+\n | DIGIT+ (DOT DIGIT*)? EXPONENT\n | DOT DIGIT+ EXPONENT\n ;\n\nBY : 'by';\n\nAND : 'and';\nASC : 'asc';\nASSIGN : '=';\nCOMMA : ',';\nDESC : 'desc';\nDOT : '.';\nFALSE : 'false';\nFIRST : 'first';\nLAST : 'last';\nLP : '(';\nIN: 'in';\nIS: 'is';\nLIKE: 'like';\nNOT : 'not';\nNULL : 'null';\nNULLS : 'nulls';\nOR : 'or';\nPARAM: '?';\nRLIKE: 'rlike';\nRP : ')';\nTRUE : 'true';\nINFO : 'info';\nFUNCTIONS : 'functions';\n\nEQ : '==';\nNEQ : '!=';\nLT : '<';\nLTE : '<=';\nGT : '>';\nGTE : '>=';\n\nPLUS : '+';\nMINUS : '-';\nASTERISK : '*';\nSLASH : '/';\nPERCENT : '%';\n\n// Brackets are funny. We can happen upon a CLOSING_BRACKET in two ways - one\n// way is to start in an explain command which then shifts us to expression\n// mode. Thus, the two popModes on CLOSING_BRACKET. The other way could as\n// the start of a multivalued field constant. To line up with the double pop\n// the explain mode needs, we double push when we see that.\nOPENING_BRACKET : '[' -> pushMode(EXPRESSION), pushMode(EXPRESSION);\nCLOSING_BRACKET : ']' -> popMode, popMode;\n\n\nUNQUOTED_IDENTIFIER\n : LETTER (LETTER | DIGIT | '_')*\n // only allow @ at beginning of identifier to keep the option to allow @ as infix operator in the future\n // also, single `_` and `@` characters are not valid identifiers\n | ('_' | '@') (LETTER | DIGIT | '_')+\n ;\n\nQUOTED_IDENTIFIER\n : '`' ( ~'`' | '``' )* '`'\n ;\n\nEXPR_LINE_COMMENT\n : LINE_COMMENT -> channel(HIDDEN)\n ;\n\nEXPR_MULTILINE_COMMENT\n : MULTILINE_COMMENT -> channel(HIDDEN)\n ;\n\nEXPR_WS\n : WS -> channel(HIDDEN)\n ;\n\n\n\nmode SOURCE_IDENTIFIERS;\n\nSRC_PIPE : '|' -> type(PIPE), popMode;\nSRC_OPENING_BRACKET : '[' -> type(OPENING_BRACKET), pushMode(SOURCE_IDENTIFIERS), pushMode(SOURCE_IDENTIFIERS);\nSRC_CLOSING_BRACKET : ']' -> popMode, popMode, type(CLOSING_BRACKET);\nSRC_COMMA : ',' -> type(COMMA);\nSRC_ASSIGN : '=' -> type(ASSIGN);\nAS : 'as';\nMETADATA: 'metadata';\nON : 'on';\nWITH : 'with';\n\nSRC_UNQUOTED_IDENTIFIER\n : SRC_UNQUOTED_IDENTIFIER_PART+\n ;\n\nfragment SRC_UNQUOTED_IDENTIFIER_PART\n : ~[=`|,[\\]/ \\t\\r\\n]+\n | '/' ~[*/] // allow single / but not followed by another / or * which would start a comment\n ;\n\nSRC_QUOTED_IDENTIFIER\n : QUOTED_IDENTIFIER\n ;\n\nSRC_LINE_COMMENT\n : LINE_COMMENT -> channel(HIDDEN)\n ;\n\nSRC_MULTILINE_COMMENT\n : MULTILINE_COMMENT -> channel(HIDDEN)\n ;\n\nSRC_WS\n : WS -> channel(HIDDEN)\n ;\n",
metadata: {
source:
'/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/language_definition/esql_base_lexer.g4',
},
},
{
pageContent:
"DISSECT=1\nDROP=2\nENRICH=3\nEVAL=4\nEXPLAIN=5\nFROM=6\nGROK=7\nINLINESTATS=8\nKEEP=9\nLIMIT=10\nMV_EXPAND=11\nPROJECT=12\nRENAME=13\nROW=14\nSHOW=15\nSORT=16\nSTATS=17\nWHERE=18\nUNKNOWN_CMD=19\nLINE_COMMENT=20\nMULTILINE_COMMENT=21\nWS=22\nEXPLAIN_WS=23\nEXPLAIN_LINE_COMMENT=24\nEXPLAIN_MULTILINE_COMMENT=25\nPIPE=26\nSTRING=27\nINTEGER_LITERAL=28\nDECIMAL_LITERAL=29\nBY=30\nAND=31\nASC=32\nASSIGN=33\nCOMMA=34\nDESC=35\nDOT=36\nFALSE=37\nFIRST=38\nLAST=39\nLP=40\nIN=41\nIS=42\nLIKE=43\nNOT=44\nNULL=45\nNULLS=46\nOR=47\nPARAM=48\nRLIKE=49\nRP=50\nTRUE=51\nINFO=52\nFUNCTIONS=53\nEQ=54\nNEQ=55\nLT=56\nLTE=57\nGT=58\nGTE=59\nPLUS=60\nMINUS=61\nASTERISK=62\nSLASH=63\nPERCENT=64\nOPENING_BRACKET=65\nCLOSING_BRACKET=66\nUNQUOTED_IDENTIFIER=67\nQUOTED_IDENTIFIER=68\nEXPR_LINE_COMMENT=69\nEXPR_MULTILINE_COMMENT=70\nEXPR_WS=71\nAS=72\nMETADATA=73\nON=74\nWITH=75\nSRC_UNQUOTED_IDENTIFIER=76\nSRC_QUOTED_IDENTIFIER=77\nSRC_LINE_COMMENT=78\nSRC_MULTILINE_COMMENT=79\nSRC_WS=80\nEXPLAIN_PIPE=81\n'dissect'=1\n'drop'=2\n'enrich'=3\n'eval'=4\n'explain'=5\n'from'=6\n'grok'=7\n'inlinestats'=8\n'keep'=9\n'limit'=10\n'mv_expand'=11\n'project'=12\n'rename'=13\n'row'=14\n'show'=15\n'sort'=16\n'stats'=17\n'where'=18\n'by'=30\n'and'=31\n'asc'=32\n'desc'=35\n'.'=36\n'false'=37\n'first'=38\n'last'=39\n'('=40\n'in'=41\n'is'=42\n'like'=43\n'not'=44\n'null'=45\n'nulls'=46\n'or'=47\n'?'=48\n'rlike'=49\n')'=50\n'true'=51\n'info'=52\n'functions'=53\n'=='=54\n'!='=55\n'<'=56\n'<='=57\n'>'=58\n'>='=59\n'+'=60\n'-'=61\n'*'=62\n'/'=63\n'%'=64\n']'=66\n'as'=72\n'metadata'=73\n'on'=74\n'with'=75\n",
metadata: {
source:
'/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/language_definition/esql_base_lexer.tokens',
},
},
];

/**
* Mock LangChain `Document`s from `knowledge_base/esql/example_queries`, loaded from a LangChain `DirectoryLoader`
*/
export const mockExampleQueryDocsFromDirectoryLoader: Document[] = [
{
pageContent:
'[[esql-example-queries]]\n\nThe following is an example an ES|QL query:\n\n```\nFROM logs-*\n| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")\n| STATS destcount = COUNT(destination.ip) by user.name, host.name\n| ENRICH ldap_lookup_new ON user.name\n| WHERE group.name IS NOT NULL\n| EVAL follow_up = CASE(\n destcount >= 100, "true",\n "false")\n| SORT destcount desc\n| KEEP destcount, host.name, user.name, group.name, follow_up\n```\n',
metadata: {
source:
'/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0001.asciidoc',
},
},
{
pageContent:
'[[esql-example-queries]]\n\nThe following is an example an ES|QL query:\n\n```\nfrom logs-*\n| grok dns.question.name "%{DATA}\\\\.%{GREEDYDATA:dns.question.registered_domain:string}"\n| stats unique_queries = count_distinct(dns.question.name) by dns.question.registered_domain, process.name\n| where unique_queries > 5\n| sort unique_queries desc\n```\n',
metadata: {
source:
'/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0002.asciidoc',
},
},
{
pageContent:
'[[esql-example-queries]]\n\nThe following is an example an ES|QL query:\n\n```\nfrom logs-*\n| where event.code is not null\n| stats event_code_count = count(event.code) by event.code,host.name\n| enrich win_events on event.code with EVENT_DESCRIPTION\n| where EVENT_DESCRIPTION is not null and host.name is not null\n| rename EVENT_DESCRIPTION as event.description\n| sort event_code_count desc\n| keep event_code_count,event.code,host.name,event.description\n```\n',
metadata: {
source:
'/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0003.asciidoc',
},
},
];
75 changes: 75 additions & 0 deletions x-pack/plugins/elastic_assistant/server/__mocks__/msearch_query.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import type { QueryDslTextExpansionQuery } from '@elastic/elasticsearch/lib/api/types';

import type { MsearchQueryBody } from '../lib/langchain/elasticsearch_store/helpers/get_msearch_query_body';

/**
* This mock Elasticsearch msearch request body contains two queries:
* - The first query is a similarity (vector) search
* - The second query is a required KB document (terms) search
*/
export const mSearchQueryBody: MsearchQueryBody = {
body: [
{
index: '.kibana-elastic-ai-assistant-kb',
},
{
query: {
bool: {
must_not: [
{
term: {
'metadata.kbResource': 'esql',
},
},
{
term: {
'metadata.required': true,
},
},
],
must: [
{
text_expansion: {
'vector.tokens': {
model_id: '.elser_model_2',
model_text:
'Generate an ESQL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called "follow_up" that contains a value of "true", otherwise, it should contain "false". The user names should also be enriched with their respective group names.',
},
} as unknown as QueryDslTextExpansionQuery,
},
],
},
},
size: 1,
},
{
index: '.kibana-elastic-ai-assistant-kb',
},
{
query: {
bool: {
must: [
{
term: {
'metadata.kbResource': 'esql',
},
},
{
term: {
'metadata.required': true,
},
},
],
},
},
size: 1,
},
],
};
101 changes: 101 additions & 0 deletions x-pack/plugins/elastic_assistant/server/__mocks__/msearch_response.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import type { MsearchResponse } from '@elastic/elasticsearch/lib/api/types';

/**
* This mock response from an Elasticsearch msearch contains two hits, where
* the first hit is from a similarity (vector) search, and the second hit is a
* required KB document (terms) search.
*/
export const mockMsearchResponse: MsearchResponse = {
took: 142,
responses: [
{
took: 142,
timed_out: false,
_shards: {
total: 1,
successful: 1,
skipped: 0,
failed: 0,
},
hits: {
total: {
value: 129,
relation: 'eq',
},
max_score: 21.658352,
hits: [
{
_index: '.kibana-elastic-ai-assistant-kb',
_id: 'fa1c8ba1-25c9-4404-9736-09b7eb7124f8',
_score: 21.658352,
_ignored: ['text.keyword'],
_source: {
metadata: {
source:
'/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc',
},
vector: {
tokens: {
wild: 1.2001507,
// truncated for mock
},
model_id: '.elser_model_2',
},
text: "[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n",
},
},
],
},
status: 200,
},
{
took: 3,
timed_out: false,
_shards: {
total: 1,
successful: 1,
skipped: 0,
failed: 0,
},
hits: {
total: {
value: 14,
relation: 'eq',
},
max_score: 0.034783483,
hits: [
{
_index: '.kibana-elastic-ai-assistant-kb',
_id: '280d4882-0f64-4471-a268-669a3f8c958f',
_score: 0.034783483,
_ignored: ['text.keyword'],
_source: {
metadata: {
source:
'/Users/andrew.goldstein/Projects/forks/andrew-goldstein/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/example_queries/esql_example_query_0001.asciidoc',
required: true,
kbResource: 'esql',
},
vector: {
tokens: {
user: 1.1084619,
// truncated for mock
},
model_id: '.elser_model_2',
},
text: '[[esql-example-queries]]\n\nThe following is an example an ES|QL query:\n\n```\nFROM logs-*\n| WHERE NOT CIDR_MATCH(destination.ip, "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")\n| STATS destcount = COUNT(destination.ip) by user.name, host.name\n| ENRICH ldap_lookup_new ON user.name\n| WHERE group.name IS NOT NULL\n| EVAL follow_up = CASE(\n destcount >= 100, "true",\n "false")\n| SORT destcount desc\n| KEEP destcount, host.name, user.name, group.name, follow_up\n```\n',
},
},
],
},
status: 200,
},
],
};
28 changes: 28 additions & 0 deletions x-pack/plugins/elastic_assistant/server/__mocks__/query_text.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

/**
* This mock query text is an example of a prompt that might be passed to
* the `ElasticSearchStore`'s `similaritySearch` function, as the `query`
* parameter.
*
* In the real world, an LLM extracted the `mockQueryText` from the
* following prompt, which includes a system prompt:
*
* ```
* You are a helpful, expert assistant who answers questions about Elastic Security. Do not answer questions unrelated to Elastic Security.
* If you answer a question related to KQL, EQL, or ES|QL, it should be immediately usable within an Elastic Security timeline; please always format the output correctly with back ticks. Any answer provided for Query DSL should also be usable in a security timeline. This means you should only ever include the "filter" portion of the query.
*
* Use the following context to answer questions:
*
* Generate an ES|QL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called "follow_up" that contains a value of "true", otherwise, it should contain "false". The user names should also be enriched with their respective group names.
* ```
*
* In the example above, the LLM omitted the system prompt, such that only `mockQueryText` is passed to the `similaritySearch` function.
*/
export const mockQueryText =
'Generate an ES|QL query that will count the number of connections made to external IP addresses, broken down by user. If the count is greater than 100 for a specific user, add a new field called follow_up that contains a value of true, otherwise, it should contain false. The user names should also be enriched with their respective group names.';
28 changes: 28 additions & 0 deletions x-pack/plugins/elastic_assistant/server/__mocks__/terms.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import type { Field, FieldValue, QueryDslTermQuery } from '@elastic/elasticsearch/lib/api/types';

/**
* These (mock) terms may be used in multiple queries.
*
* For example, it may be be used in a vector search to exclude the required `esql` KB docs.
*
* It may also be used in a terms search to find all of the required `esql` KB docs.
*/
export const mockTerms: Array<Partial<Record<Field, QueryDslTermQuery | FieldValue>>> = [
{
term: {
'metadata.kbResource': 'esql',
},
},
{
term: {
'metadata.required': true,
},
},
];
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import type { QueryDslQueryContainer } from '@elastic/elasticsearch/lib/api/types';

/**
* This Elasticsearch query DSL is a terms search for required `esql` KB docs
*/
export const mockTermsSearchQuery: QueryDslQueryContainer = {
bool: {
must: [
{
term: {
'metadata.kbResource': 'esql',
},
},
{
term: {
'metadata.required': true,
},
},
],
},
};
Loading

0 comments on commit 3cde407

Please sign in to comment.