forked from elastic/kibana
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Rule Migration] Add RAG for prebuilt rules and new Retrievers (elastic#202796)

## Summary

Graph changes:

![image](https://github.com/user-attachments/assets/54ad563b-9023-4e46-a80c-73ba6b61cf70)

This PR adds the functionality to retrieve the currently available prebuilt rules and to create a new index with `semantic_text` mappings, so that the SIEM migration process can use it for RAG use cases.

Specific changes in this PR:

- Move the creation of the RAG indices from `/create` to `/start`, and remove the `await` on `prepare` when `/start` is called.
- Move all retrievers to a new `retriever` folder, together with a new `RuleMigrationsRetriever` class that encapsulates all the different retrievers in one place.
- Add a timeout to the integration and prebuilt rule bulk requests to ES, because generating the initial embeddings can take a long time.
- Move some nodes from the Translate Rule subgraph to the main agent graph, since semantic queries are now used both for translating rules and for matching prebuilt rules.
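The change to `/start` (no longer awaiting `prepare`) follows a common fire-and-forget pattern. The following is a minimal sketch of that pattern only, assuming a hypothetical `prepareIndices` function that stands in for the slow RAG index preparation and a hypothetical `startMigration` handler; neither name comes from the Kibana code. Attaching a `.catch` so that failures are logged is a design choice of this sketch, not something the PR description states.

```typescript
// Hypothetical sketch: `prepareIndices` and `startMigration` are illustrative names,
// not functions from the Kibana code base.

interface Logger {
  error: (message: string) => void;
}

interface PreparationState {
  prepared: boolean;
}

/** Stand-in for the slow RAG index preparation (e.g. generating initial embeddings). */
async function prepareIndices(state: PreparationState): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  state.prepared = true;
}

/**
 * Starts preparation without awaiting it, so the caller can respond right away.
 * The attached `.catch` logs failures instead of leaving an unhandled rejection.
 */
function startMigration(
  state: PreparationState,
  logger: Logger
): { started: boolean; preparation: Promise<void> } {
  const preparation = prepareIndices(state).catch((error: Error) => {
    logger.error(`Error preparing RAG indices: ${error.message}`);
  });
  return { started: true, preparation };
}
```

Returning the `preparation` promise is only a convenience for testing; a route handler following this pattern would typically discard it.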
Showing 38 changed files with 420 additions and 339 deletions.
Binary file modified: x-pack/plugins/security_solution/docs/siem_migration/img/agent_graph.png (-6.49 KB, 84%)
...ution/server/lib/siem_migrations/rules/data/rule_migrations_data_prebuilt_rules_client.ts: 137 changes (137 additions, 0 deletions)
@@ -0,0 +1,137 @@

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import type { RulesClient } from '@kbn/alerting-plugin/server';
import type { SavedObjectsClientContract } from '@kbn/core-saved-objects-api-server';
import { createPrebuiltRuleAssetsClient } from '../../../detection_engine/prebuilt_rules/logic/rule_assets/prebuilt_rule_assets_client';
import { createPrebuiltRuleObjectsClient } from '../../../detection_engine/prebuilt_rules/logic/rule_objects/prebuilt_rule_objects_client';
import { fetchRuleVersionsTriad } from '../../../detection_engine/prebuilt_rules/logic/rule_versions/fetch_rule_versions_triad';
import type { RuleMigrationPrebuiltRule } from '../types';
import { RuleMigrationsDataBaseClient } from './rule_migrations_data_base_client';

interface RetrievePrebuiltRulesParams {
  soClient: SavedObjectsClientContract;
  rulesClient: RulesClient;
}

/* The minimum score required for a prebuilt rule to be considered a match; might need to change this later */
const MIN_SCORE = 40 as const;
/* The number of prebuilt rules the RAG will return, sorted by score */
const RETURNED_RULES = 5 as const;

/* BULK_MAX_SIZE defines the batch size used to split the bulk operations.
 * The value 500 was chosen as a reasonable size to avoid large payloads. It can be adjusted if needed.
 */
const BULK_MAX_SIZE = 500 as const;
```
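Because each bulk request is capped at `BULK_MAX_SIZE` rules, and each rule contributes two entries to the request body (an `update` action followed by its `doc_as_upsert` document), a single request carries at most 1,000 operation entries. The sketch below shows that batching with hypothetical helper names (`chunk`, `toBulkUpsertBatches`) that are not part of the Kibana code; unlike the committed code, which empties `filteredRules` through `splice` in a `while` loop, it leaves the input array untouched.

```typescript
// Hypothetical helpers illustrating the batching; names are not from the Kibana code.
const BULK_MAX_SIZE = 500;

type BulkUpsertOperation =
  | { update: { _index: string; _id: string } }
  | { doc: Record<string, unknown>; doc_as_upsert: true };

/** Splits `items` into consecutive batches of at most `size` elements (input is not mutated). */
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Builds one bulk body per batch: an `update` action entry followed by its upsert document. */
function toBulkUpsertBatches(
  docs: ReadonlyArray<{ rule_id: string } & Record<string, unknown>>,
  index: string,
  timestamp: string,
  batchSize: number = BULK_MAX_SIZE
): BulkUpsertOperation[][] {
  return chunk(docs, batchSize).map((batch) =>
    batch.flatMap((doc): BulkUpsertOperation[] => [
      { update: { _index: index, _id: doc.rule_id } },
      { doc: { ...doc, '@timestamp': timestamp }, doc_as_upsert: true },
    ])
  );
}
```

For 1,201 rules this produces three requests covering 500, 500, and 201 rules. Using `rule_id` as the document `_id` together with `doc_as_upsert` makes re-running the indexing idempotent: existing documents are updated in place instead of duplicated.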
```ts
export class RuleMigrationsDataPrebuiltRulesClient extends RuleMigrationsDataBaseClient {
  /** Indexes the available prebuilt rules to be used with ELSER semantic search queries */
  async create({ soClient, rulesClient }: RetrievePrebuiltRulesParams): Promise<void> {
    const ruleAssetsClient = createPrebuiltRuleAssetsClient(soClient);
    const ruleObjectsClient = createPrebuiltRuleObjectsClient(rulesClient);

    const ruleVersionsMap = await fetchRuleVersionsTriad({
      ruleAssetsClient,
      ruleObjectsClient,
    });

    const filteredRules: RuleMigrationPrebuiltRule[] = [];
    ruleVersionsMap.forEach((ruleVersions) => {
      // Prefer the latest available version; fall back to the installed one
      const rule = ruleVersions.target || ruleVersions.current;
      if (rule) {
        const mitreAttackIds = rule?.threat?.flatMap(
          ({ technique }) => technique?.map(({ id }) => id) ?? []
        );

        filteredRules.push({
          rule_id: rule.rule_id,
          name: rule.name,
          installedRuleId: ruleVersions.current?.id,
          description: rule.description,
          elser_embedding: `${rule.name} - ${rule.description}`,
          ...(mitreAttackIds?.length && { mitre_attack_ids: mitreAttackIds }),
        });
      }
    });

    const index = await this.getIndexName();
    const createdAt = new Date().toISOString();
    let prebuiltRuleSlice: RuleMigrationPrebuiltRule[];
    while ((prebuiltRuleSlice = filteredRules.splice(0, BULK_MAX_SIZE)).length) {
      await this.esClient
        .bulk(
          {
            refresh: 'wait_for',
            operations: prebuiltRuleSlice.flatMap((prebuiltRule) => [
              { update: { _index: index, _id: prebuiltRule.rule_id } },
              {
                doc: {
                  ...prebuiltRule,
                  '@timestamp': createdAt,
                },
                doc_as_upsert: true,
              },
            ]),
          },
          // 10 minutes: generating the initial embeddings can take a long time
          { requestTimeout: 10 * 60 * 1000 }
        )
        .catch((error) => {
          this.logger.error(`Error preparing prebuilt rules for SIEM migration: ${error.message}`);
          throw error;
        });
    }
  }

  /** Based on an LLM-generated semantic string, returns up to 5 results with a score of at least 40 */
  async retrieveRules(
    semanticString: string,
    techniqueIds: string
  ): Promise<RuleMigrationPrebuiltRule[]> {
    const index = await this.getIndexName();
    const query = {
      bool: {
        should: [
          {
            semantic: {
              query: semanticString,
              field: 'elser_embedding',
              boost: 1.5,
            },
          },
          {
            multi_match: {
              query: semanticString,
              fields: ['name^2', 'description'],
              boost: 3,
            },
          },
          {
            multi_match: {
              query: techniqueIds,
              fields: ['mitre_attack_ids'],
              boost: 2,
            },
          },
        ],
      },
    };
    const results = await this.esClient
      .search<RuleMigrationPrebuiltRule>({
        index,
        query,
        size: RETURNED_RULES,
        min_score: MIN_SCORE,
      })
      .then(this.processResponseHits.bind(this))
      .catch((error) => {
        this.logger.error(`Error querying prebuilt rule details for ELSER: ${error.message}`);
        throw error;
      });

    return results;
  }
}
```
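The document-shaping step inside `create()` can be isolated as a pure function, which makes it easy to test without Elasticsearch. The following is a sketch, assuming trimmed-down hypothetical types (`PrebuiltRuleLike`, `RuleVersionsLike`, `IndexedPrebuiltRule`) in place of the real prebuilt rule schema and the `fetchRuleVersionsTriad` result. Like the committed code, it prefers the target version over the installed one, joins name and description into the `elser_embedding` text, and adds `mitre_attack_ids` only when the rule maps to at least one ATT&CK technique.

```typescript
// Hypothetical, trimmed-down shapes; the real prebuilt rule schema has many more fields.
interface ThreatTechnique {
  id: string;
}
interface Threat {
  technique?: ThreatTechnique[];
}
interface PrebuiltRuleLike {
  rule_id: string;
  name: string;
  description: string;
  threat?: Threat[];
}
interface RuleVersionsLike {
  current?: PrebuiltRuleLike & { id: string }; // installed rule (has a saved object id)
  target?: PrebuiltRuleLike; // latest available prebuilt rule asset
}
interface IndexedPrebuiltRule {
  rule_id: string;
  name: string;
  installedRuleId?: string;
  description: string;
  elser_embedding: string;
  mitre_attack_ids?: string[];
}

/** Shapes one rule-versions entry into the document indexed for RAG (undefined if no version). */
function toIndexedRule(versions: RuleVersionsLike): IndexedPrebuiltRule | undefined {
  const rule = versions.target || versions.current;
  if (!rule) return undefined;

  // Flatten threat[].technique[].id into a single list of ATT&CK technique ids
  const mitreAttackIds = (rule.threat ?? []).flatMap(
    ({ technique }) => technique?.map(({ id }) => id) ?? []
  );

  return {
    rule_id: rule.rule_id,
    name: rule.name,
    installedRuleId: versions.current?.id,
    description: rule.description,
    elser_embedding: `${rule.name} - ${rule.description}`,
    ...(mitreAttackIds.length > 0 ? { mitre_attack_ids: mitreAttackIds } : {}),
  };
}
```

The ternary spread is used here instead of the original `&&` spread only to keep the resulting type explicit; both omit the field when no technique IDs are present.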