[Entity Analytics] Add Field Retention Enrich Policy and Ingest Pipeline to Entity Engine #193848
Conversation
Maybe a gist with the script?
Thanks for answering my question @hop-dev, lgtm!
I think we have it ✅
@elasticmachine merge upstream
💛 Build succeeded, but was flaky
cc @hop-dev
```
const unitedDefinition = getUnitedEntityDefinition({
  namespace,
  entityType,
  fieldHistoryLength: 10, // we are not using this value so it can be anything
```
I am confused. So why do we have it? ❓ 😕 ❓
Could we make it optional?
Starting backport for target branches: 8.x
## Summary

Add the "Ouroboros" part of the entity engine:

- an enrich policy is created for each engine
- the enrich policy is executed every 30s by a Kibana task; this will be 1h once we move to a 24h lookback
- create an ingest pipeline for the latest index which performs the specified field retention operations (for more detail see below)

<img width="2112" alt="Screenshot 2024-10-02 at 13 42 11" src="https://github.com/user-attachments/assets/f727607f-2e0a-4056-a51e-393fb2a97a95">

<details>
<summary> Expand for example host entity </summary>

```
{
  "@timestamp": "2024-10-01T12:10:46.000Z",
  "host": {
    "name": "host9",
    "hostname": ["host9"],
    "domain": ["test.com"],
    "ip": ["1.1.1.1", "1.1.1.2", "1.1.1.3"],
    "risk": {
      "calculated_score": "70.0",
      "calculated_score_norm": "27.00200653076172",
      "calculated_level": "Low"
    },
    "id": ["1234567890abcdef"],
    "type": ["server"],
    "mac": ["AA:AA:AA:AA:AA:AB", "aa:aa:aa:aa:aa:aa", "AA:AA:AA:AA:AA:AC"],
    "architecture": ["x86_64"]
  },
  "asset": { "criticality": "low_impact" },
  "entity": {
    "name": "host9",
    "id": "kP/jiFHWSwWlO7W0+fGWrg==",
    "source": [
      "risk-score.risk-score-latest-default",
      ".asset-criticality.asset-criticality-default",
      ".ds-logs-testlogs1-default-2024.10.01-000001",
      ".ds-logs-testlogs2-default-2024.10.01-000001",
      ".ds-logs-testlogs3-default-2024.10.01-000001"
    ],
    "type": "host"
  }
}
```
</details>

### Field retention operators

First, some terminology:

- **latest value** - the value produced by the transform, representing the latest view of a given field within the transform lookback period
- **enrich value** - the value added to the document by the enrich policy, representing the last value of a field outside of the transform lookback window

We hope that this will one day be merged into the entity manager framework, so I've tried to abstract this as much as possible. A field retention operator specifies how we should choose a value for a field when looking at the latest value and the enrich value.

### Collect values

Collect unique values into an array, first taking from the latest values and then filling with enrich values up to `maxLength`.

```
{
  operation: 'collect_values',
  field: 'host.ip',
  maxLength: 10
}
```

### Prefer newest value

Choose the latest value if present, otherwise choose the enrich value.

```
{
  operation: 'prefer_newest_value',
  field: 'asset.criticality'
}
```

### Prefer oldest value

Choose the enrich value if it is present, otherwise choose the latest value.

```
{
  operation: 'prefer_oldest_value',
  field: 'first_seen_timestamp'
}
```

## Test instructions

We currently require extra permissions for the Kibana system user for this to work, so we must run Elasticsearch from a custom branch.

### 1. Get Elasticsearch running from source

This prototype requires a custom branch of Elasticsearch in order to give the Kibana system user more privileges.

#### Step 1 - Clone the prototype branch

The Elasticsearch branch is at https://github.com/elastic/elasticsearch/tree/entity-store-permissions.

Or you can use the [GitHub command line](https://cli.github.com/) to check out my draft PR:

```
gh pr checkout 113942
```

#### Step 2 - Install Java

Install [Homebrew](https://brew.sh/) if you do not have it.

```
brew install openjdk@21
sudo ln -sfn /opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-21.jdk
```

#### Step 3 - Run Elasticsearch

This makes sure your data persists between runs of Elasticsearch, and that you have platinum license features:

```
./gradlew run --data-dir /tmp/elasticsearch-repo --preserve-data -Drun.license_type=trial
```

### 2. Get Kibana running

#### Step 1 - Connect Kibana to Elasticsearch

Set this in your Kibana config:

```
elasticsearch.username: elastic-admin
elasticsearch.password: elastic-password
```

Now start Kibana and it should connect to the Elasticsearch instance you built.

### 3. Initialise the entity engine and send data!

- Initialise the host or user engine (or both):

```
curl -H 'Content-Type: application/json' \
  -X POST \
  -H 'kbn-xsrf: true' \
  -H 'elastic-api-version: 2023-10-31' \
  -d '{}' \
  http://elastic:changeme@localhost:5601/api/entity_store/engines/host/init
```

- Use your favourite data generation tool to create data, maybe https://github.com/elastic/security-documents-generator

---------

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: kibanamachine <[email protected]>

(cherry picked from commit 5131215)
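The three retention operators above can be sketched as a small TypeScript model. This is illustrative only: the actual logic runs as Elasticsearch ingest pipeline processors, and the type and function names here are hypothetical, not the Kibana source.

```typescript
// Illustrative sketch of the field retention semantics described above.
// Hypothetical names; the real implementation lives in an ingest pipeline.
type FieldRetentionOperator =
  | { operation: 'collect_values'; field: string; maxLength: number }
  | { operation: 'prefer_newest_value'; field: string }
  | { operation: 'prefer_oldest_value'; field: string };

const toArray = (v: unknown): unknown[] =>
  v === undefined || v === null ? [] : Array.isArray(v) ? v : [v];

function applyOperator(
  op: FieldRetentionOperator,
  latest: unknown, // value from the transform (lookback window)
  enrich: unknown  // value added by the enrich policy (older history)
): unknown {
  switch (op.operation) {
    case 'collect_values':
      // unique values, latest first, topped up with enrich values
      return [...new Set([...toArray(latest), ...toArray(enrich)])].slice(
        0,
        op.maxLength
      );
    case 'prefer_newest_value':
      return latest ?? enrich;
    case 'prefer_oldest_value':
      return enrich ?? latest;
  }
}
```

For example, `collect_values` on `host.ip` with `maxLength: 3`, a latest value of `['1.1.1.1']` and an enrich value of `['1.1.1.2', '1.1.1.3', '1.1.1.4']` yields `['1.1.1.1', '1.1.1.2', '1.1.1.3']`.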
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
… Pipeline to Entity Engine (#193848) (#195929)

# Backport

This will backport the following commits from `main` to `8.x`:

- [Entity Analytics] Add Field Retention Enrich Policy and Ingest Pipeline to Entity Engine (#193848)

### Questions?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Mark Hopkin <[email protected]>