
[Entity Analytics] Add Field Retention Enrich Policy and Ingest Pipeline to Entity Engine #193848

Merged (95 commits) Oct 11, 2024

Conversation

@hop-dev (Contributor) commented Sep 24, 2024

Summary

Add the "Ouroboros" part of the entity engine:

  • an enrich policy is created for each engine
  • the enrich policy is executed every 30s by a Kibana task; this will become 1h once we move to a 24h lookback
  • an ingest pipeline is created for the latest index which performs the specified field retention operations (for more detail see below)
[Screenshot 2024-10-02 at 13:42]
Example host entity:

```
{
  "@timestamp": "2024-10-01T12:10:46.000Z",
  "host": {
    "name": "host9",
    "hostname": ["host9"],
    "domain": ["test.com"],
    "ip": ["1.1.1.1", "1.1.1.2", "1.1.1.3"],
    "risk": {
      "calculated_score": "70.0",
      "calculated_score_norm": "27.00200653076172",
      "calculated_level": "Low"
    },
    "id": ["1234567890abcdef"],
    "type": ["server"],
    "mac": ["AA:AA:AA:AA:AA:AB", "aa:aa:aa:aa:aa:aa", "AA:AA:AA:AA:AA:AC"],
    "architecture": ["x86_64"]
  },
  "asset": {
    "criticality": "low_impact"
  },
  "entity": {
    "name": "host9",
    "id": "kP/jiFHWSwWlO7W0+fGWrg==",
    "source": [
      "risk-score.risk-score-latest-default",
      ".asset-criticality.asset-criticality-default",
      ".ds-logs-testlogs1-default-2024.10.01-000001",
      ".ds-logs-testlogs2-default-2024.10.01-000001",
      ".ds-logs-testlogs3-default-2024.10.01-000001"
    ],
    "type": "host"
  }
}
```

Field retention operators

First some terminology:

  • latest value - the value produced by the transform, representing the latest view of a given field within the transform lookback period
  • enrich value - the value added to the document by the enrich policy, representing the last value of a field outside the transform lookback window

We hope that this will one day be merged into the entity manager framework, so I've tried to abstract this as much as possible. A field retention operator specifies how we should choose a value for a field when looking at the latest value and the enrich value.
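To make the abstraction concrete, here is a minimal TypeScript sketch of how the operator configurations below could be modelled as a discriminated union; the interface and type names are illustrative assumptions, not the actual Kibana types:

```typescript
// Illustrative sketch only: models the three operator configurations
// described below as a discriminated union. All names are hypothetical.
interface CollectValues {
  operation: 'collect_values';
  field: string;
  maxLength: number;
}

interface PreferNewestValue {
  operation: 'prefer_newest_value';
  field: string;
}

interface PreferOldestValue {
  operation: 'prefer_oldest_value';
  field: string;
}

type FieldRetentionOperator =
  | CollectValues
  | PreferNewestValue
  | PreferOldestValue;

// A per-engine list of operators could then drive generation of the
// field retention ingest pipeline.
const hostOperators: FieldRetentionOperator[] = [
  { operation: 'collect_values', field: 'host.ip', maxLength: 10 },
  { operation: 'prefer_newest_value', field: 'asset.criticality' },
  { operation: 'prefer_oldest_value', field: 'first_seen_timestamp' },
];
```

A discriminated union keeps each operator's options type-safe while allowing them to live in one configuration list.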

Collect values

Collect unique values in an array, first taking from the latest values and then filling with enrich values up to maxLength.

{
  operation: 'collect_values',
  field: 'host.ip',
  maxLength: 10
}
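The merge semantics can be sketched as follows; this is an assumption about the behaviour described above, not the actual script the ingest pipeline runs:

```typescript
// Sketch of collect_values semantics (assumed, not the real pipeline code):
// take unique latest values first, then fill with enrich values, stopping
// once maxLength unique values have been collected.
function collectValues(
  latest: string[],
  enrich: string[],
  maxLength: number
): string[] {
  const result: string[] = [];
  for (const value of [...latest, ...enrich]) {
    if (result.length >= maxLength) break;
    if (!result.includes(value)) {
      result.push(value);
    }
  }
  return result;
}

// Latest values come first; enrich values top the array up to maxLength.
collectValues(['1.1.1.1'], ['1.1.1.2', '1.1.1.1', '1.1.1.3'], 3);
// → ['1.1.1.1', '1.1.1.2', '1.1.1.3']
```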

Prefer newest value

Choose the latest value if present, otherwise choose the enrich value.

{
  operation: 'prefer_newest_value',
  field: 'asset.criticality'
}

Prefer oldest value

Choose the enrich value if it is present, otherwise choose the latest value.

{
  operation: 'prefer_oldest_value',
  field: 'first_seen_timestamp'
}
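The two "prefer" operators can be sketched the same way; again this illustrates the assumed semantics rather than the real implementation:

```typescript
// Sketch of the prefer_* semantics (assumed behaviour):
// prefer_newest_value keeps the latest value when the transform produced one;
// prefer_oldest_value keeps the previously stored enrich value when present.
function preferNewestValue<T>(
  latest: T | undefined,
  enrich: T | undefined
): T | undefined {
  return latest !== undefined ? latest : enrich;
}

function preferOldestValue<T>(
  latest: T | undefined,
  enrich: T | undefined
): T | undefined {
  return enrich !== undefined ? enrich : latest;
}

// asset.criticality should track the most recent observation:
preferNewestValue('high_impact', 'low_impact'); // → 'high_impact'
// first_seen_timestamp should never move forward once set:
preferOldestValue('2024-10-01T12:10:46.000Z', '2024-01-01T00:00:00.000Z');
// → '2024-01-01T00:00:00.000Z'
```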

Test instructions

We currently require extra permissions for the kibana system user for this to work, so we must run a custom Elasticsearch build:

1. Get Elasticsearch running from source

This prototype requires a custom branch of elasticsearch in order to give the kibana system user more privileges.

Step 1 - Clone the prototype branch

The elasticsearch branch is at https://github.com/elastic/elasticsearch/tree/entity-store-permissions.

Or you can use the GitHub CLI to check out my draft PR:

gh pr checkout 113942

Step 2 - Install Java

Install homebrew if you do not have it.

brew install openjdk@21
sudo ln -sfn /opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-21.jdk

Step 3 - Run elasticsearch

This makes sure your data persists between runs of Elasticsearch, and that you have platinum license features:

./gradlew run --data-dir /tmp/elasticsearch-repo --preserve-data -Drun.license_type=trial

2. Get Kibana Running

Step 1 - Connect kibana to elasticsearch

Set this in your kibana config:

elasticsearch.username: elastic-admin
elasticsearch.password: elastic-password

Now start Kibana and it should connect to the Elasticsearch instance you started.

3. Initialise entity engine and send data!

  • Initialise the host or user engine (or both)

curl -H 'Content-Type: application/json' \
      -X POST \
      -H 'kbn-xsrf: true' \
      -H 'elastic-api-version: 2023-10-31' \
      -d '{}' \
      http://elastic:changeme@localhost:5601/api/entity_store/engines/host/init

  • use your favourite data generation tool to create data, maybe https://github.com/elastic/security-documents-generator

@hop-dev force-pushed the entity-store-enrich-processor-for-rebase branch 2 times, most recently from 28c542b to a27f5a1 on September 26, 2024 09:41
@hop-dev force-pushed the entity-store-enrich-processor-for-rebase branch from 3038ecf to 2bed7df on October 2, 2024 09:41
@hop-dev hop-dev changed the title [Entity Analytics] Add enrich policy and processor to entity engine [Entity Analytics] Add Field Retention Enrich Policy and Ingest Pipeline to Entity Engine Oct 2, 2024
@hop-dev hop-dev self-assigned this Oct 2, 2024
@hop-dev hop-dev added Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) Team:Entity Analytics Security Entity Analytics Team release_note:skip Skip the PR/issue when compiling release notes labels Oct 2, 2024
@tiansivive (Contributor) commented:

Maybe a gist with the script?

@hop-dev force-pushed the entity-store-enrich-processor-for-rebase branch from 5d1b4e0 to 7863b6e on October 2, 2024 20:06
@hop-dev force-pushed the entity-store-enrich-processor-for-rebase branch from c85dd48 to f12e009 on October 7, 2024 13:14
@jloleysens (Contributor) left a comment:

Thanks for answering my question @hop-dev, lgtm!

@tiansivive (Contributor) left a comment:

I think we have it ✅

@hop-dev (Contributor, Author) commented Oct 11, 2024:

@elasticmachine merge upstream

@elasticmachine (Contributor) commented:

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #19 / Header rendering it renders the data providers when show is true

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 20.7MB | 20.7MB | -159.0B |

Unknown metric groups

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 538 | 539 | +1 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 623 | 624 | +1 |

History

cc @hop-dev

const unitedDefinition = getUnitedEntityDefinition({
  namespace,
  entityType,
  fieldHistoryLength: 10, // we are not using this value so it can be anything
(Member) commented: I am confused. So why do we have it? ❓ 😕 ❓

(Member) commented: Could we make it optional?

@hop-dev hop-dev merged commit 5131215 into elastic:main Oct 11, 2024
44 checks passed
@hop-dev hop-dev deleted the entity-store-enrich-processor-for-rebase branch October 11, 2024 14:04
@kibanamachine (Contributor) commented:

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11293803178

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Oct 11, 2024
…ine to Entity Engine (elastic#193848)

(cherry picked from commit 5131215)
@kibanamachine (Contributor) commented:

💚 All backports created successfully

Target branch: 8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions? Please refer to the Backport tool documentation.

kibanamachine added a commit that referenced this pull request Oct 11, 2024
… Pipeline to Entity Engine (#193848) (#195929)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Entity Analytics] Add Field Retention Enrich Policy and Ingest Pipeline to Entity Engine (#193848)](#193848)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)


Co-authored-by: Mark Hopkin <[email protected]>
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) release_note:skip Skip the PR/issue when compiling release notes Team:Entity Analytics Security Entity Analytics Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.16.0 v9.0.0