[filebeat] Elasticsearch state storage for httpjson and cel inputs #41446

aleksmaus · 2024-10-24T20:38:53Z

Proposed commit message

[filebeat] Elasticsearch state storage for httpjson input

This is a POC of Elasticsearch as a state store backend for Security integrations in the Agentless solution.

The scope of this change was narrowed down to supporting only httpjson inputs in order to support the Okta integration for the initial release. All other integration inputs still use the file storage as before.
This is a short-term solution for state storage in the k8s environment.

This is the first cut and the details can change depending on the feedback.

The feature can currently be enabled with the AGENTLESS_ELASTICSEARCH_STATE_STORE_ENABLED environment variable; how this will be configured in k8s is still to be decided.

This change currently contains a hacky approach: the AGENTLESS_ELASTICSEARCH_APIKEY overwrite. It allows the user to provide an ApiKey with the elevated permissions required to create, write and read the state index per input. THIS IS FOR DEVELOPMENT/TESTING ONLY. REMOVE BEFORE THE MERGE.

The existing code relied on the input state storage being fully configured before the main Beat manager runs. This change delays the configuration of the httpjson input until the actual configuration is received from the Agent.

There is an assumption that the index template for the state storage indices is already in place before the storage is used:

PUT _index_template/agentless_state_template
{
  "index_patterns": [
    "agentless-state-*"
  ],
  "priority": 300,
  "template": {
    "mappings": {
      "properties": {
        "v": {
          "type": "object",
          "enabled": false
        },
        "updated_at": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    },
    "settings": {
      "number_of_shards": 1
    }
  }
}

Example of the state storage index content for Okta integration:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "agentless-state-httpjson-okta.system-028ecf4b-babe-44c6-939e-9e3096af6959",
        "_id": "httpjson::httpjson-okta.system-028ecf4b-babe-44c6-939e-9e3096af6959::https://dev-36006609.okta.com/api/v1/logs",
        "_seq_no": 39,
        "_primary_term": 1,
        "_score": 1,
        "_source": {
          "v": {
            "ttl": 1800000000000,
            "updated": "2024-10-24T20:21:22.032Z",
            "cursor": {
              "published": "2024-10-24T20:19:53.542Z"
            }
          }
        }
      }
    ]
  }
}
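
For reviewers who want to inspect such a document from Go, here is a minimal sketch of decoding the `_source` shown above. The type and function names (`stateDoc`, `decodeState`) are illustrative and not taken from this PR. It also assumes `ttl` is a nanosecond count in the style of Go's `time.Duration`, which the example value suggests but the description does not state.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// stateDoc models the _source of a state document as shown in the example.
// Hypothetical type: the PR's own representation may differ.
type stateDoc struct {
	V struct {
		// TTL is assumed to be stored as nanoseconds (time.Duration).
		TTL time.Duration `json:"ttl"`
		// Updated is the time the entry was last written.
		Updated time.Time `json:"updated"`
		// Cursor is input-specific, so it is kept as opaque JSON here.
		Cursor json.RawMessage `json:"cursor"`
	} `json:"v"`
}

// decodeState parses the raw _source bytes of a state document.
func decodeState(source []byte) (stateDoc, error) {
	var doc stateDoc
	err := json.Unmarshal(source, &doc)
	return doc, err
}

func main() {
	src := []byte(`{"v":{"ttl":1800000000000,"updated":"2024-10-24T20:21:22.032Z","cursor":{"published":"2024-10-24T20:19:53.542Z"}}}`)
	doc, err := decodeState(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("ttl:", doc.V.TTL)
	fmt.Println("updated:", doc.V.Updated.UTC().Format(time.RFC3339Nano))
	fmt.Println("cursor:", string(doc.V.Cursor))
}
```

Keeping the cursor as `json.RawMessage` reflects that the `v` object is mapped with `"enabled": false` in the template, so Elasticsearch stores it without indexing its fields.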

The naming convention for every state store index is agentless-state-<input id>, since the expectation for agentless is that we have only one agent per policy and that the agents are ephemeral.
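
The naming convention can be sketched as a small helper. This is only an illustration: `stateIndexName` and `stateIndexPrefix` are hypothetical names, and rejecting an empty input id is an assumption rather than behavior confirmed by this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// stateIndexPrefix matches the "agentless-state-*" pattern of the index template.
const stateIndexPrefix = "agentless-state-"

// stateIndexName returns the per-input state index name, agentless-state-<input id>.
// Hypothetical helper; rejecting an empty id is an assumption of this sketch.
func stateIndexName(inputID string) (string, error) {
	id := strings.TrimSpace(inputID)
	if id == "" {
		return "", fmt.Errorf("input id must not be empty")
	}
	return stateIndexPrefix + id, nil
}

func main() {
	name, err := stateIndexName("httpjson-okta.system-028ecf4b-babe-44c6-939e-9e3096af6959")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

With the input id from the Okta example, this yields the same `_index` value that appears in the search response above.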

Currently, running the agent with Elasticsearch state storage requires a couple of environment variables:

sudo AGENTLESS_ELASTICSEARCH_STATE_STORE_ENABLED=1 AGENTLESS_ELASTICSEARCH_APIKEY=xxxxxxxx-xvpDXfB:jVMRsW7SRIxxxxxxxxx ./elastic-agent -e

where the ApiKey in the

DEPENDENCIES / TODOS:

Approval of teams for this approach
A Kibana (?) side change is required for the agentless-state index template bootstrapping
A change in Kibana or the integration package (or both) is required in order to include the permissions for agentless-state- in the Elasticsearch ApiKey (remove the hack). I suspect that the Kibana Fleet code could be modified to recognize an integration that supports agentless and include the proper agentless-state index name in the ApiKey permissions.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

The change should have no impact: without the feature enabled, Filebeat should work as before, using file system storage for the state.

mergify · 2024-10-24T20:39:42Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @aleksmaus? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

mergify · 2024-10-24T20:39:43Z

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

aleksmaus · 2024-10-24T20:45:15Z

filebeat/beater/filebeat.go

+			// Injecting the ApiKey that has enough permissions to write to the index
+			// TODO: need to figure out how add permissions for the state index
+			// agentless-state-<input id>, for example httpjson-okta.system-028ecf4b-babe-44c6-939e-9e3096af6959
+			apiKey := os.Getenv("AGENTLESS_ELASTICSEARCH_APIKEY")


will collaborate with agentless team on addressing this part

When running under Elastic agent, every change of the output configuration results in a restart of the Beat process, in case that simplifies anything here for you.

libbeat/statestore/backend/es/store.go

aleksmaus · 2024-10-31T11:49:50Z

@belimawr @cmacknz (or whoever wants to or has time to be involved)
I need your feedback on this draft: is this approach something that we could eventually merge? (The ApiKey workaround will be removed once we adjust Kibana Fleet.)
I think this is an ok solution given the circumstances:

This is fully backwards compatible. If the feature is not enabled, everything would work as before.
The only inputs that are enabled for Elasticsearch-backed state storage are httpjson and cel. For the first release we are only enabling a limited number of integrations that rely on the httpjson or cel inputs.
The state initialization for the inputs is delayed until we get the configuration only if the feature is enabled for the input.
The agent logs monitoring will still use the local storage, since we agreed that losing the agent log when the pod is relocated is acceptable.

cmacknz · 2024-11-01T20:23:50Z

@leehinman I'd appreciate a review here to make sure this can co-exist with Beats receivers in agent since that would be the long term way we plan to run agentless inputs.

cmacknz · 2024-11-01T20:27:55Z

filebeat/beater/filebeat.go

+
+			// TODO: REMOVE THIS HACK BEFORE MERGE. LEAVING FOR TESTING FOR DRAFT
+			// Injecting the ApiKey that has enough permissions to write to the index
+			// TODO: need to figure out how add permissions for the state index


Fleet knows when something is an agentless package and that is probably what would hook into this to generate the key.

We could add a new state storage section to an agent policy (agent.storage?) that Fleet knows how to template when this happens.

Agent could then send it down as another output unit with a new type (or we could define a new type of unit but that is even more work).

This would allow the key to update on the fly through Fleet and control protocol.

This could also possibly be handled in the agentless api / controller and hidden from Fleet if we just inject it in as an env var. No opposition to that either really.

This could also possibly be handled in the agentless api / controller and hidden from Fleet if we just inject it in as an env var. No opposition to that either really.

I brought this up during the meeting today as an option. IMHO it's just one thing to manage, might be cleaner if all in one place in the policy.

A couple of details we need to think about with respect to these keys is what the process should be for rotating and/or revoking them.

cmacknz · 2024-11-01T20:29:52Z

filebeat/features/features.go

+
+// List of input types Elasticsearch state store is enabled for
+var esTypesEnabled = map[string]void{
+	"httpjson": {},


Can this be configuration instead of in the code, maybe another env var?

Sure can do. Something like this?
AGENTLESS_ELASTICSEARCH_STATE_STORE_INPUT_TYPES=httpjson,cel

cmacknz · 2024-11-01T20:32:57Z

libbeat/statestore/backend/es/store.go

+}
+
+func (s *store) get(key string, to interface{}) error {
+	status, data, err := s.cli.Request("GET", fmt.Sprintf("/%s/%s/%s", s.index, docType, url.QueryEscape(key)), "", nil, nil)


These requests should all be tied to a context.

Also, they probably need some minimum amount of retries.

The biggest design difference with ES is now the requests can fail. A file on disk doesn't give us 429 errors.

At a very high level, it feels like the way we deal with this is:

Don't start or allow the input to progress until it has successfully initialized the state at least once to avoid massively duplicating data.

Writes are asynchronous from the caller's perspective and the latest state is continuously retried.

These requests should all be tied to a context.

Looks like the current implementation of the client uses the context

beats/libbeat/esleg/eslegclient/connection.go

Line 406 in 249d0dc

req, err := http.NewRequestWithContext(conn.reqsContext, method, url, body)

that is set when the client is constructed

beats/libbeat/esleg/eslegclient/connection.go

Line 262 in 249d0dc

for _, client := range clients {

beats/libbeat/esleg/eslegclient/connection.go

Line 286 in 249d0dc

conn.reqsContext = ctx

cmacknz · 2024-11-01T20:35:48Z

The state initialization for the inputs is delayed until we get the configuration only if the feature is enabled for the input.

To simplify the PR, would it help to pull this part out and/or just always delay the store initialization when run under Elastic Agent?

cmacknz · 2024-11-01T20:37:34Z

For the rest of the PR, I think reviewing this would be easier if we had a design doc that addressed the following questions:

Where the API key and ES configuration are going to come from. I imagine we are going to need things like rate limit configuration eventually, in addition to the basics of a host + API key.
How we are going to deal with the fact that the store operations are much more likely to fail or could experience brief or prolonged unavailability.
How we expect this to integrate with the Beats receivers work. Probably the agent team is best positioned to help with this.

leehinman · 2024-11-01T20:39:43Z

@leehinman I'd appreciate a review here to make sure this can co-exist with Beats receivers in agent since that would be the long term way we plan to run agentless inputs.

Still reviewing, but I wanted to point out that this won't work at all for a beat receiver. For a beat receiver the output (in the beat configuration part) will always be otelconsumer. The beat receiver never "sees" any of the exporter configuration (elasticsearch, kafka, redis, etc). I think for a beat receiver we would want to use the otel storage extension and pass that in.

cmacknz · 2024-11-01T20:47:34Z

Yes an explicit storage extension in Beats itself would make this much easier to do. Unfortunately we don't have that.

leehinman · 2024-11-01T21:29:54Z

Yes an explicit storage extension in Beats itself would make this much easier to do. Unfortunately we don't have that.

It would. But I was more thinking that we could modify the signature of NewBeatReceiver, so we could pass in a storage extension and store it in the beat.Info like we do for the LogConsumer. The filebeat Run function would then have access to this, so if it was present it could use it.

This would make the state store more like logging and the consumer, where configuration is handled at the otel level.

aleksmaus · 2024-11-05T17:35:31Z

Added AGENTLESS_ELASTICSEARCH_STATE_STORE_INPUT_TYPES as requested in PR review

example AGENTLESS_ELASTICSEARCH_STATE_STORE_INPUT_TYPES="httpjson,cel"

Now no input types are enabled by default for Elasticsearch state storage.
Example of how to run the agent with the new flag:

sudo AGENTLESS_ELASTICSEARCH_STATE_STORE_ENABLED=1 AGENTLESS_ELASTICSEARCH_STATE_STORE_INPUT_TYPES="httpjson,cel" AGENTLESS_ELASTICSEARCH_APIKEY=fsOitZIBVlcA-mvxxxxx:jVMRsW7SRIOc-U6VHxxxxx ./elastic-agent -e
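
A comma-separated variable like this could be parsed roughly as follows. This is a sketch only: `parseInputTypes` is a hypothetical name, and trimming whitespace around entries is an assumption, not necessarily what the code in this PR does.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inputTypesEnv is the environment variable introduced in this PR.
const inputTypesEnv = "AGENTLESS_ELASTICSEARCH_STATE_STORE_INPUT_TYPES"

// parseInputTypes turns a value such as "httpjson,cel" into a set of input types.
// An empty value yields an empty set, so no input type is enabled by default.
// Hypothetical helper; trimming whitespace is an assumption of this sketch.
func parseInputTypes(raw string) map[string]struct{} {
	enabled := make(map[string]struct{})
	for _, t := range strings.Split(raw, ",") {
		t = strings.TrimSpace(t)
		if t != "" {
			enabled[t] = struct{}{}
		}
	}
	return enabled
}

func main() {
	enabled := parseInputTypes(os.Getenv(inputTypesEnv))
	for _, t := range []string{"httpjson", "cel", "filestream"} {
		_, ok := enabled[t]
		fmt.Printf("%s uses Elasticsearch state store: %t\n", t, ok)
	}
}
```

Running it with the variable unset reports every type as disabled, which matches the "no input types are enabled by default" behavior described above.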

Switching this PR from draft.

elasticmachine · 2024-11-05T17:35:37Z

Pinging @elastic/sec-deployment-and-devices (Team:Security-Deployment and Devices)

elasticmachine · 2024-11-05T18:22:00Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

[filebeat] Elasticsearch state storage for httpjson input

55c72d3

aleksmaus added the enhancement label Oct 24, 2024

aleksmaus requested review from orestisfl, olegsu and oren-zohar October 24, 2024 20:38

aleksmaus self-assigned this Oct 24, 2024

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 24, 2024

aleksmaus added the Team:Security-Deployment and Devices Deployment and Devices Team in Security Solution label Oct 24, 2024

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 24, 2024

mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Oct 24, 2024

aleksmaus commented Oct 24, 2024

aleksmaus added 3 commits October 24, 2024 16:54

Fixup tests

1bf288d

Linter

e003053

Enabled elastisearch storage support for cel input and some cleanup

dfce978

aleksmaus commented Oct 29, 2024

Remove the "hack" with .Each implementation

e2e25fa

aleksmaus changed the title ~~[filebeat] Elasticsearch state storage for httpjson input~~ [filebeat] Elasticsearch state storage for httpjson and cel inputs Oct 30, 2024

aleksmaus added 2 commits October 31, 2024 07:27

Merge branch 'main' into poc/es_state_store

953355b

Adjust for the latest main es client signature change

c1fc2a8

aleksmaus added 3 commits October 31, 2024 08:02

Make check happy

c9b0256

Fixed missing interface method on test mock store

21d451d

Add error check in ES store Each

ffb9364

cmacknz requested a review from leehinman November 1, 2024 20:23

cmacknz reviewed Nov 1, 2024

Parameterize the supported input types through environment variables

10d212f

aleksmaus marked this pull request as ready for review November 5, 2024 17:35

aleksmaus requested review from a team as code owners November 5, 2024 17:35

aleksmaus requested a review from belimawr November 5, 2024 17:35

pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 5, 2024

Delete the dev tests file

24000d7
