[event log][discuss] should we stop using common top-level ecs fields for our own data? #95421

Closed
pmuellr opened this issue Mar 25, 2021 · 4 comments
Labels
discuss Feature:EventLog insight Issues related to user insight into platform operations and resilience Meta Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@pmuellr
Member

pmuellr commented Mar 25, 2021

The current event log structure is documented here: https://github.com/elastic/kibana/blob/master/x-pack/plugins/event_log/README.md#event-documents

All those top-level fields, besides kibana, are existing top-level ECS fields.

There have been some thoughts about extending the event log to contain more rule-specific data - you could imagine an o11y rule type wanting to add host.cpu.usage to the event log documents written out by alerting, as kind of an "alerts as data" mechanism.

This would let queries across the event log and other ECS indices use consistent fields.

But what if the rule type needed to write to the event field, or some other top-level field already being used by the event log itself? It can't, obviously.

So, how do we deal with this?

  1. status quo - there are a lot of ECS fields that we could allow rule types to add to the event log, but they'll never be able to use the existing fields we already depend on (especially event) for rule-specific data.

  2. move all the existing event log top-level fields to a new custom top-level field

  3. come up with a story where we write both a "generic" event log document like we do today, and a rule-type specific event log document. We'd still want some way of distinguishing the "generic" ones that alerting needs for its historical access, and alerting would ignore the rule-type specific documents. We could write these to a single index with some kind of field (custom?) that indicates which type it is, or we could write them to separate indices.

  4. it's not clear we even need this, if the eventual "alerts as data" ends up solving the "get more rule-specific data into an event log kinda thing"
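
For option 3 with a single index, a minimal sketch of the "field that indicates which type it is" could look like the following. The field name `kibana.doc_kind` and both helpers are hypothetical, made up for illustration only; nothing like them exists in the event log today.

```typescript
// Hypothetical discriminator values: 'generic' for the documents alerting
// writes today, 'rule_type' for rule-type specific documents.
type DocKind = 'generic' | 'rule_type';

interface TaggedDoc {
  kibana: { doc_kind: DocKind };
  [key: string]: unknown;
}

// Adds the (hypothetical) discriminator under the custom `kibana` field set,
// keeping whatever the document already had there.
function tagDoc(doc: Record<string, unknown>, kind: DocKind): TaggedDoc {
  const kibana = (doc.kibana ?? {}) as Record<string, unknown>;
  return { ...doc, kibana: { ...kibana, doc_kind: kind } };
}

// What alerting would do when reading history: ignore rule-type documents.
function genericOnly(docs: TaggedDoc[]): TaggedDoc[] {
  return docs.filter((d) => d.kibana.doc_kind === 'generic');
}
```

Because the discriminator lives under `kibana`, rule-type documents would stay free to use any top-level ECS field, including event.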

I'm thinking this issue is just kind of a place-holder for now. Assumption is we'd go with status quo, no changes to the existing event log, beyond enhancements like PR #95067, which are perfectly fine. Although that one does add even more event fields, kind of making this "problem" even worse (assuming it's a problem).

Something certainly to keep in mind for the "alerts as data" index structures though.

One problem even with status quo is that the set of fields we use internally for the event log can grow over time (like in the PR referenced above), which would then preclude their use by a rule type. So it's a problem if a rule type used a field and then WE wanted to use that field internally. We'd probably want to reserve event for our own purposes for now.

Not sure if we have any guidance for "second-level systems processing ECS data", where a process reads and writes existing customer-centric ECS data, and somehow also wants to augment that data with its OWN ECS data (especially event and error fields). This is certainly one of those. Will need to poke around and chat with some ECS folks ...

If we did make some kind of change such that our own internal fields like the current event ones moved somewhere else in the document, we'll have a "migration" problem when queries cross indices where the structures are different. So, at best, 8.0 would be when we would "switch over" to a new structure completely, but we'd also want to start populating the "new/moved" fields prior to 8.0 (populate both old and new fields < 8.0, only new fields >= 8.0). Or perhaps we can just craft more elaborate queries based on the ecs version in use, or something (assuming we make this breaking change at the same time we update the version of ecs we use).
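
A rough sketch of the "populate both old and new fields < 8.0, only new fields >= 8.0" idea, assuming the internal event fields would move under a custom location. The `kibana.event_log` location and the `toStoredDoc` helper are hypothetical; no target location has been decided.

```typescript
type Fields = Record<string, unknown>;

interface MigratingDoc {
  event?: Fields;
  // `event_log` is a hypothetical new home for the internal event fields.
  kibana?: Fields & { event_log?: Fields };
  [key: string]: unknown;
}

function toStoredDoc(doc: MigratingDoc, majorVersion: number): MigratingDoc {
  const { event, ...rest } = doc;
  if (event === undefined) return doc;

  // Always populate the new location.
  const moved: MigratingDoc = {
    ...rest,
    kibana: { ...doc.kibana, event_log: { ...event } },
  };

  // Before 8.0 keep the old top-level `event` as well, so queries written
  // against either structure keep working; from 8.0 on write only the new one.
  return majorVersion < 8 ? { ...moved, event } : moved;
}
```

Queries that span pre-8.0 and 8.0+ indices could then rely on the new location alone, as long as the dual-write starts early enough.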

@pmuellr pmuellr added discuss Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:EventLog labels Mar 25, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@banderror
Contributor

Thanks for writing this up, let me try to leave some comments.

In #95067 I used an integration test to check which standard ECS fields are used by the Event Log implementation. That is, I wanted to log the simplest possible event and check which fields would be set or overwritten by the plugin implementation rather than by me.

This was the primitive event to log:

const { provider, action } = await getTestProviderAction();
const savedObject = getTestSavedObject();
const event: IEvent = {
  event: { provider, action },
  kibana: { saved_objects: [savedObject] },
};

This is what I got in the logs:

│ debg Event to log (input to IEventLogger)
│       {
│        "event": {
│          "provider": "provider-077dedd4-9361-4582-ab14-861d697f8262",
│          "action": "action-97a46527-2e6f-4972-b1e9-a08a7d4489cb"
│        },
│        "kibana": {
│          "saved_objects": [
│            {
│              "type": "event_log_test",
│              "id": "e5246588-fb33-43f5-bfcc-f80eef1542ba",
│              "rel": "primary"
│            }
│          ]
│        }
│      }
│ debg Indexed and fetched event (output from IEventLogClient)
│       {
│        "@timestamp": "2021-03-25T18:03:48.184Z",
│        "event": {
│          "provider": "provider-077dedd4-9361-4582-ab14-861d697f8262",
│          "action": "action-97a46527-2e6f-4972-b1e9-a08a7d4489cb"
│        },
│        "kibana": {
│          "saved_objects": [
│            {
│              "type": "event_log_test",
│              "id": "e5246588-fb33-43f5-bfcc-f80eef1542ba",
│              "rel": "primary"
│            }
│          ],
│          "server_uuid": "5b2de169-2785-441b-ae8c-186a1936b17d"
│        },
│        "ecs": {
│          "version": "1.8.0"
│        }
│      }

So the event log implementation set @timestamp, kibana.server_uuid and ecs.version.

I also noticed that if I additionally call eventLogger.startTiming(event) and eventLogger.stopTiming(event), the event log implementation also sets event.start, event.end and event.duration.

│ debg Indexed and fetched event
│       {
│        "@timestamp": "2021-03-25T18:18:31.544Z",
│        "event": {
│          "provider": "provider-72c8a44f-03ee-46a7-9772-c6caebc619b8",
│          "action": "action-c5f0894c-e7f3-4b42-ae7d-e78b3d5cf555",
│          "start": "2021-03-25T18:18:31.544Z",
│          "end": "2021-03-25T18:18:31.544Z",
│          "duration": 0
│        },
│        "kibana": {
│          "saved_objects": [
│            {
│              "type": "event_log_test",
│              "id": "7cfe6d87-8f5b-4965-aab2-eba7f67131da",
│              "rel": "primary"
│            }
│          ],
│          "server_uuid": "5b2de169-2785-441b-ae8c-186a1936b17d"
│        },
│        "ecs": {
│          "version": "1.8.0"
│        }
│      }

Let's discuss each of them separately:

  1. @timestamp. This is not an issue for #94143 ([Security Solution][Detections] Proposal: building Rule Execution Log on top of Event Log and ECS), although I didn't expect it: I thought that setting the @timestamp value was the responsibility of the IEventLogger client. This is not critical for the rule execution log, but if we abstract it to some generic level of event log and alerts-as-data, it would be an issue for the alerts-as-data implementation. I hope a simple workaround would work: if @timestamp is set by the client, IEventLogger should not overwrite it.

  2. kibana.server_uuid. This is not an issue since it's in a custom field set.

  3. ecs.version. This is not an issue for #94143; the rule execution log can be built on top of that. I think this should not be an issue for alerts-as-data either, at least for alerts generated by Kibana plugins. It might be an issue for alerts coming from external systems (like Endpoint Security), because they could use a different version of ECS. This part is not clear to me. @dhurley14 @spong wdyt? Related to #94143 (comment)

  4. event.start, event.end, event.duration. I guess there is a particular use case in the Alerting framework for eventLogger.startTiming(event) and eventLogger.stopTiming(event). I think whether to use them should be up to the client. I don't think this is an issue, as long as alerting events are kept separate from the events of other clients. By that I mean: if every client uses its own scoped logger, with a different provider and other fields set for categorization and isolation, and logs its own events, we will be fine.

  5. A closely related thought: event data from one client (e.g. detection engine) should not be merged with event data from another client (e.g. alerting framework) or with event_log plugin data (the underlying mechanism). E.g. if the event_log plugin itself needs to log some data to the event log, it should use a separate log (provider) and/or a custom field set.

  6. event.provider, event.action. Definitely not an issue for the rule execution log, but again not sure about alerts-as-data. @dhurley14 @spong does the detection engine set them when creating signals, or can they come from Endpoint or an external system?
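
The workaround suggested in item 1 could be sketched roughly like this. `withTimestamp` is a hypothetical helper used only to show the intended behavior; it is not how the event_log plugin is actually implemented.

```typescript
interface TimestampedEvent {
  '@timestamp'?: string;
  [key: string]: unknown;
}

// Hypothetical helper: fill in @timestamp only when the client did not set it.
// `now` is injectable so the behavior is easy to test.
function withTimestamp(event: TimestampedEvent, now: Date = new Date()): TimestampedEvent {
  return { ...event, '@timestamp': event['@timestamp'] ?? now.toISOString() };
}
```

With this, an alerts-as-data client could pass the timestamp of the original alert, while clients that don't care would keep getting the write time.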

Are there any other fields that event log sets under the hood or uses as restricted fields in the current implementation?

> But what if the rule type needed to write to the event field, or some other top-level field already being used by the event log itself? It can't, obviously.

Maybe it can, given it uses a separate logger (provider). It could be a child logger if we needed to establish a relationship between the Alerting framework (generic execution mechanism) and a particular rule (specific executor). This is how it's normally done in structured logging implementations: when you create a child logger, you define a set of additional fields to write with each event, which can be used as correlation ids. It could look like:

const provider = 'rule-execution-log';

const logger = alerting.eventLog.getChildLogger(provider, {
  kibana: {
    detection_engine: {
      rule_type: rule.params.type,
      // etc
    }
  }
});

// getChildLogger will set correlation ids in `kibana.alerting` field set, e.g.
// {
//   kibana: {
//     alerting: {
//       execution_id: 123456,
//     }
//   }
// }

If we didn't need to log correlation ids, a client like the detection engine would just obtain its own logger from event_log itself.
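
To make the child logger idea concrete, here is a self-contained sketch of how the inherited fields could be merged into each event. `SketchLogger` and `deepMerge` are hypothetical names, and writing to an in-memory array stands in for indexing; the real event log has no getChildLogger today.

```typescript
type Fields = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Fields {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively merges `extra` into a copy of `base`; on conflicts `extra` wins.
function deepMerge(base: Fields, extra: Fields): Fields {
  const out: Fields = { ...base };
  for (const key of Object.keys(extra)) {
    const existing = out[key];
    const incoming = extra[key];
    out[key] =
      isPlainObject(existing) && isPlainObject(incoming)
        ? deepMerge(existing, incoming)
        : incoming;
  }
  return out;
}

class SketchLogger {
  constructor(private readonly sink: Fields[], private readonly base: Fields = {}) {}

  // A child inherits the parent's fields (e.g. correlation ids), sets its own
  // event.provider, and adds its own field set on top.
  getChildLogger(provider: string, extra: Fields): SketchLogger {
    const own = deepMerge({ event: { provider } }, extra);
    return new SketchLogger(this.sink, deepMerge(this.base, own));
  }

  // Every logged event carries the logger's base fields.
  logEvent(event: Fields): void {
    this.sink.push(deepMerge(this.base, event));
  }
}
```

A parent logger created with `{ kibana: { alerting: { execution_id } } }` would then stamp that correlation id onto every event a rule-specific child logs, while the child keeps its own provider.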

@spong
Member

spong commented Mar 25, 2021

> ecs.version. This is not an issue for #94143, rule execution log can be built on top of that. I think this should not be an issue for alerts-as-data as well, at least for those alerts that will be generated by Kibana plugins. This might be an issue for alerts coming from external systems (like Endpoint Security) cause they could use a different version of ECS. This part is not clear to me. @dhurley14 @spong wdyt? Related to #94143 (comment)

Great point @banderror. We have the same predicament on the Security Detections side with the .siem-signals* index: we copy over the source event, and the underlying rule could be looking at source indices that are a few ECS versions behind, so if there's a conflict we throw an error when trying to write the alert. It's a similar situation here, though it ultimately depends on how much we rely on writing source docs and where they come from, as you mentioned. We can always follow what we did in security and hope for the happy path, then if there's a conflict fall back to writing the log document without the original source document (or with reduced fields)?
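
That "happy path, then fall back" flow could be sketched as below. Everything here is illustrative: `writeWithFallback`, the injected `index` function and the `isConflict` predicate are hypothetical stand-ins, not the detection engine's or the event log's actual code.

```typescript
type Doc = Record<string, unknown>;

async function writeWithFallback(
  logDoc: Doc,
  sourceDoc: Doc,
  index: (doc: Doc) => Promise<void>,
  isConflict: (err: unknown) => boolean
): Promise<'full' | 'reduced'> {
  try {
    // Happy path: source fields plus the log fields. The log fields are
    // spread last, so they win if both define the same top-level key.
    await index({ ...sourceDoc, ...logDoc });
    return 'full';
  } catch (err) {
    // Anything other than a mapping conflict is a real failure.
    if (!isConflict(err)) throw err;
    // Fallback: write the log document without the original source document.
    await index(logDoc);
    return 'reduced';
  }
}
```

The return value lets the caller record whether the source document made it in, which could be useful when debugging missing fields later.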

> event.provider, event.action. Definitely not an issue for rule execution log, but again not sure about alerts-as-data. @dhurley14 @spong does detection engine set them when creating signals, or can they come from Endpoint/external system?

Looks like we copy over all event fields from source (including provider/action), however we do override kind and set it to be signal.

@gmmorris gmmorris added the Meta label Jul 15, 2021
@gmmorris gmmorris added the insight Issues related to user insight into platform operations and resilience label Aug 13, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
@mikecote
Contributor

Closing as we see ourselves being able to use ECS fields and don't foresee any further concerns.


7 participants