Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ECS static mapping with ECS dynamic mapping for AAD indices #168497

Open
maryam-saeidi opened this issue Oct 10, 2023 · 14 comments
Open

Replace ECS static mapping with ECS dynamic mapping for AAD indices #168497

maryam-saeidi opened this issue Oct 10, 2023 · 14 comments
Assignees
Labels
blocked Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@maryam-saeidi
Copy link
Member

maryam-saeidi commented Oct 10, 2023

📝 Summary

In Alert-As-Data (AAD), all the ECS fields are added statically to the mapping. In this ticket, I would like to discuss if it is a good idea to change it to dynamic mapping and how it can be done. (Slack discussion)

Motivation

@P1llus mentioned:

The dynamic template takes up much less memory per node (cluster state), than the full field mappings and the searching is much much quicker when you have 100 fields rather than 2000-3000

@simianhacker mentioned that when we add ECS fields statically, many fields are added to the mapping regardless of whether they are used or not, and as a result, in the KQL suggestion bar, the user will face many fields that many of them probably are not relevant.

Challenge

The challenge is ensuring dynamic mapping will not exceed the limit set for the number of fields in alerting. Here is the limit, which seems 1900 of them are ECS fields. This limitation was discussed in this RFC as well.

Possible solution

One solution to solve this issue is using true_until_limit that will be introduced in this PR.
But @pmuellr shared this concern:

The downside of that PR - using the new true_until_limit option - is that we aren't in control of what fields would end up NOT getting mapped. Which may end up being confusing, where one dynamic field could be mapped and another not. Probably even more complicated to debug if the limit is quite high, and we rarely run into the issue, and have 1000's of fields we have to investigate.

✅ Acceptance Criteria

  • Based on the discussion, if it makes sense, replace ECS static mapping with ECS dynamic mapping.
@maryam-saeidi maryam-saeidi added the Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry label Oct 10, 2023
@botelastic botelastic bot added the needs-team Issues missing a team label label Oct 10, 2023
@maryam-saeidi maryam-saeidi added the Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) label Oct 10, 2023
@botelastic botelastic bot removed the needs-team Issues missing a team label label Oct 10, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@maryam-saeidi
Copy link
Member Author

@ymao1 What are your thoughts about this?

@maryam-saeidi maryam-saeidi changed the title Replace ECS static mapping with ECS dynamic mapping Replace ECS static mapping with ECS dynamic mapping for AAD indices Oct 10, 2023
@ymao1
Copy link
Contributor

ymao1 commented Oct 10, 2023

Definitely worth discussing! Since we'd be using the dynamic mapping in conjunction with alerts as data fields, what happens when we have a kibana.alert.* field that matches a dynamic mapping path match or falls into the all_strings_to_keywords bucket but we want a different mapping than defined for the ECS mappings? What takes precendence?

We should probably also get the opinion of @elastic/security-detections-response-alerts since the use the ECS fields most heavily. It would be good to do a POC to make sure transitioning from a static to a dynamic template on upgrade is possible and seamless.

@P1llus
Copy link
Member

P1llus commented Oct 10, 2023

@ymao1 specific field mappings always takes priority over dynamic templates. First the field is checked to see if a mapping exists, if not the matching with dynamic template is applied.

A few things that would need to be considered is that you have dynamic:false set because you don't want to map non-ecs fields, it might be hard to run dynamic templates while also disabling this for non-ecs fields.
This could result in a larger field count since the custom fields also gets a field mapping.

But im sure there are ways around it :)

@ymao1
Copy link
Contributor

ymao1 commented Nov 26, 2024

POC here: #200072

In order to use the dynamic ECS mappings, we have to set dynamic: true for the index template.

Setting dynamic: true for the index template causes a bunch of test failures for security solutions. The specific one I investigated is copying non-ECS multifields from the source document, so the document being indexed has the following fields:

"random.entry_leader.name": "random non-ecs field"
"random.entry_leader.name.caseless": "random non-ecs field"
"random.entry_leader.name.text": "random non-ecs field"

Because dynamic: true, the mapping for the first field random.entry_leader.name is set to keyword and then there's a collision because this is actually a multifield so we see this error indexing:

{
    "errors": true,
    "took": 0,
    "items": [
        {
            "create": {
                "_index": ".ds-.preview.alerts-security.alerts-default-2024.11.26-000001",
                "_id": "e6b75f1c2ec75c9f50ef794f26241ca586d6e424",
                "status": 400,
                "error": {
                    "type": "illegal_argument_exception",
                    "reason": "can't merge a non object mapping [random.entry_leader.name] with an object mapping"
                }
            }
        }
    ]
}

When dynamic: false, all these fields would be ignored in the alert document.

This occurs even if ignore_malformed is true (which it already is for all the alerts indices).

We should consider whether we need to stick to static ECS mappings maybe just for detection rules?

@pmuellr
Copy link
Member

pmuellr commented Nov 26, 2024

We should consider whether we need to stick to static ECS mappings maybe just for detection rules?

Are security alerts the only docs that have non-ECS fields? No way to add them for o11y or stack rules? So "Add more fields to alert details" for ES Query is limited to only ECS fields? Oh, for ES Query it's a fixed list anyway :-) :

export const validSourceFields = [
HOST_NAME,
HOST_HOSTNAME,
HOST_ID,
CONTAINER_ID,
KUBERNETES_POD_UID,
];

I think we'd have to make that a constraint. Anytime a rule wants to add additional fields to an alert, they must use ECS field names, or they could run into the same problem as with the security rules. Wondering how we'd enforce that ...

@ymao1
Copy link
Contributor

ymao1 commented Nov 26, 2024

Yea, I think having to set dynamic: true in order to get the ECS dynamic template mappings to apply makes me a little nervous we lose a little control over all the mappings and then in the case of these types of mapping conflicts, we're basically throwing away data until the customer manually fixes the mappings. What happens if a customer messes with the index and indexes something that conflicts with a future ECS field? Currently we're able to determine on Kibana startup if there's a mapping conflict bc we know there won't be anything dynamic added later.

@pmuellr
Copy link
Member

pmuellr commented Nov 27, 2024

What happens if a customer messes with the index and indexes something that conflicts with a future ECS field?

All bets are are off if a customer messes with the index 😀

I'm more worried about US messing with the index 🤕 . Do we validate alert docs before writing them?

I see some ECS schemas in packages/kbn-alerts-as-data-utils/src/schemas/generated, but I'm not seeing them referenced as being used as validation - my TS experience in VS Code is not good right now, so I may be missing it.

But sounds like if security wants user-speciified non-ECS fields in their alerts, they won't want to be using dynamic: true, unless they also have some kind of validation for those fields.

@mikecote
Copy link
Contributor

mikecote commented Nov 27, 2024

I don't think we should allow non-ECS fields at the root unless they are stored within the kibana.* namespace to prevent scenarios where ECS adds support for the same field in the future.

Given dynamic: true would allow such scenario, it feels better to stick with static ECS mappings for all our indices? From what I gather, there's no alternative if we want to use dynamic ECS mappings?

It also allows us to avoid performing validation within Kibana, which would be a CPU intensive operation.

@maryam-saeidi
Copy link
Member Author

To ensure only ECS fields are saved at the root level, can we filter data based on ecs_field_map and remove fields that don't exist in that mapping before saving it into the alert index? For observability rules, we add additional fields based on the group by fields, so if we enable this feature for observability, I don't see a CPU-intensive operation in that case.
Enabling dynamic mapping for ECS fields will help us to move away from fieldsForAAD and improve the searchability of ECS fields that have data. (Related ticket)

Also, to ensure the correct mapping, should we add a dynamic template equal to the ECS mappings? (Or does the dynamic field mapping match ECS mapping?)

@marshallmain
Copy link
Contributor

To ensure only ECS fields are saved at the root level, can we filter data based on ecs_field_map and remove fields that don't exist in that mapping before saving it into the alert index?

Detection rules can't do this, we need to copy even non-ECS fields into detection alerts. A new dynamic option from ES that allows only dynamic fields that match a dynamic template would be ideal here so we could add ECS fields dynamically without risking a full mapping explosion. Something like dynamic: template instead of dynamic: true. They already have a few other possible dynamic values so adding another seems reasonable.

@ymao1
Copy link
Contributor

ymao1 commented Dec 9, 2024

@maryam-saeidi

To ensure only ECS fields are saved at the root level, can we filter data based on ecs_field_map

We were discussing this internally as well. I think adding this validation based on the schema would be the only way we would feel comfortable setting dynamic: true. Doing this validation for each alert document does add some overhead to the process (we do this with the event log documents and I believe we've seen that validation takes about 1ms per document, done synchronously and uses up a non-trivial amount of memory as well (#170345 (comment))). Also from a framework perspective, having to do this validation removes the benefit of switching to dynamic mappings because we still have to maintain a static ECS field map list that needs to be kept up to date with the latest ECS fields.

Also, to ensure the correct mapping, should we add a dynamic template equal to the ECS mappings?

Elasticsearch maintains a dynamic template called ecs@mappings https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/template-resources/src/main/resources/ecs%40mappings.json, so this is what we would use in place of the static mapping. We would not want to try to create a dynamic template on our own. How does this dynamic template look to you?

I can see from the O11y perspective the benefit of switching to the dynamic ECS template so if the additional schema validation overhead is acceptable to you, we can:

  • switch the current useEcs flag in the alert context registration from a boolean to an enum that allows rule types to decide whether to use static or dynamic
  • add a step before writing the alert document to validate the alert document against the expected schema - we would most likely do this using the generated alert schemas we currently have, which I don't think are widely used right now. We'd have to explore if we could do this more efficiently.

I think we would want to do this incrementally (switch rule types one at a time) and we would want input from the rule types to add functional tests to ensure no unwanted fields are being written.

@ymao1
Copy link
Contributor

ymao1 commented Dec 12, 2024

Discussed offline with @maryam-saeidi and it seems like if they are able to implement this issue: #183248 using dynamic: true within kibana.alert.group (with a guardrail as requested in elastic/elasticsearch#118223), the only remaining issue that would be solve by switching to a dynamic mapping is #191068, where the hard-coded fieldsForAad value that's specified during rule type registration determines the fields that are shown in the alert-related typeaheads. After discussing with @XavierM, it seems like we should be able to explore switching to using the fields cap API instead of relying on fieldsForAad. At the time of implementation, the fields cap API was slow and returned all mapped fields (not just the non-empty ones) so fieldsForAad was required in order to show only non-empty fields in the suggester. I've created an issue to investigate this switch: #204112

@maryam-saeidi Would it be ok to table this issue for the time being in favor of investigating using the field caps API?

@maryam-saeidi
Copy link
Member Author

@ymao1 Sounds good to me; thanks for checking fieldsForAad and creating a ticket for it.

@ymao1 ymao1 added the blocked label Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
None yet
Development

No branches or pull requests

7 participants