-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Namespace MDC Fields with either labels or tags. #142
Comments
Hi and thanks for your issue! We previously did namespace them but removed that in #67. |
@felixbarny it's not a problem per se; it just that I would like the ability to namespace them given that the spec has a placeholder for these attributes to land. I also logged a defect over on the nodejs project and they were considering the same. I'll also highlight that the "metadata": {
"environment": "dev"
}, |
When namespacing, it makes it harder to set actual ECS fields, such as For the things you'd like to be namespaced, could you just prefix them with <additionalField>
<key>labels.environment</key>
<value>dev</value>
</additionalField> Better yet, for this particular field, you'll probably want to set <additionalField>
<key>service.environment</key>
<value>dev</value>
</additionalField> |
@felixbarny I agree for the Also, one things to know is that neither For example we will be emitting the cloud fields: <additionalField>
<key>cloud.provider</key>
<value>MY CLOUD PROVIDER</value>
</additionalField>
<additionalField>
<key>cloud.availability_zone</key>
<value>MY REGION</value>
</additionalField> and when they are emitted the come out at the root level: "cloud.provider": "MY CLOUD PROVIDER",
"cloud.availability_zone": "MY REGION", Instead I'd expect roughly the following: "cloud" : {
"provider": "MY CLOUD PROVIDER",
"availability_zone": "MY REGION",
}, |
Ah, so your issue is about dotted vs nested field names? |
@felixbarny yes and no. In this particular case I want the ability to specify the top level container for my additional attributes that are housed in MDC. I will say though that; that gives a potential path to resolving it. |
We, too, are sad to see that MDC values are not namespaced under
For the stated reasons, please at least provide an option to restore the previous behavior by making the namespace (and by "namespace" i mean "common key prefix") for MDC values configurable using the |
Hi and sorry for the late reply. The intention of this change was to make it simpler to add top-level fields (both ECS fields and custom fields). I still think that this generally makes sense but I do acknowledge that this can lead to nasty issues like mapping conflicts and data loss. I think there's a way to reconcile both - making it possible to add arbitrary top-level fields without making it brittle. But it requires some changes to the mappings. We're currently thinking about enhancing the default mappings to make them less brittle out-of-the-box: elastic/elasticsearch#88777. There's an early POC version of the mappings available in this PR. The key parts are to set
This change was introduced before version 1.0.0 was released. According to semantic versioning, breaking changes in 0.x are expected.
ECS is meant to be an extensible schema where projects or organizations can add custom fields as they see fit. But if the custom fields conflict with ECS, this can become can issue indeed.
Do you have an example from where you have seen this issue in the wild? Which field was the one that was overridden?
I agree. Developers should not need to change their usage of MDC. My idealized view of it is that it should be fine to add any top-level fields to the document without it being an issue. I'm aware that we're not currently in that state, especially when using the default mappings.
One of the issues with that is that our APM agents add a |
That's fair. I still think this was a change for the worse.
I understand that, but I actually want all the properties that our developers put into the MDC to be logged with the
If by "in the wild" you mean open source projects, then I don't know. But I've seen this happen in some of our many many microservices, mostly with the
It looks like the change was made for the sole purpose to enable your APM agents to log
Actually, I would rather not even give the developers the option to pollute the top level namespace. Again, our developers don't even know that they are ultimately logging to ECS. Logging to Elastic is my responsibility and I don't ever want the applications to be able to "break out" from their |
What if your developers use structured logging concepts to add fields to the log instead of MDC? Would you expect these to be at the top level or also nested? |
No, I meant in the services you're operating. The
Not only for that use case. The idea was that it would also be easier for users to add top-level ECS properties, such as |
That's a good question. Our developers currently just don't. But we are currently using
Let me clarify that: I don't want our developers to be able to accidentally pollute the top level namespace via MDC. If they are using the structured logging API on purpose to add keys to the top level, then I would be fine with that. |
Actually, some of our developers already use the structured logging concepts of logback to add a very specific top level key, namely |
I've mostly been looking at MDC as another means of doing structured logging. Often, the MDC is used as a workaround if the logging library doesn't provide a native way for structured logging, such as slf4j 1.x. But modern libraries, such as the recently released slf4j 2.x, as well as log4j 2.x have support for structured logging (IIRC), it may be appropriate to differ MDC and actual structured logging. However, I'm not sure if it's intuitive if these two ways of adding a logger.info(new StringMapMessage()
.with("message", "User has made a new comment")
.with("user.id", "felixbarny")); try (MDC.MDCCloseable closable = MDC.putCloseable("user.id", "felixbarny")) {
logger.info("User has made a new comment");
}
I'm wondering if that's generally the case or just for your current use cases. Let's say that a team in your organization decides they want to migrate from MDC to structured logging. They might just translate everything they put into the MDC to a structured log event in which case we're back at square 1. So maybe it is reasonable to treat MDC and structured logging in the same way?
Could you elaborate on why you want to know which keys are coming from the MDC? Would you also want to know the keys that are coming from a structured logging event? |
I think MDC serves a difference use-case than general structured logging. Usually I see MDC used to just dump the current "context" of a transaction. This includes the transaction-id, the logged in user.. but really, this can be anything attached to the current transaction context. In the Go programming language, there is usually a Of course, many of the keys of the transaction context should be better mapped to standardized fields of the ECS, but that is something that I want to do, for example using FluentBit mappers. I want to keep our developers blissfully ignorant about everything ECS.
As I just described, sometimes I want to see the full transaction context for debugging purposes. |
I was going to make a similar comment that MDC is often used contextual fields that span across all methods within a given flow. For example, if you use distributed tracing that's something that needs to be present in all log records and the only way to do so is via MDC unless you want to pass that as a direct parameter to all methods. |
Thanks for highlighting these nuances. Let me think about it a bit more but I do get your points. If we were to change it, we'd probably have to do that by making users opt-in to storing MDC as labels as it might be considered a breaking change otherwise. |
Let me try to summarize: We add an option to prefix all MDC values with This serves two purposes: It makes it easier to determine all fields that are valid throughout the whole (request) context and it avoids mapping conflicts and conflicts with fields from the log event, such as Some MDC values will need to be excluded from this so that they're at the top level. This is needed at least for the ones that are needed trace/log correlation, such as When adding key/value pairs to a specific log event (structured logging), the key/value pairs should remain to be on the top level, though. Although MDC can be used as a means to do structured logging on frameworks that don't have 1st party support for structured logging, it often serves a different purpose. Rather than a deliberate per-event fields, the MDC contains properties that are added to all events in a given context. Also, with the recently released slf4j2, all modern logging frameworks have built-in support for structured logging. Although, you could argue that you'd want to map some of these request scoped fields to ECS fields, too.
I'm not entirely convinced by this line of argumentation as the same conflicts can occur with both top-level structured logging fields and MDC fields. Also, some MDC fields do make sense at the top level. But there's probably no harm in having an opt-in option to prefix MDC values with Long-term, I think we should make it simple to just send arbitrary JSON (that may even contain duplicate field names) to Elasticsearch without having to worry about mapping issues. In this future, the proposed config option is not needed for the majority of use cases. @xeraa I wonder if you have any thoughts on this. |
Any time frame for this change? Marked as 8.8 candidate, and we're already on 8.9? |
I noticed when I was working with the ECS Java loggers that MDC and/or Metadata fields weren't name spaced.
I can see one of three scenario's:
labels
per [https://www.elastic.co/guide/en/ecs/current/ecs-base.html](Base Fields)tags
per [https://www.elastic.co/guide/en/ecs/current/ecs-base.html](Base Fields)Here's an example configuration:
and a resulting output:
The text was updated successfully, but these errors were encountered: