[Dataset quality] Add section for _ignored field in dataset flyout #172265

Closed
yngrdyn opened this issue Nov 30, 2023 · 7 comments · Fixed by #183934
Assignees
Labels
Feature:Dataset Health Team:obs-ux-logs Observability Logs User Experience Team

Comments

@yngrdyn
Contributor

yngrdyn commented Nov 30, 2023

📓 Summary

Allow users to get an overview of the fields that are inside _ignored property in their dataset.

image

⚠️ This image is for reference only; @isaclfreire is updating the designs to reflect the latest discussions.

For iteration 1, we will display this table in the DQ Flyout.

✔️ Acceptance criteria

  • The flyout shows a section with the list of _ignored fields in the dataset.
  • Each ignored field also displays the number of times it was ignored (count).
  • Each ignored field also displays its last occurrence in a separate column.

Other columns displayed in the image are out of scope for this ticket.

@botelastic botelastic bot added the needs-team Issues missing a team label label Nov 30, 2023
@yngrdyn yngrdyn changed the title [Dataset quality] Add section for _ignored field [Dataset quality] Add section for _ignored field in flyout Nov 30, 2023
@yngrdyn yngrdyn added the Team:obs-ux-logs Observability Logs User Experience Team label Nov 30, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Nov 30, 2023
@felixbarny
Member

I think we should display the number of times a specific field has been ignored, for example (42) message. As we don't necessarily know the type of the ignored field, I don't think it makes much sense to show the type indicators, such as [k] message.

We should also cater for the situation where there are a lot of distinct ignored fields so that we can't necessarily list them all.

@isaclfreire isaclfreire self-assigned this Nov 30, 2023
@isaclfreire

I think it would be important to add a definition of what ignored fields are, so I'm calling for @mdbirnstiehl's input as well.

Can we also add here why we are adding this section to the flyout and what the value is for the user, for future reference? I'd love for our issue descriptions to always include the reasoning behind them, so I can come in with the design/UX solution for discussion and the objectives are clear for everyone who reads this. cc @gbamparop @ruflin

Question: Should users be able to take actions from these fields? Filtering by, for instance?

@ruflin
Contributor

ruflin commented Nov 30, 2023

The _ignored fields are the fields that were not indexed, for various reasons. At first, these fields are informative: they let users go into the dataset and change the mapping / pipelines themselves to fix the problem.

Actions on these fields are needed to jump to Logs Explorer and filter down on a field, to see the exact documents that are affected. Later on, actions around jumping into a "fixing" flow will exist. It could go so far that we eventually offer them a fix directly 🪄.

There are different reasons why a field can end up in _ignored. We initially wanted to build each reason into Elasticsearch directly, but to reduce scope, for now we are munging it all together into a single field. @felixbarny has some ideas on how we could still take it apart. Assuming we can take it apart, it would be great to show a list for each reason:

  • Field limit reached
  • Mapping conflict

How the different problems are solved is also different, so separating these is important.

@felixbarny
Member

For the dataset quality page, we won't be able to determine, in aggregate, the reason why a field has been ignored. But I don't think that's a big issue. What matters most is which fields have been ignored, and how often. Clicking on an ignored field should take the user to the Logs Explorer with a filter (_ignored: "<field>") so that it shows all documents that have a specific ignored field (such as message). The next step in the journey is for the user to open the flyout for a specific log event.

What's happening there should be defined in this issue:

In short, we should list all ignored fields, their values, and the reason for why they have been ignored.
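
As a rough illustration of the `_ignored: "<field>"` filter described above, here is a minimal TypeScript sketch. The helper name `buildIgnoredFieldFilter` is hypothetical and not taken from Kibana; the escaping assumes that backslashes and double quotes are the only characters that need care inside a quoted KQL value.

```typescript
// Hypothetical helper, not part of Kibana: builds the KQL filter string
// that narrows Logs Explorer to documents where `field` was ignored.
function buildIgnoredFieldFilter(field: string): string {
  // Escape backslashes first, then double quotes, so the value stays
  // a single quoted KQL phrase.
  const escaped = field.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `_ignored: "${escaped}"`;
}

// Example: filter for documents whose `message` field was ignored.
const messageFilter = buildIgnoredFieldFilter("message");
```

How the resulting string is passed to Logs Explorer (URL state, locator, or otherwise) is left open here.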

@yngrdyn yngrdyn changed the title [Dataset quality] Add section for _ignored field in flyout [Dataset quality] Add section for _ignored field in dataset flyout Dec 1, 2023
@ruflin
Contributor

ruflin commented Dec 1, 2023

But I don't think that's a big issue. What matters most is which fields have been ignored, and how often.

If we had the reason for each field, we could offer quick actions to the user directly from the dataset quality page flyout. For example, we see that 25 of the ignored fields hit the field limit. We can group them and suggest a single fix that gets rid of all of them. In the Logs Explorer, users have to go through each document and figure out the reason per document, and don't see the bigger picture.

For now, all we can do is what Felix suggests, but I think we put an unnecessary burden on the user and should eventually improve it.

@isaclfreire isaclfreire removed their assignment Mar 19, 2024
@achyutjhunjhunwala achyutjhunjhunwala self-assigned this May 16, 2024
achyutjhunjhunwala added a commit that referenced this issue May 29, 2024
…183934)

## Summary

Closes #172265


## Description

This PR adds the Degraded Fields Table to the Dataset Quality Flyout for
an individual Data Stream. The following tasks were done as part of this PR:

1. A new server-side endpoint was created which queries the `datastream`
directly and aggregates `_ignored` fields for that data stream during
the given time range, and also adds a sub-aggregation for last occurrence.
2. On the UI side, the table was added with 3 columns as mentioned in
the original ticket: Field, Count and Last Occurrence.
3. The UI currently supports client-side sorting and pagination. We can
move this to server-side pagination and sorting if required in the future.
4. The Flyout also supports sync with the URL, which means a user can
navigate to the Dataset Quality page where the flyout would be open and
sorting and pagination would be pre-applied.
5. API Tests
6. Stateful and Serverless FTR tests
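
Steps 1 to 3 above could look roughly like the following TypeScript sketch. This is not the code from the PR: every name here (`buildDegradedFieldsRequest`, `degradedFields`, `lastOccurrence`, `toRows`, `sortAndPaginate`) is hypothetical, and it assumes that `_ignored` is aggregatable on the cluster version in use and that documents carry an `@timestamp` field.

```typescript
// Hypothetical sketch: a terms aggregation on `_ignored` with a max
// sub-aggregation on `@timestamp` for the last occurrence.
interface DegradedFieldsParams {
  dataStream: string;
  start: string; // ISO timestamp or date math such as "now-24h"
  end: string;
  maxFields?: number; // cap on distinct ignored fields returned
}

function buildDegradedFieldsRequest(params: DegradedFieldsParams) {
  const { dataStream, start, end, maxFields = 1000 } = params;
  return {
    index: dataStream,
    size: 0, // only aggregations are needed, not hits
    query: {
      bool: {
        filter: [
          { range: { "@timestamp": { gte: start, lte: end } } },
          { exists: { field: "_ignored" } },
        ],
      },
    },
    aggs: {
      degradedFields: {
        terms: { field: "_ignored", size: maxFields },
        aggs: { lastOccurrence: { max: { field: "@timestamp" } } },
      },
    },
  };
}

// Shape of one terms bucket in the response, under the names used above.
interface Bucket {
  key: string;
  doc_count: number;
  lastOccurrence: { value: number | null };
}

// One table row: Field, Count and Last Occurrence.
interface DegradedFieldRow {
  name: string;
  count: number;
  lastOccurrence: number | null;
}

function toRows(buckets: Bucket[]): DegradedFieldRow[] {
  return buckets.map((b) => ({
    name: b.key,
    count: b.doc_count,
    lastOccurrence: b.lastOccurrence.value,
  }));
}

// Client-side sorting and pagination over the rows already in memory.
function sortAndPaginate(
  rows: DegradedFieldRow[],
  field: keyof DegradedFieldRow,
  direction: "asc" | "desc",
  pageIndex: number,
  pageSize: number
): DegradedFieldRow[] {
  const sign = direction === "asc" ? 1 : -1;
  const sorted = [...rows].sort((a, b) => {
    const x = a[field];
    const y = b[field];
    if (typeof x === "string" && typeof y === "string") {
      return sign * x.localeCompare(y);
    }
    return sign * (Number(x ?? 0) - Number(y ?? 0));
  });
  return sorted.slice(pageIndex * pageSize, (pageIndex + 1) * pageSize);
}
```

Because the terms aggregation returns at most `maxFields` buckets, sorting in the browser stays cheap; moving to server-side sorting would mainly matter if the number of distinct ignored fields grew well beyond that cap.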


## Screenshot

<img width="1278" alt="image"
src="https://github.com/elastic/kibana/assets/7416358/36a9b5cd-de05-4d17-99a2-cc08ec4583dd">


## Scenario

1. Spin up an 8.14-snapshot instance
2. Ingest degraded docs
3. Upgrade to 8.15-snapshot
4. Open the Dataset Quality Flyout and see what the page looks like

<img width="1286" alt="image"
src="https://github.com/elastic/kibana/assets/7416358/100f3c8c-b697-4f81-ac7e-427d0f468407">
@isaclfreire

Latest designs can be found in this Figma file with related comments and details :)

  1. Dataset page box
    image

  2. Logs detail flyout box
    image


6 participants