[Dataset quality] Support failure store #199806

Draft: wants to merge 27 commits into main

yngrdyn (Contributor) commented Nov 12, 2024

Closes https://github.com/elastic/logs-dev/issues/183 and https://github.com/elastic/logs-dev/issues/184

Summary

This PR shows the percentage of failed docs in the dataset quality table. The following acceptance criteria items were resolved:

Dataset quality page

  • A column for Failed docs is included in the table
  • A tooltip is placed in the title of the column
  • A % of documents inside Failure store is calculated for every dataStream
  • If the % is less than 0.0001 but greater than 0, we should show the ⚠ symbol next to the ~0 value (as we do with degraded docs)
  • Failed docs percentages greater than 0 should link to discover
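
A minimal sketch of the display rule described above, for illustration only. The helper name `formatFailedDocsPercentage`, the `FormattedPercentage` shape, and the label strings are assumptions and are not taken from this PR; only the 0.0001 threshold and the ~0 plus warning behavior come from the acceptance criteria.

```typescript
// Hypothetical helper: the name, return shape, and label strings are assumptions.
// Only the threshold and the "~0 with a warning" rule come from the acceptance criteria.
const MIN_VISIBLE_PERCENTAGE = 0.0001;

interface FormattedPercentage {
  label: string; // text rendered in the table cell
  showWarning: boolean; // whether to render the ⚠ symbol next to the label
}

function formatFailedDocsPercentage(percentage: number): FormattedPercentage {
  if (percentage <= 0) {
    // Nothing in the failure store: plain zero, no warning.
    return { label: '0%', showWarning: false };
  }
  if (percentage < MIN_VISIBLE_PERCENTAGE) {
    // Non-zero but too small to display meaningfully: show ~0 and warn.
    return { label: '~0%', showWarning: true };
  }
  return { label: `${percentage}%`, showWarning: false };
}

console.log(formatFailedDocsPercentage(0.00005)); // tiny but non-zero value
console.log(formatFailedDocsPercentage(2.5)); // regular value
```

Keeping the warning flag separate from the label lets the table decide how to render the symbol, the same way it could for degraded docs.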

Dataset details page

  • A metric, Failed docs, is included in the Overview panel under Data set quality. This metric includes the number of documents inside the failure store for the specific dataStream.
  • A tooltip is placed in the title of the Failed docs metric with message: The percentage of docs sent to failure store due to an issue during ingestion.
  • Degraded docs graph section is transformed to Document trends allowing the users to switch between Degraded docs and Failed docs trends over time.
  • A new chart for failed documents is created with links to discover/Logs explorer using the right dataView

🎥 Demo

Screen.Recording.2024-11-12.at.11.53.02.mov

Specific cases

  • Failed documents less than 0.001%
image
  • Quality column is shown after degraded docs and failed docs is loaded (Quality is now calculated using both values)
Screen.Recording.2024-11-18.at.15.55.05.mov
Screen.Recording.2024-11-18.at.16.01.03.mov
  • Users can still see Quality even when they don't have access to stats or something went wrong with that endpoint
Screen.Recording.2024-11-18.at.16.02.47.mov
  • The selected chart is kept in the URL
Screen.Recording.2024-11-19.at.16.57.54.mov

Missing

The following acceptance criteria are not yet met because the Elasticsearch changes are not finalised:

  • Failed docs percentages greater than 0 should link to discover
  • A new chart for failed documents is created with links to discover/Logs explorer using the right dataView

elasticmachine (Contributor)

🤖 Jobs for this PR can be triggered through checkboxes. 🚧

ℹ️ To trigger the CI, please tick the checkbox below 👇

  • Click to trigger kibana-pull-request for this PR!
  • Click to trigger kibana-deploy-project-from-pr for this PR!

@yngrdyn yngrdyn self-assigned this Nov 12, 2024
yngrdyn (Contributor Author) commented Nov 12, 2024

/ci

@@ -26,7 +26,7 @@ export const NONE = 'none';
export const DEFAULT_TIME_RANGE = { from: 'now-24h', to: 'now' };
export const DEFAULT_DATEPICKER_REFRESH = { value: 60000, pause: false };

-export const DEFAULT_DEGRADED_DOCS = {
+export const DEFAULT_QUALITY_DOC_STATS = {
Contributor Author

Note

This is now reused by degradedDocs and failedDocs.

};

return new DataStreamStat(dataStreamStatProps);
}

-public static fromDegradedDocStat({
+public static fromQualityStats({
Contributor Author

Note

We can extend this method with more qualityStats and construct a Dataset from them

@@ -40,7 +40,7 @@ export const indexNameToDataStreamParts = (dataStreamName: string) => {
};

export const extractIndexNameFromBackingIndex = (indexString: string): string => {
-const pattern = /.ds-(.*?)-[0-9]{4}\.[0-9]{2}\.[0-9]{2}-[0-9]{6}/;
+const pattern = /.(?:ds|fs)-(.*?)-[0-9]{4}\.[0-9]{2}\.[0-9]{2}-[0-9]{6}/;
Contributor Author

Important

Failure store is at the moment a backing index that starts with .fs
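
To illustrate what the updated pattern accepts, here is a rough, self-contained sketch. The function body, the fallback for non-matching input, and the sample index names are assumptions, since the diff only shows the pattern line. The sketch also escapes the leading dot, which the pattern in the diff leaves unescaped (so there it matches any character).

```typescript
// Pattern adapted from the diff above. Escaping the leading dot is an assumption
// made for this sketch; the diff uses an unescaped "." at that position.
const BACKING_INDEX_PATTERN = /\.(?:ds|fs)-(.*?)-[0-9]{4}\.[0-9]{2}\.[0-9]{2}-[0-9]{6}/;

// Assumed body: the PR only shows the pattern line of this function.
function extractIndexNameFromBackingIndex(indexString: string): string {
  const match = indexString.match(BACKING_INDEX_PATTERN);
  // Fall back to the input when it does not look like a backing index (assumed behavior).
  return match?.[1] ?? indexString;
}

// Sample names are made up for illustration.
console.log(extractIndexNameFromBackingIndex('.ds-logs-nginx.access-default-2024.11.12-000001'));
console.log(extractIndexNameFromBackingIndex('.fs-logs-nginx.access-default-2024.11.12-000001'));
```

Both calls resolve to the same data stream name, which is what allows failure store indices to be attributed to their data stream.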

yngrdyn (Contributor Author) commented Nov 12, 2024

@mdbirnstiehl I need some help from your side

  1. The tooltip for the quality column should now tell the users that quality is calculated by degradedDocs and failedDocs percentages. I came up with
image
  2. How can we explain users what is a Failed doc? I came up with
image
  3. In the summary we should also reference the new calculation of quality (DegradedDocs and failedDocs)
image

-const datasetsQuality = {
-percentages: filteredItems.map((item) => item.degradedDocs.percentage),
-};
+const datasetsQuality = countBy(filteredItems.map((item) => item.quality)) as Record<
Contributor Author

Quality is now part of the dataset, so we don't need this calculation anymore. Instead I just created a map that will hold how many datasets fall into a quality status. e.g.

{
  'poor': 1,
  'degraded': 2,
  'good': 1,
}
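
As a rough illustration of how such a map can be built, the sketch below counts datasets per quality status with a plain loop instead of `countBy`, so it runs without extra dependencies. The `QualityIndicator` type, the `countByQuality` name, and the sample items are assumptions for this example.

```typescript
// Assumed type: the three statuses are taken from the example map above.
type QualityIndicator = 'good' | 'degraded' | 'poor';

// Hypothetical helper standing in for countBy(items.map((item) => item.quality)).
function countByQuality(
  items: Array<{ quality: QualityIndicator }>
): Record<QualityIndicator, number> {
  // Start every status at 0 so the result always has all three keys.
  const counts: Record<QualityIndicator, number> = { good: 0, degraded: 0, poor: 0 };
  for (const item of items) {
    counts[item.quality] += 1;
  }
  return counts;
}

// Made-up sample matching the example map: 1 poor, 2 degraded, 1 good.
const sample: Array<{ quality: QualityIndicator }> = [
  { quality: 'poor' },
  { quality: 'degraded' },
  { quality: 'degraded' },
  { quality: 'good' },
];
console.log(countByQuality(sample));
```

One difference worth noting: this sketch pre-fills every status with 0, whereas a plain `countBy` only produces keys for statuses that actually occur, so consumers of the real map may need to handle missing keys.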

yngrdyn (Contributor Author) commented Nov 12, 2024

/ci

mdbirnstiehl (Contributor) commented Nov 12, 2024

@mdbirnstiehl I need some help from your side

  1. The tooltip for the quality column should now tell the users that quality is calculated by degradedDocs and failedDocs percentages.

This one looks good.

  1. How can we explain users what is a Failed doc? I came up with

I'm not sure if this is exactly how it works, but is there some way we could be more specific here? Like, "Percentage of docs sent to failure store due to an issue during ingestion."

I'm not sure "failure store" itself will be meaningful for users.

  1. In the summary we should also reference the new calculation of quality (DegradedDocs and failedDocs)

This tool tip can probably match the tool tip for the data set quality column in the table.

A side note, maybe this would need to be a different issue, but this heading:
image

Should probably be Data Set Quality, not sets. It also looks like the table column headers are in title case, and should be in sentence case to be consistent with the other pages in Stack Management. For example, "Failed docs" instead of "Failed Docs"

yngrdyn (Contributor Author) commented Nov 12, 2024

I'm not sure if this is exactly how it works, but is there some way we could be more specific here? Like, "Percentage of docs sent to failure store due to an issue during ingestion."

That's exactly how it works 👍🏼

Should probably be Data Set Quality, not sets. It also looks like the table column headers are in title case, and should be in sentence case to be consistent with the other pages in Stack Management. For example, "Failed docs" instead of "Failed Docs"

I'll address those in this PR as well. This is how everything looks now

Screen.Recording.2024-11-12.at.16.42.32.mov

yngrdyn (Contributor Author) commented Nov 12, 2024

/ci

yngrdyn (Contributor Author) commented Nov 13, 2024

/ci

mohamedhamed-ahmed (Contributor) commented Nov 14, 2024

I started testing quickly and got 2 issues:

  1. When I make the failed_docs request fail from the network tab, I still get the results somehow. Not sure what's happening there, but I need to take a further look.
Screen.Recording.2024-11-14.at.12.38.24.mov
  2. When I open any dataset I get an error, as mapPercentageToQuality expects an array but gets a single value in this case.
Screen.Recording.2024-11-14.at.12.39.03.mov

When I make the failed_docs request fail from the network tab, I still get the results somehow. Not sure what's happening there, but I need to take a further look.

This works fine, you can ignore it; I tried it and it seems good. Only the other problem remains.
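
One possible way to make the helper tolerate both call sites, sketched under assumptions: the signature, the threshold constants and their values, and the `QualityIndicator` type below are illustrative guesses and not the code in this PR. The idea is simply to normalize a single value into an array before evaluating it.

```typescript
// Assumed type and thresholds, for illustration only.
type QualityIndicator = 'good' | 'degraded' | 'poor';
const POOR_QUALITY_MINIMUM_PERCENTAGE = 3; // assumed value
const DEGRADED_QUALITY_MINIMUM_PERCENTAGE = 0; // assumed value

// Hypothetical signature: accepts either one percentage or several
// (for example degraded docs and failed docs together).
function mapPercentageToQuality(percentages: number | number[]): QualityIndicator {
  // Normalize so the details page (single value) and the table (array) share one path.
  const values = Array.isArray(percentages) ? percentages : [percentages];

  if (values.some((value) => value > POOR_QUALITY_MINIMUM_PERCENTAGE)) {
    return 'poor';
  }
  if (values.some((value) => value > DEGRADED_QUALITY_MINIMUM_PERCENTAGE)) {
    return 'degraded';
  }
  return 'good';
}

console.log(mapPercentageToQuality(0.5)); // single value, as on the details page
console.log(mapPercentageToQuality([0.5, 4])); // degraded docs and failed docs together
```

With this shape the worst of the supplied percentages decides the status, which matches the description that Quality is calculated using both values.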

@yngrdyn yngrdyn force-pushed the dataset-quality-support-for-failure-store branch from 41e4f61 to 41d249a Compare November 14, 2024 13:17
mohamedhamed-ahmed (Contributor) left a comment

Did a first round of code review and have some small comments

export async function getFailedDocsPaginated(options: {
esClient: ElasticsearchClient;
types: DataStreamType[];
datasetQuery?: string;
Contributor

nit: maybe rename datasetQuery to just datasetNames, as it's a bit confusing since it's not really a query

yngrdyn (Contributor Author) commented Nov 19, 2024

what do you think about renaming it to indexPattern? or maybe just datasetPattern?

Contributor

Sounds good to me 👍

mohamedhamed-ahmed (Contributor)

I think the Quality Issues table is meant to show only 5 rows per page

I doubt it since we have a selection component to choose how many rows per page we can view.

patpscal

I doubt it since we have a selection component to choose how many rows per page we can view.

That's right! I totally forgot, thank you 🤣

yngrdyn (Contributor Author) commented Nov 21, 2024

/ci

yngrdyn (Contributor Author) commented Nov 21, 2024

/ci

@yngrdyn yngrdyn force-pushed the dataset-quality-support-for-failure-store branch from 80f5c53 to dd9816e Compare November 22, 2024 16:19
yngrdyn (Contributor Author) commented Nov 22, 2024

/ci

yngrdyn (Contributor Author) commented Nov 25, 2024

/ci

@yngrdyn yngrdyn force-pushed the dataset-quality-support-for-failure-store branch from 7b32bf0 to 42db254 Compare November 25, 2024 18:59
yngrdyn (Contributor Author) commented Nov 25, 2024

/ci

@yngrdyn yngrdyn force-pushed the dataset-quality-support-for-failure-store branch from 42db254 to 0bb1ac6 Compare November 26, 2024 14:46
yngrdyn (Contributor Author) commented Nov 26, 2024

/ci

@yngrdyn yngrdyn force-pushed the dataset-quality-support-for-failure-store branch from 0bb1ac6 to 076cb6c Compare November 26, 2024 16:04
yngrdyn (Contributor Author) commented Nov 26, 2024

/ci

yngrdyn (Contributor Author) commented Nov 26, 2024

/ci

yngrdyn (Contributor Author) commented Nov 26, 2024

/ci

@yngrdyn yngrdyn force-pushed the dataset-quality-support-for-failure-store branch from a33ca8f to f10abac Compare November 26, 2024 19:55
yngrdyn (Contributor Author) commented Nov 26, 2024

/ci

@yngrdyn yngrdyn force-pushed the dataset-quality-support-for-failure-store branch from 45e0213 to 895110f Compare November 27, 2024 13:11
yngrdyn (Contributor Author) commented Nov 27, 2024

/ci

elasticmachine (Contributor)

💔 Build Failed

Failed CI Steps

History

cc @yngrdyn


5 participants