From 8c54fca6261109c1e75b11b627c59210f3a6c799 Mon Sep 17 00:00:00 2001
From: fengyubiao
Date: Thu, 26 Sep 2024 00:07:59 +0800
Subject: [PATCH 1/2] [improve] [pip] PIP-382: Add a label named reason for topic_load_failed_total

---
 pip/pip-382.md | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 pip/pip-382.md

diff --git a/pip/pip-382.md b/pip/pip-382.md
new file mode 100644
index 0000000000000..7e5d33c82618e
--- /dev/null
+++ b/pip/pip-382.md
@@ -0,0 +1,48 @@
+# PIP-382: Add a label named reason for topic_load_failed_total
+
+# Background knowledge
+
+Pulsar has a metric that counts failed topic loads: `topic_load_failed_total`. It is incremented in the following cases:
+- The target bundle is unloading.
+- Failed to load policies.
+- Failed to load the Managed Ledger.
+- Failed to read the metadata store.
+- Topic initialization failed, for example when rebuilding the deduplication info fails.
+- Topic load timed out.
+- Others.
+
+# Motivation & Goals
+
+Adding a label to the metric `topic_load_failed_total` lets us quickly see which kind of error occurred, so we can fix the issue faster.
+
+### Metrics
+
+Add a label named reason for topic_load_failed_total
+- label name: `reason`
+- label values:
+  - `bundle_unloading`
+  - `failed_load_policies`
+  - `failed_load_ml`
+  - `failed_access_metadata_store`
+  - `failed_init`
+  - `timeout`
+  - `others`
+
+
+# Monitoring & Alternatives
+
+- If the count for `reason = bundle_unloading` increases for a moment and then stops increasing, everything is fine.
+  - Otherwise, the load balancer may have encountered an error.
+- If the count for `reason = timeout` increases for a moment and then stops increasing, too many topics were being loaded at the same time; this may be okay.
+  - Otherwise, the broker may have hit a deadlock, or its resources are not sufficient for the current use case.
+- For other label values, something unexpected happened, and the label value tells the cases apart.
+
+# General Notes
+
+# Links
+
+
+* Mailing List discussion thread:
+* Mailing List voting thread:

From e4df563bed0af896d7a6494b492affc3c4f920a1 Mon Sep 17 00:00:00 2001
From: fengyubiao <9947090@qq.com>
Date: Thu, 26 Sep 2024 10:01:58 +0800
Subject: [PATCH 2/2] Update pip-382.md

---
 pip/pip-382.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pip/pip-382.md b/pip/pip-382.md
index 7e5d33c82618e..30109046d65ab 100644
--- a/pip/pip-382.md
+++ b/pip/pip-382.md
@@ -44,5 +44,5 @@ Add a label named reason for topic_load_failed_total
-* Mailing List discussion thread:
+* Mailing List discussion thread: https://lists.apache.org/thread/f3xhmm342jor042n5ykkxoc32ffcn85s
 * Mailing List voting thread:
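
Note on the proposal: the `reason` label and the monitoring guidance in PIP-382 can be illustrated with a small sketch. This is not Pulsar code. `LoadFailureReason`, `TopicLoadFailedCounter`, and `still_growing` are hypothetical names, the counter is an in-memory stand-in for the broker metric, and the three-interval window is an arbitrary assumption. Only the metric name and the label values come from the PIP.

```python
from collections import Counter
from enum import Enum
from typing import Sequence


class LoadFailureReason(Enum):
    """Values of the `reason` label as listed in PIP-382."""
    BUNDLE_UNLOADING = "bundle_unloading"
    FAILED_LOAD_POLICIES = "failed_load_policies"
    FAILED_LOAD_ML = "failed_load_ml"
    FAILED_ACCESS_METADATA_STORE = "failed_access_metadata_store"
    FAILED_INIT = "failed_init"
    TIMEOUT = "timeout"
    OTHERS = "others"


class TopicLoadFailedCounter:
    """Hypothetical in-memory stand-in for `topic_load_failed_total`."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, reason: LoadFailureReason) -> None:
        # One increment per failed topic load, keyed by its reason.
        self._counts[reason] += 1

    def value(self, reason: LoadFailureReason) -> int:
        return self._counts[reason]

    def render(self) -> str:
        # One sample per label value, in Prometheus text exposition style.
        return "\n".join(
            f'topic_load_failed_total{{reason="{r.value}"}} {self._counts[r]}'
            for r in LoadFailureReason
        )


def still_growing(samples: Sequence[int], window: int = 3) -> bool:
    """Return True if the counter rose during the last `window` intervals."""
    if len(samples) <= window:
        return False  # not enough history to judge
    return samples[-1] > samples[-1 - window]
```

Under these assumptions, a `timeout` series that rises briefly and then flattens makes `still_growing` return `False`, which matches the "may be okay" case above, while a series that keeps rising returns `True` and is worth investigating.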