Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KNET-12362] Produce V3 API - Improve visibility around produce-record errors #1300

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

wcheng92
Copy link

@wcheng92 wcheng92 commented Sep 25, 2024

Description

A single http requests to produce V3 API can contain many produce-records. A record receipt is returned for each record, each with its own error-code. For example see the produce example to produce a batch of records - Kafka REST APIs for Confluent Cloud | Confluent Documentation

{
"error_code": 401,
"cluster_id": lkc-vo0pz,
"topic_name": "testTopic1",
"partition_id": 0,
"offset": 67217,
"timestamp": "2021-12-13T16:29:10.951Z",
"key": {
"type": "BINARY",
"size": 10
},
"value": {
"type": "JSON",
"size": 26
}
}
{
"error_code": 503,
"cluster_id": lkc-vo0pz,
"topic_name": "testTopic1",
"partition_id": 0,
"offset": 67218,
"timestamp": "2021-12-13T16:29:10.951Z",
"key": {
"type": "BINARY",
"size": 10
},
"value": {
"type": "JSON",
"size": 26
}
}

Using the above example, REST log would like below with status 200(OK) -

10.1.1.158 - - [14/Nov/2023:19:20:40 +0000] "POST /kafka/v3/clusters/lkc-m8j07q/topics/apac.consumer.AddCalculatedAttributes.v1/records HTTP/1.1" 200 196 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 Hutool" 503 -

Whereas clearly the user’s request had errors(sometimes 5xx). There is no visibility about these record-level error codes, which hinders oncall ability to answer customer’s question like “did the produce API request failed/succeeded for all records, or subset of records?”. This should be improved

Solution

Add a summarised counts of different error-codes in REST request log, Codes=401:1,503:1 as in example section below.

Example

Before - 10.1.1.158 - - [14/Nov/2023:19:20:40 +0000] "POST /kafka/v3/clusters/lkc-m8j07q/topics/apac.consumer.AddCalculatedAttributes.v1/records HTTP/1.1" 200 196 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 Hutool" 503 -

Now - 10.1.1.158 - - [14/Nov/2023:19:20:40 +0000] "POST /kafka/v3/clusters/lkc-m8j07q/topics/apac.consumer.AddCalculatedAttributes.v1/records HTTP/1.1" 200 196 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 Hutool" 503 - Codes=401:1,503:1

@wcheng92 wcheng92 requested a review from a team as a code owner September 25, 2024 19:16
@confluent-cla-assistant
Copy link

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

Copy link
Member

@dimitarndimitrov dimitarndimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great that Kafka REST continues to receive attention and improvements, especially for its newest significant API, the Produce v3!

I have one comment about the error counting structure.

…Counter, add cutom matcher for easymocking ProduceErrorCounter
[Update] Remove `equals` and `hashCode` methods of ProduceRecordError…
@ehumber
Copy link
Member

ehumber commented Oct 2, 2024

One last question from me - did you test this in our cloud environment @wcheng92 ?

[Update] Remove counting error codes for 200
Copy link
Member

@msn-tldr msn-tldr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@wcheng92

One last question from me - did you test this in our cloud environment @wcheng92 ?

Can you share the findings of ur ccloud testing? Merge once the Ccloud testing looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants