Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CORE-20824: Merging forward updates from release/os/5.2 to release/os/5.3 - 2024-10-09 #6359

Open
wants to merge 8 commits into
base: release/os/5.3
Choose a base branch
from

Conversation

corda-jenkins-ci02[bot]
Copy link
Contributor

This PR was created by the merge bot.

Includes:

emilybowe and others added 8 commits October 8, 2024 10:21
When lost membership requests were investigated it seems to be a problem that occurs during rebalance in the state and event pattern. After the rebalance the commits that were not successful don't get re-processed and are instead skipped over. As part of handling this, this adds a reset of the event offset position so we try processing again from the correct place after a RebalanceInProgressException.
Change CommitFailedException classification from fatal to transient.
This is required as the kafka connection tests exposed that we mark CommitFailedException as fatal and therefore do not retry. A CommitFailedException means we should abort the transaction but by classifying it as fatal we will bubble this up and we will not retry on any level. A CommitFailedException is fatal at the transaction level but not at the worker level so this should be changed.
This issue appeared while running kafka connection tests which kill the kafka broker.
When the connection to the kafka broker was lost, ERROR level logs related to CryptoOpsClientImpl appeared several times. This is because there was a failure in executing net.corda.data.crypto.wire.ops.rpc.queries.SupportedSchemesRpcQuery.

It has been verified how this is handled in the rest worker:
KeyRestResource calls listSchemes and fails at cryptoOpsClient.getSupportedSchemes.
The exception handling for listSchemes will return an InternalServerException. This extends HttpApiException and the response code will be 500.

As we return an error to the rest client and the system can recover, the log level for the CryptoOpsClient failing the operation should be WARN level instead.
…xception (#6299)

This change ensures that the exception `CordaMessageAPIProducerRequiresReset` is correctly handled, by re-raising it to the context which owns the `CordaProducer` and resetting the producer safely. Unit tests have also been added to verify this behaviour.
…lNodeInfoProcessor (#6296)

This issue appeared while running kafka connection tests which kill the kafka broker.

In a flow worker compacted subscription we got a org.apache.kafka.common.errors.TimeoutException and handled it as fatal.

This is because we have no error handling in our KafkaAdmin class for the exceptions that can be thrown by KafkaFuture.get() . The KafkaAdmin client is called from the VirtualNodeInfoProcessor which also does not catch this.

This PR adds retry logic to the KafkaAdmin for getTopics and in the case of exhausting these retries, error handling in VirtualNodeInfoProcessor to handle this better and prevent this exception bubbling further.
…xceptionUtils (#6311)

This is required so that the exception is correctly handled in the Kafka message patterns.
…cerRequiresReset (#6303)

The `ProducerFencedException` in these cases is likely due to an invalid epoch being used in an operation as a result of a timeout on the broker side. The correct action in this case is to reset the producer.
@corda-jenkins-ci02
Copy link
Contributor Author

Please remember to 'Merge' all forward merges and do not 'Squash and Merge'

Copy link

sonarcloud bot commented Oct 9, 2024

@corda-jenkins-ci02
Copy link
Contributor Author

Jenkins build for PR 6359 build 1

Build Successful:
Jar artifact version produced by this PR: 5.3.0.0-alpha-1728444928612
Helm chart version produced by this PR: 5.3.0-alpha.1728444928612
Helm chart pushed to: oci://corda-os-docker-dev.software.r3.com/helm-charts/pr-6359/corda
Helm chart Polaris score: 82

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants