Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CORE-20824: Merging forward updates from release/os/5.2 to release/os/5.3 - 2024-10-09 #6359

Open
wants to merge 8 commits into
base: release/os/5.3
Choose a base branch
from

Commits on Oct 8, 2024

  1. CORE-19384: Add reset after RebalanceInProgressException (#6181)

    When lost membership requests were investigated it seems to be a problem that occurs during rebalance in the state and event pattern. After the rebalance the commits that were not successful don't get re-processed and are instead skipped over. As part of handling this, this adds a reset of the event offset position so we try processing again from the correct place after a RebalanceInProgressException.
    emilybowe authored and LWogan committed Oct 8, 2024
    Configuration menu
    Copy the full SHA
    609cfbf View commit details
    Browse the repository at this point in the history
  2. CORE-20799: Change CommitFailedException classification (#6295)

    Change CommitFailedException classification from fatal to transient.
    This is required as the kafka connection tests exposed that we mark CommitFailedException as fatal and therefore do not retry. A CommitFailedException means we should abort the transaction but by classifying it as fatal we will bubble this up and we will not retry on any level. A CommitFailedException is fatal at the transaction level but not at the worker level so this should be changed.
    emilybowe authored and LWogan committed Oct 8, 2024
    Configuration menu
    Copy the full SHA
    03ef673 View commit details
    Browse the repository at this point in the history
  3. change error to warn (#6294)

    This issue appeared while running kafka connection tests which kill the kafka broker.
    When the connection to the kafka broker was lost, ERROR level logs related to CryptoOpsClientImpl appeared several times. This is because there was a failure in executing net.corda.data.crypto.wire.ops.rpc.queries.SupportedSchemesRpcQuery.
    
    It has been verified how this is handled in the rest worker:
    KeyRestResource calls listSchemes and fails at cryptoOpsClient.getSupportedSchemes.
    The exception handling for listSchemes will return an InternalServerException. This extends HttpApiException and the response code will be 500.
    
    As we return an error to the rest client and the system can recover, the log level for the CryptoOpsClient failing the operation should be WARN level instead.
    emilybowe authored and LWogan committed Oct 8, 2024
    Configuration menu
    Copy the full SHA
    44fd98d View commit details
    Browse the repository at this point in the history
  4. CORE-20797 Adding handling for CordaMessageAPIProducerRequiresReset e…

    …xception (#6299)
    
    This change ensures that the exception `CordaMessageAPIProducerRequiresReset` is correctly handled, by re-raising it to the context which owns the `CordaProducer` and resetting the producer safely. Unit tests have also been added to verify this behaviour.
    ben-millar authored and LWogan committed Oct 8, 2024
    Configuration menu
    Copy the full SHA
    3b05a6c View commit details
    Browse the repository at this point in the history
  5. CORE-20795: Add exception handling and retry to KafkaAdmin and Virtua…

    …lNodeInfoProcessor (#6296)
    
    This issue appeared while running kafka connection tests which kill the kafka broker.
    
    In a flow worker compacted subscription we got a org.apache.kafka.common.errors.TimeoutException and handled it as fatal.
    
    This is because we have no error handling in our KafkaAdmin class for the exceptions that can be thrown by KafkaFuture.get() . The KafkaAdmin client is called from the VirtualNodeInfoProcessor which also does not catch this.
    
    This PR adds retry logic to the KafkaAdmin for getTopics and in the case of exhausting these retries, error handling in VirtualNodeInfoProcessor to handle this better and prevent this exception bubbling further.
    emilybowe authored and LWogan committed Oct 8, 2024
    Configuration menu
    Copy the full SHA
    104f19e View commit details
    Browse the repository at this point in the history
  6. CORE-20797 Adding CordaMessageAPIProducerRequiresReset exception to E…

    …xceptionUtils (#6311)
    
    This is required so that the exception is correctly handled in the Kafka message patterns.
    ben-millar authored and LWogan committed Oct 8, 2024
    Configuration menu
    Copy the full SHA
    0b98ca0 View commit details
    Browse the repository at this point in the history
  7. CORE-20794 Catching ProducerFencedException and rethrowing as a Produ…

    …cerRequiresReset (#6303)
    
    The `ProducerFencedException` in these cases is likely due to an invalid epoch being used in an operation as a result of a timeout on the broker side. The correct action in this case is to reset the producer.
    ben-millar authored and LWogan committed Oct 8, 2024
    Configuration menu
    Copy the full SHA
    47994ec View commit details
    Browse the repository at this point in the history

Commits on Oct 9, 2024

  1. Configuration menu
    Copy the full SHA
    ec632f9 View commit details
    Browse the repository at this point in the history