[Bug] Another issue causes the broker to lose bookie rack information in newer Pulsar versions #23330
Open
Labels
release/blocker
Indicate the PR or issue that should block the release until it gets resolved
type/bug
The PR fixed a bug or issue reported a bug
Search before asking
Read release policy
Version
pulsar-3.0.6
Minimal reproduce step
Restart the broker many times.
What did you expect to see?
When the broker restarts, it should load rack information correctly.
What did you see instead?
This is a follow-up to #23282. After fixing the previous issue, we found another issue that causes the broker to lose rack information. When the broker is restarted many times, some brokers may emit a log showing that they have lost rack information.
The root cause is the same as in #23282: the two listeners in registrationClient must be executed synchronously, one after the other.
However, the implementation is asynchronous. When the broker restarts, the listeners are registered through PulsarRegistrationClient#watchWritableBookies, which is an async method. The first listener is registered in BookieRackAffinityMapping#watchAvailableBookies, asynchronously. The second listener is registered in BookieWatcherImpl#initialBlockingBookieRead, synchronously.
That is the cause. In PulsarRegistrationClient#watchWritableBookies, the two listeners are added to writableBookiesWatchers synchronously, but the call that follows,
getWritableBookies().thenAcceptAsync(registrationListener::onBookiesChanged, executor)
is asynchronous. This is easy to confirm by adding logging to each part of PulsarRegistrationClient#watchWritableBookies. Since there are two listeners, getWritableBookies() is executed twice, asynchronously.
pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/bookkeeper/PulsarRegistrationClient.java
Lines 180 to 187 in 9012422
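To illustrate the ordering problem outside Pulsar, here is a minimal sketch. It is not the Pulsar code: `SketchRegistrationClient`, the listener names `rackMapping` and `bookieWatcher`, and the bookie address are hypothetical stand-ins. It only mimics the pattern described above, where the listener is added synchronously but its first callback is dispatched with `thenAcceptAsync`. Waiting for the first registration to complete before registering the second is shown as one possible way to force the order, not necessarily the fix that should be adopted.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class ListenerOrderSketch {

    /** Hypothetical stand-in for the registration client; not Pulsar's real class. */
    static final class SketchRegistrationClient {
        private final ExecutorService executor;
        private final List<Consumer<Set<String>>> watchers = new CopyOnWriteArrayList<>();

        SketchRegistrationClient(ExecutorService executor) {
            this.executor = executor;
        }

        // Pretend metadata read that completes on another thread.
        CompletableFuture<Set<String>> getWritableBookies() {
            return CompletableFuture.supplyAsync(() -> Set.of("bookie-1:3181"), executor);
        }

        CompletableFuture<Void> watchWritableBookies(Consumer<Set<String>> listener) {
            watchers.add(listener); // the listener is added synchronously
            // The first callback is dispatched asynchronously on the executor.
            return getWritableBookies().thenAcceptAsync(listener, executor);
        }
    }

    /**
     * Registers two listeners and returns the order in which their callbacks ran.
     * With waitForFirst == false the order depends on thread scheduling.
     * With waitForFirst == true the first callback finishes before the second
     * listener is even registered.
     */
    static List<String> register(boolean waitForFirst) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            SketchRegistrationClient client = new SketchRegistrationClient(executor);
            List<String> order = new CopyOnWriteArrayList<>();

            CompletableFuture<Void> first =
                    client.watchWritableBookies(bookies -> order.add("rackMapping"));
            if (waitForFirst) {
                first.join(); // block until the first callback has run
            }
            CompletableFuture<Void> second =
                    client.watchWritableBookies(bookies -> order.add("bookieWatcher"));

            CompletableFuture.allOf(first, second).join();
            return List.copyOf(order);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without waiting: " + register(false)); // order may vary
        System.out.println("with waiting:    " + register(true));
    }
}
```

Running `register(false)` repeatedly may show either order, which matches the observation that only some broker restarts lose rack information.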
Anything else?
Previous versions do not have this issue because they use ZKRegistrationClient, not PulsarRegistrationClient, as the implementation of RegistrationClient. After PIP-45, Pulsar switched the implementation to PulsarRegistrationClient, which introduced the issue.
ZKRegistrationClient#watchWritableBookies ensures that the two listeners are executed synchronously, while PulsarRegistrationClient#watchWritableBookies does not.
Are you willing to submit a PR?