Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for 'isFSMWorkflow' flag and fix FSM workflow #161

Merged
merged 13 commits into from
Mar 26, 2024

Conversation

kaikulimu
Copy link
Collaborator

@kaikulimu kaikulimu commented Nov 29, 2023

Add support for FSM workflow of cluster node startup sequence

Note:
70beb20 adopts Vitaly's #174 into this PR
751032b adopts Vitaly's #190 into this PR

@kaikulimu kaikulimu requested a review from a team as a code owner November 29, 2023 21:30
@kaikulimu kaikulimu force-pushed the isFSM branch 3 times, most recently from 6bf3068 to 7a4ca95 Compare November 30, 2023 16:56
src/groups/mqb/mqbs/mqbs_storageutil.h Outdated Show resolved Hide resolved
src/groups/mqb/mqbs/mqbs_filestoreutil.cpp Show resolved Hide resolved
src/groups/mqb/mqbblp/mqbblp_clusterorchestrator.cpp Outdated Show resolved Hide resolved
src/groups/mqb/mqbc/mqbc_clusterfsm.cpp Outdated Show resolved Hide resolved
src/groups/mqb/mqbc/mqbc_incoreclusterstateledger.cpp Outdated Show resolved Hide resolved
src/groups/mqb/mqbc/mqbc_partitionfsm.cpp Outdated Show resolved Hide resolved
src/groups/mqb/mqbc/mqbc_recoverymanager.cpp Show resolved Hide resolved
src/groups/mqb/mqbc/mqbc_storagemanager.cpp Outdated Show resolved Hide resolved
src/groups/mqb/mqbc/mqbc_storagemanager.cpp Outdated Show resolved Hide resolved
@kaikulimu
Copy link
Collaborator Author

@678098 Addressed your comments. Can review again.

@kaikulimu kaikulimu assigned 678098 and unassigned kaikulimu Dec 4, 2023
@kaikulimu
Copy link
Collaborator Author

@dorjesinpo is going to make some changes near StorageUtil::recoveredQueuesCb() to retrieve the config of strong vs eventual consistency mode. This might potentially conflict the FSM PR so we will have to be careful.

@kaikulimu
Copy link
Collaborator Author

@678098's work on purge queue command can potentially conflict with this PR, too. Must be careful.

@kaikulimu kaikulimu force-pushed the isFSM branch 2 times, most recently from 02ad95b to f7af3f4 Compare December 26, 2023 22:41
@kaikulimu kaikulimu force-pushed the isFSM branch 5 times, most recently from 9c029f6 to fc31a38 Compare March 6, 2024 19:11
@kaikulimu kaikulimu force-pushed the isFSM branch 8 times, most recently from 4cc4d4a to 7bb2371 Compare March 8, 2024 19:25
Copy link
Collaborator

@dorjesinpo dorjesinpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a thorough review (quarter-note has reviewed before)
Some minor notes

@@ -7,8 +7,8 @@
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// Unless required byClusterDataIdentityapplicable law or agreed to in writing,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless required by applicable law ...

Copy link
Collaborator Author

@kaikulimu kaikulimu Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -3605,8 +3628,6 @@ Cluster::sendRequest(const Cluster::RequestManagerType::RequestSp& request,
mqbnet::ClusterNode* target,
bsls::TimeInterval timeout)
{
// executed by the cluster *DISPATCHER* thread
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that not in the Dispatcher thread anymore? If so, need to comment on the thread safety

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is now possible to be invoked in the queue dispatcher thread by mqbc::StorageManager:

bmqt::GenericResult::Enum status = d_clusterData_p->cluster()->sendRequest(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ElectorInfo::setElectorInfo is "executed by the cluster DISPATCHER thread" so we need to put some warning about thread safety. We consider the access to d_leaderNode_p (and everything else here) thread-safe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comment about thread safety. Importantly, Cluster::sendRequest() has been invoked in queue dispatcher threads since before FSM, for example at RecoveryManager::onPartitionSyncStateQueryResponseDispatched().

if (bdls::ProcessUtil::getProcessName(&(identity.processName())) != 0) {
identity.processName() = "** UNKNOWN **";
bmqp_ctrlmsg::ClientIdentity identity(allocator);
if (!isRemote) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no need for proxy detailed identity?

Copy link
Collaborator Author

@kaikulimu kaikulimu Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at eb05973 where this change is made, you will see that the original code was:

, d_identity(cluster->isRemote()
                 ? ClusterDataIdentity(name,
                                       bmqp_ctrlmsg::ClientIdentity(allocator),
                                       allocator)
                 : clusterIdentity(name, d_membership.netCluster()))

Hence, if we are a remote node, an empty bmqp_ctrlmsg::ClientIdentity is passed in.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merely refactored and simplified the code.

@@ -785,6 +798,8 @@ void ClusterUtil::populateQueueAssignmentAdvisory(
BSLS_ASSERT_SAFE(domain);
BSLS_ASSERT_SAFE(clusterData->electorInfo().isSelfActiveLeader());

BALL_LOG_SET_CATEGORY(k_LOG_CATEGORY);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need it (considering BALL_LOG_SET_CLASS_CATEGORY("MQBC.CLUSTERUTIL");)?
And below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, these lines are redundant. Will remove them.

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
@kaikulimu
Copy link
Collaborator Author

@dorjesinpo Ready for review again

@dorjesinpo dorjesinpo merged commit 1de0b4e into bloomberg:main Mar 26, 2024
10 checks passed
@kaikulimu kaikulimu deleted the isFSM branch April 10, 2024 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants