Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi Node Cluster support for move-node #22

Merged
merged 25 commits into from
Oct 30, 2024

Conversation

henokgetachew
Copy link
Contributor

@henokgetachew henokgetachew commented Oct 8, 2024

This is an attempt at adding a multi-node support for move-nodes command.

  1. First retrieve the shard mapping from the original instance:
    shard_mapping=$(get-shard-mapping)
  2. Then run (Against the target node):

pipe in the shard_mapping:

echo '$shard_mapping' | move-node '{"couchdb@old-node-1":"couchdb@new-node-1"}'

The above commands can all be run via docker compose in line with the other commands in this repo.

Copy link
Member

@dianabarsan dianabarsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool.
I think move-node should only move one node at a time, not more.
Also, we need to add a really strong e2e tests, even one that covers mixed shards (with all nodes having one shard for one db).

bin/move-node.js Outdated Show resolved Hide resolved
@henokgetachew
Copy link
Contributor Author

henokgetachew commented Oct 14, 2024

@dianabarsan can you have another look?

Copy link
Member

@dianabarsan dianabarsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! I left a few questions and requests inline.

bin/move-node.js Outdated Show resolved Hide resolved
bin/move-node.js Outdated Show resolved Hide resolved
bin/move-node.js Outdated Show resolved Hide resolved
bin/move-node.js Outdated Show resolved Hide resolved
bin/move-node.js Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
src/get-shard-mapping.js Outdated Show resolved Hide resolved
src/move-node.js Outdated Show resolved Hide resolved
src/move-node.js Outdated
Comment on lines 23 to 25
if (!removedNodes.includes(oldNode)) {
removedNodes.push(oldNode);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this happen? As far as I can tell, only way this would happen is if the node had no shards allocated. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. I think I ran into it while doing some tests. I suspect that it happened if you used move-shards separately and had moved all shards off of the node but without removing the node. This function would still make sure it gets removed even if the shards had been moved previously - something along those lines.

@henokgetachew
Copy link
Contributor Author

Pushed the changes and unit tests @dianabarsan. Let me know if things make sense.

Copy link
Member

@dianabarsan dianabarsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the changes!! I left a few comments below. This is getting together quite beautifully :)

bin/move-node.js Outdated Show resolved Hide resolved
src/check-couch-up.js Show resolved Hide resolved
src/get-shard-mapping.js Outdated Show resolved Hide resolved
bin/get-shard-mapping.js Outdated Show resolved Hide resolved
src/get-shard-mapping.js Outdated Show resolved Hide resolved
src/move-node.js Outdated Show resolved Hide resolved
src/move-node.js Outdated Show resolved Hide resolved
test/e2e/scripts/assert-dbs.js Show resolved Hide resolved
docker-compose -f ./scripts/couchdb-cluster.yml down --remove-orphans --volumes
# Let's get the shard mapping - we need this to migrate to a different cluster.
shard_mapping=$(docker compose -f ../docker-compose-test.yml run couch-migration get-shard-mapping)
echo $shard_mapping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I've suggested we don't actually pass this as a parameter to move-node. Does this work if we change the actions to first move the nodes under the old install and then start up the new install?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually want to avoid running any commands with side-effects on the old installs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this would actually be on the cloned data of the old install, so it doesn't matter. It's the data that we are preparing for the new install, whether we run an action when old couch is running vs new couch is running, it should be the same.

expect(utils.getDbs.calledOnce).to.be.true;
expect(utils.getUrl.calledThrice).to.be.true;
expect(utils.request.calledThrice).to.be.true;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a unit test where a shard for one db is on one node, while for the other db the same shard is on another node? It'd be cool to see what happens in real life when this is the case. I've seen this occur very frequently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This was enlightening. While digging into this, I investigated several clusters and it indeed turns out that the mapping for different databases are different. Some use 1, 2, 3 while others might use 2, 3, 1 and so forth. I've also confirmed that this is expected behavior for couch. One of the determinants for this is the timing of the database creation. Therefore this has necessitated changing the mapping to include database information - as there is no concept of a global shard map for all databases in the couch server. I have added the tests to confirm that the new approach works for this scenario. It's actually the only approach that works if you want to preserve shard mapping across migrations.

@henokgetachew
Copy link
Contributor Author

@dianabarsan welcome back! Can you take another look here please?

@dianabarsan dianabarsan self-requested a review October 28, 2024 07:51
Copy link
Member

@dianabarsan dianabarsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent progress! I added some comment about being very intentional with logical choices, instead of reactive.

bin/move-node.js Outdated Show resolved Hide resolved
bin/move-node.js Show resolved Hide resolved
src/move-node.js Outdated Show resolved Hide resolved
src/move-node.js Outdated Show resolved Hide resolved
src/move-node.js Outdated Show resolved Hide resolved
src/move-node.js Outdated Show resolved Hide resolved
src/move-node.js Show resolved Hide resolved
test/e2e/scripts/compare-shard-mappings.js Outdated Show resolved Hide resolved
test/e2e/multi-node.sh Outdated Show resolved Hide resolved
@henokgetachew
Copy link
Contributor Author

@dianabarsan can you have another look here please?

Copy link
Member

@dianabarsan dianabarsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool!
Just one pending request, apologies I didn't catch this one earlier :(

bin/move-node.js Show resolved Hide resolved
test/e2e/multi-node-3.sh Outdated Show resolved Hide resolved
@henokgetachew
Copy link
Contributor Author

Changes pushed.

Copy link
Member

@dianabarsan dianabarsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌟 Excellent!

@henokgetachew henokgetachew merged commit ffcb620 into main Oct 30, 2024
4 checks passed
@henokgetachew henokgetachew deleted the multi-node-move-node-support branch October 30, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants