Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi Node Cluster support for move-node #22

Merged
merged 25 commits into from
Oct 30, 2024
Merged
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
3978c83
multi-node cluster support for move-node
henokgetachew Oct 8, 2024
4490f1f
moves nodes one by one and adds tests
henokgetachew Oct 9, 2024
ee66add
Shorten long line
henokgetachew Oct 9, 2024
e6675b9
fix tests - remove docker-compose
henokgetachew Oct 9, 2024
7023db6
Merge branch 'main' into multi-node-move-node-support
henokgetachew Oct 9, 2024
771b399
Fix images
henokgetachew Oct 9, 2024
9955ea9
Make network external
henokgetachew Oct 9, 2024
0d56f06
Some more fixes with tests
henokgetachew Oct 10, 2024
c1feeaa
get-shard-mapping, e2e tests, other fixes
henokgetachew Oct 14, 2024
9d1f4e5
Fix unit test
henokgetachew Oct 14, 2024
e93e1df
More ... Fix unit tests
henokgetachew Oct 14, 2024
1766f03
Remove validation for move-node bin
henokgetachew Oct 14, 2024
2b5ac85
Sudo needed in linux
henokgetachew Oct 14, 2024
c5d2f71
fix docker compose command
henokgetachew Oct 14, 2024
8b11ba9
Tests and fixes
henokgetachew Oct 16, 2024
04cd4a3
remove unused var
henokgetachew Oct 16, 2024
2331843
Apply suggestions from code review
henokgetachew Oct 22, 2024
0b8128f
Fix code-move things around
henokgetachew Oct 22, 2024
69289a4
Handle inconsistent shard distribution per db
henokgetachew Oct 22, 2024
21693db
Use stdin and env var. e2e tests.
henokgetachew Oct 25, 2024
d7f45d6
update minor - new features built
henokgetachew Oct 25, 2024
c1cf973
Fix error message
henokgetachew Oct 25, 2024
e0c14bc
No standalone couch
henokgetachew Oct 25, 2024
611dacb
Changes after review
henokgetachew Oct 29, 2024
1fc91cd
Test on both couchdb2 and couchdb3
henokgetachew Oct 30, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions bin/get-shard-mapping.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
#!/usr/bin/env node

const { getShardMapping } = require('../src/utils');

(async () => {
try {
const shardMapping = await getShardMapping();
console.log(JSON.stringify(shardMapping));
} catch (err) {
console.error('An unexpected error occurred', err);
process.exit(1);
}
})();
60 changes: 57 additions & 3 deletions bin/move-node.js
Original file line number Diff line number Diff line change
@@ -1,16 +1,70 @@
#!/usr/bin/env node
const [,, toNode] = process.argv;

const { moveNode, syncShards } = require('../src/move-node');
const { removeNode } = require('../src/remove-node');

const parseNodeMapping = (input) => {
if (!input) {
// Single node migration with unspecified node
return;
}
try {
// Multi-node migration with specified node mapping
// Example input: '{"oldNode":"newNode"}'
return JSON.parse(input);
} catch (err) {
// Single node migration with specified node
console.warn('Failed to parse node mapping JSON. Defaulting to single node mapping');
return input;
}
};

const getStdin = () => {
return new Promise((resolve) => {
dianabarsan marked this conversation as resolved.
Show resolved Hide resolved
if (process.stdin.isTTY) {
return resolve('');
}
let data = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', chunk => data += chunk);
process.stdin.on('end', () => resolve(data));
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved
});
};

const parseShardMapJsonInput = async () => {
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved
if (process.env.SHARD_MAPPING) {
try {
return JSON.parse(process.env.SHARD_MAPPING);
} catch (err) {
console.warn('Failed to parse SHARD_MAPPING environment variable');
}
}

try {
const input = await getStdin();
if (input.trim()) {
return JSON.parse(input.trim());
}
} catch (err) {
console.warn('Failed to parse shard mapping JSON from stdin');
}
return;
};

(async () => {
try {
const removedNodes = await moveNode();
for (const node of removedNodes) {
const parsedToNode = parseNodeMapping(toNode);
const shardMapJson = await parseShardMapJsonInput();

const movedNodes = await moveNode(parsedToNode, shardMapJson);
console.log('Node moved successfully');
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved

for (const node of movedNodes) {
await removeNode(node);
}
await syncShards();
console.log('Node moved successfully');
console.log('Removed nodes:', movedNodes);
} catch (err) {
console.error('An unexpected error occurred', err);
process.exit(1);
Expand Down
2 changes: 1 addition & 1 deletion docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ version: '3.9'

services:
couch-migration:
image: public.ecr.aws/medic/couchdb-migration:1.0.3
image: public.ecr.aws/medic/couchdb-migration:1.1.0
networks:
- cht-net
environment:
Expand Down
5 changes: 3 additions & 2 deletions package.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "couchdb-migration",
"version": "1.0.3",
"version": "1.1.0",
"description": "Tool that interfaces with CouchDB to help migrating data within a cluster.",
"scripts": {
"eslint": "eslint --color --cache ./src ./test",
Expand All @@ -22,7 +22,8 @@
"remove-node": "bin/remove-node.js",
"check-couchdb-up": "bin/check-couchdb-up.js",
"pre-index-views": "bin/pre-index-views.js",
"verify": "bin/verify.js"
"verify": "bin/verify.js",
"get-shard-mapping": "bin/get-shard-mapping.js"
},
"repository": {
"type": "git",
Expand Down
6 changes: 3 additions & 3 deletions src/check-couch-up.js
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
const utils = require('./utils');
const NUM_RETRY = 200;
const NUM_RETRY = 300;
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved
const TIMEOUT_RETRY = 1000; // 1 second

const isCouchUp = async () => {
Expand Down Expand Up @@ -33,14 +33,14 @@ const repeatRetry = async (promiseFn) => {
const checkCouchUp = async () => {
const isUp = await repeatRetry(isCouchUp);
if (!isUp) {
throw new Error('CouchDb is not up after 100 seconds.');
throw new Error('CouchDb is not up after 300 seconds.');
}
};

const checkClusterReady = async (nbrNodes) => {
const isReady = await repeatRetry(isClusterReady.bind({}, nbrNodes));
if (!isReady) {
throw new Error('CouchDb Cluster is not ready after 100 seconds.');
throw new Error('CouchDb Cluster is not ready after 300 seconds.');
}
};

Expand Down
75 changes: 70 additions & 5 deletions src/move-node.js
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
const utils = require('./utils');
const moveShard = require('./move-shard');

const moveNode = async (toNode) => {
const replaceForSingleNode = async (toNode) => {
const removedNodes = [];

if (!toNode) {
const nodes = await utils.getNodes();
if (nodes.length > 1) {
throw new Error('More than one node found.');
throw new Error('More than one node found. Please specify a node mapping in the format {"oldNode":"newNode"}.');
}
toNode = nodes[0];
}
Expand All @@ -20,10 +20,75 @@ const moveNode = async (toNode) => {
return [...new Set(removedNodes)];
};

const replaceInCluster = async (nodeMap, shardMapJson) => {
const removedNodes = [];
if (!shardMapJson) {
throw new Error('Shard map JSON is required for multi-node migration');
}

const [oldNode, newNode] = Object.entries(nodeMap)[0];
console.log(`Migrating from ${oldNode} to ${newNode}`);

// Use the provided shard map
const currentDistribution = shardMapJson;

// For each shard range in the current distribution
console.log('Current distribution:', currentDistribution);
for (const [shardRange, dbNodes] of Object.entries(currentDistribution)) {
console.log('Shard range:', shardRange);
console.log('DB nodes:', dbNodes);
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved
// dbNodes is an object mapping db names to node names
for (const [dbName, nodeName] of Object.entries(dbNodes)) {
console.log('DB name:', dbName);
console.log('Node name:', nodeName);
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved
if (nodeName === oldNode) {
console.log(
`Moving shard ${shardRange} for db ${dbName} from ${oldNode} to ${newNode}`
);
const oldNodes = await moveShard.moveShard(shardRange, newNode, dbName);
removedNodes.push(...oldNodes);
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved
}
}
}
if (!removedNodes.includes(oldNode)) {
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved
removedNodes.push(oldNode);
}
return [...new Set(removedNodes)];
};

const moveNode = async (nodeMap, shardMapJson) => {
if (typeof nodeMap === 'object') {
// Multi-node migration - replace node in clustered couch
return await replaceInCluster(nodeMap, shardMapJson);
}

// Single node migration - replace node in a single node couch
return await replaceForSingleNode(nodeMap);
};

const syncShards = async () => {
const allDbs = await utils.getDbs();
for (const db of allDbs) {
await utils.syncShards(db);
try {
const membership = await utils.getMembership();
const { all_nodes, cluster_nodes } = membership;

const clusterComplete =
all_nodes.length === cluster_nodes.length &&
all_nodes.every((node) => cluster_nodes.includes(node));

if (clusterComplete) {
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved
const allDbs = await utils.getDbs();
for (const db of allDbs) {
await utils.syncShards(db);
}
console.log('Shards synchronized.');
} else {
console.log(
'Multi-node migration detected. Shard synchronization will be run once all nodes have been migrated.'
);
}
} catch (err) {
console.error('Error during shard synchronization:', err);
throw err;
}
};

Expand Down
4 changes: 2 additions & 2 deletions src/move-shard.js
Original file line number Diff line number Diff line change
Expand Up @@ -40,11 +40,11 @@ const updateDbMetadata = (metadata, shard, toNode) => {
return fromNodes;
};

const moveShard = async (shard, toNode) => {
const moveShard = async (shard, toNode, dbName = null) => {
await validateCall(shard, toNode);
const removedNodes = [];

const dbs = await utils.getDbs();
const dbs = dbName ? [dbName] : await utils.getDbs();
for (const dbName of dbs) {
const metadata = await utils.getDbMetadata(dbName);
const oldNodes = updateDbMetadata(metadata, shard, toNode);
Expand Down
38 changes: 38 additions & 0 deletions src/utils.js
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,43 @@ const getShards = async () => {
}
};

/*
This function creates a mapping of shard ranges to nodes and databases
*/
const getShardMapping = async () => {
try {
const allDbs = await getDbs();
const shardMap = {};

for (const db of allDbs) {
const url = await getUrl(`${db}/_shards`);
const shardInfo = await request({ url });

for (const [shardRange, nodeList] of Object.entries(shardInfo.shards)) {
// In n=1 setup, there should be only one node per shard range.
// We will have to revisit this if we ever support n>1.
if (nodeList.length !== 1) {
console.warn(`Unexpected number of nodes for range ${shardRange} in db ${db}: ${nodeList.length}`);
}
const node = nodeList[0];

// Initialize the shard range in the shardMap if not present
if (!shardMap[shardRange]) {
shardMap[shardRange] = {};
}

// Map the database to the node for the shard range
shardMap[shardRange][db] = node;
}
}

return shardMap;
} catch (err) {
console.error('Error getting shard mapping:', err);
throw new Error('Failed to get shard mapping');
}
};

const getNodes = async () => {
const membership = await getMembership();
return membership.all_nodes;
Expand Down Expand Up @@ -257,6 +294,7 @@ module.exports = {
deleteNode,
syncShards,
getShards,
getShardMapping,
getNodes,
getConfig,
getCouchUrl,
Expand Down
4 changes: 4 additions & 0 deletions src/verify.js
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
const utils = require('./utils');
const DBS_TO_IGNORE = ['_global_changes', '_replicator'];

const verifyViews = async (dbName, numDocs) => {
if (!numDocs) {
Expand Down Expand Up @@ -29,6 +30,9 @@ const verifyViews = async (dbName, numDocs) => {
};

const verifyDb = async (dbName) => {
if (DBS_TO_IGNORE.includes(dbName)) {
return;
}
console.info(`Verifying ${dbName}`);
await utils.syncShards(dbName);

Expand Down
1 change: 1 addition & 0 deletions test/docker-compose-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,4 @@ services:
networks:
cht-net:
name: ${CHT_NETWORK:-cht-net}
external: true
26 changes: 13 additions & 13 deletions test/e2e/multi-node-3.sh
Original file line number Diff line number Diff line change
Expand Up @@ -29,37 +29,37 @@ docker rm -f -v scripts-couchdb-1.local-1 scripts-couchdb-2.local-1 scripts-couc
# create docker network
docker network create $CHT_NETWORK || true
# build service image
docker-compose -f ../docker-compose-test.yml up --build
docker compose -f ../docker-compose-test.yml up --build

# launch vanilla couch, populate with some data
docker-compose -f ./scripts/couchdb-vanilla.yml up -d
docker-compose -f ../docker-compose-test.yml run couch-migration check-couchdb-up
docker compose -f ./scripts/couchdb-vanilla.yml up -d
docker compose -f ../docker-compose-test.yml run couch-migration check-couchdb-up
node ./scripts/generate-documents $jsondataddir
# pre-index 4.0.1 views
docker-compose -f ../docker-compose-test.yml run couch-migration pre-index-views 4.4.0
docker compose -f ../docker-compose-test.yml run couch-migration pre-index-views 4.4.0
sleep 5 # this is needed, CouchDb runs fsync with a 5 second delay
# export env for cht 4.x couch
export $(docker-compose -f ../docker-compose-test.yml run couch-migration get-env | xargs)
docker-compose -f ./scripts/couchdb-vanilla.yml down --remove-orphans --volumes
export $(docker compose -f ../docker-compose-test.yml run couch-migration get-env | xargs)
docker compose -f ./scripts/couchdb-vanilla.yml down --remove-orphans --volumes

# launch cht 4.x CouchDb cluster
docker-compose -f ./scripts/couchdb3-cluster.yml up -d
docker-compose -f ../docker-compose-test.yml run couch-migration check-couchdb-up 3
docker compose -f ./scripts/couchdb3-cluster.yml up -d
docker compose -f ../docker-compose-test.yml run couch-migration check-couchdb-up 3

# generate shard matrix
# this is an object that assigns every shard to one of the nodes
shard_matrix=$(docker-compose -f ../docker-compose-test.yml run couch-migration generate-shard-distribution-matrix)
shard_matrix=$(docker compose -f ../docker-compose-test.yml run couch-migration generate-shard-distribution-matrix)
file_matrix="{\"[email protected]\":\"$couch1dir\",\"[email protected]\":\"$couch2dir\",\"[email protected]\":\"$couch3dir\"}"
echo $shard_matrix
echo $file_matrix
# moves shard data files to their corresponding nodes, according to the matrix
docker-compose -f ../docker-compose-test.yml run couch-migration shard-move-instructions $shard_matrix
docker compose -f ../docker-compose-test.yml run couch-migration shard-move-instructions $shard_matrix
node ./scripts/distribute-shards.js $shard_matrix $file_matrix
# change database metadata to match the shard physical locations
docker-compose -f ../docker-compose-test.yml run couch-migration move-shards $shard_matrix
docker-compose -f ../docker-compose-test.yml run couch-migration verify
docker compose -f ../docker-compose-test.yml run couch-migration move-shards $shard_matrix
docker compose -f ../docker-compose-test.yml run couch-migration verify
# test that data exists, database shard maps are correct and view indexes are preserved
node ./scripts/assert-dbs.js $jsondataddir $shard_matrix

docker-compose -f ./scripts/couchdb-cluster.yml down --remove-orphans --volumes
docker compose -f ./scripts/couchdb-cluster.yml down --remove-orphans --volumes
henokgetachew marked this conversation as resolved.
Show resolved Hide resolved

Loading
Loading