-
Notifications
You must be signed in to change notification settings - Fork 97
Migrating Sharded Cluster Offline All Data
This guide explains how to do an offline data migration from MongoDB to TokuMX, for a sharded cluster, converting the existing data in MongoDB to TokuMX, converting the entire data set at once.
This allows you to migrate from a MongoDB cluster with some number of shards to a TokuMX cluster with a different number of shards, or to just start out with a more even balance of chunks, if the MongoDB cluster was unevenly balanced.
This guide can also be used to migrate from a MongoDB sharded cluster to a single TokuMX replica set. Since TokuMX already has document-level concurrency, high write throughput, and compression, it is reasonable to expect a single TokuMX replica set to support the workload of a sharded cluster of MongoDB instances, and migrating down to a single replica set reduces the operations work for maintaining the TokuMX database.
For other migration strategies, start with Migrating from MongoDB and Migrating from MongoDB: Sharded clusters.
-
Stop your application and turn off the balancer, and wait for any migrations to complete. The cluster should be idle.
> sh.setBalancerState(false); > while (sh.isBalancerRunning()) { ... print('waiting...'); ... sleep(1000); ... }
-
Back up each database in the data set by using
mongodump
to connect to one of themongos
routers. Do not back up theconfig
database.For each database
$db
:$ mongodump --host mongos0.domain --db $db --out /var/lib/mongodb-$db.backup
-
Uninstall MongoDB from all shard servers and router servers. You can also remove the old
dbpath
from all shards since you have a backup of the data. Keep the config servers running for reference. -
Set up an empty TokuMX sharded cluster. Deploying a sharded cluster in TokuMX is the same as it is in MongoDB.
-
For each database that was sharded, enable sharding for that database and then enable sharding for each collection.
It is a good idea to pre-split all sharded collections so that the import phase loads data evenly. You can use the MongoDB config database as a guideline, or just pre-split manually. After pre-splitting, allow the balancer to finish moving chunks while they are empty.
-
Import the backup of all the non-config databases with
mongorestore
through the TokuMXmongos
routers.For each database
$db
:$ mongorestore /var/lib/mongodb-$db.backup
-
Shut down the MongoDB config servers, and remove the
dbpath
for them.
With an offline migration through the mongos
router, the TokuMX bulk
loader is not used, so the import phase will be slower than importing data
directly into mongod
. This slowness can be mitigated by using multiple
concurrent import clients. The work can be divided on a per-db or
per-collection basis.
Furthermore, it is possible to take offline backups of each shard individually, and then import them into a running empty TokuMX sharded cluster in parallel.