Migrating Sharded Clusters
When migrating a sharded cluster to TokuMX, there are three main strategies to consider.
- Migrating from MongoDB: Sharded cluster (offline, individual shards)
- Migrating from MongoDB: Sharded cluster (offline, all data at once)
- Migrating from MongoDB: Sharded cluster (online)
As with replica sets, sharded clusters can be migrated "offline" or "online". See Migrating from MongoDB: Replica set for the differences between offline and online migrations; they apply to sharded clusters as well.
When migrating a sharded cluster offline, it is possible to convert each shard individually, or to convert the entire data set at once.
When converting each shard individually, you will end up with the same number of shards and chunk distribution that you had in MongoDB. The benefits here are:
- The migration process is faster because the shards can be migrated in parallel, and every shard can use the bulk loader, which imports data faster.
- Each shard is backed up to disk separately, so the space used by the backup is spread over many disks.
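The parallelism described above can be sketched as follows. This is only an illustration, not part of the TokuMX tooling: `migrate_shard` is a hypothetical placeholder for whatever offline migration steps you run against one shard, and the shard names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_shard(shard_name):
    """Hypothetical placeholder for migrating one shard.

    A real version would run the offline replica-set migration
    steps for this shard; here it only reports the shard name.
    """
    return f"{shard_name}: migrated"


def migrate_all_shards(shard_names, max_workers=None):
    """Run migrate_shard for every shard concurrently.

    Results are returned in the same order as shard_names.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(migrate_shard, shard_names))


if __name__ == "__main__":
    # Assumed shard names, for illustration only.
    for line in migrate_all_shards(["shard0", "shard1", "shard2"]):
        print(line)
```

Because each shard is dumped and loaded on its own machine, the wall-clock time is bounded by the slowest shard rather than by the sum of all shards.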
When converting the entire data set at once, you have the opportunity to change the number of shards and the chunk distribution, but this method does not use TokuMX’s fast bulk loader, so it will typically be slower. It can also require a lot of extra disk space, because the entire data set needs to be dumped to disk at once.
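To make the disk-space trade-off concrete, here is a rough sketch that compares the largest single backup you need to hold under each offline strategy. It assumes, purely for illustration, that a dump takes about as much space as the data it contains; the function name and the sizes are hypothetical.

```python
def peak_backup_space(shard_sizes_gb):
    """Return (per_shard_peak, all_at_once_peak) in GB.

    Assumption: a dump is roughly the same size as the data dumped.
    - Individual shards: each backup lives on its own disk, so the
      largest single backup is the largest shard.
    - All data at once: one backup holds the whole data set.
    """
    per_shard_peak = max(shard_sizes_gb)
    all_at_once_peak = sum(shard_sizes_gb)
    return per_shard_peak, all_at_once_peak
```

For three shards of 100, 120, and 80 GB, `peak_backup_space([100, 120, 80])` returns `(120, 300)`: no single disk needs more than 120 GB when shards are converted individually, while a single dump needs about 300 GB in one place.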
When the entire data set is dumped to disk in one backup, it is also possible to import this backup into a single TokuMX replica set. Since TokuMX already has document-level concurrency, high write throughput, and compression, it is reasonable to expect a single TokuMX replica set to support the workload of a sharded cluster of MongoDB instances. Consolidating to a single replica set also reduces the operational work of maintaining the TokuMX database.