From 2ff9965e02756636f8d33fd96b7c909f2fca7414 Mon Sep 17 00:00:00 2001 From: Zac Dover Date: Tue, 21 Mar 2023 22:27:15 +1000 Subject: [PATCH] doc/rados: line-edit erasure-code.rst Line-edit doc/rados/operations/erasure-code.rst. Signed-off-by: Zac Dover --- doc/rados/operations/erasure-code.rst | 158 +++++++++++++------------- 1 file changed, 82 insertions(+), 76 deletions(-) diff --git a/doc/rados/operations/erasure-code.rst b/doc/rados/operations/erasure-code.rst index e4e404574f6ae..d6321af3cacab 100644 --- a/doc/rados/operations/erasure-code.rst +++ b/doc/rados/operations/erasure-code.rst @@ -1,14 +1,14 @@ .. _ecpool: -============= +============== Erasure code -============= +============== By default, Ceph `pools <../pools>`_ are created with the type "replicated". In -replicated-type pools, every object is copied to multiple disks (this -multiple copying is the "replication"). +replicated-type pools, every object is copied to multiple disks. This +multiple copying is the method of data protection known as "replication". -In contrast, `erasure-coded `_ +By contrast, `erasure-coded `_ pools use a method of data protection that is different from replication. In erasure coding, data is broken into fragments of two kinds: data blocks and parity blocks. If a drive fails or becomes corrupted, the parity blocks are @@ -23,10 +23,10 @@ first forward error correction code was developed in 1950 by Richard Hamming at Bell Laboratories. -Creating a sample erasure coded pool +Creating a sample erasure-coded pool ------------------------------------ -The simplest erasure coded pool is equivalent to `RAID5 +The simplest erasure-coded pool is similar to `RAID5 `_ and requires at least three hosts: @@ -47,12 +47,13 @@ requires at least three hosts: ABCDEFGHI -Erasure code profiles +Erasure-code profiles --------------------- -The default erasure code profile can sustain the loss of two OSDs. 
This erasure -code profile is equivalent to a replicated pool of size three, but requires -2TB to store 1TB of data instead of 3TB to store 1TB of data. The default +The default erasure-code profile can sustain the overlapping loss of two OSDs +without losing data. This erasure-code profile is equivalent to a replicated +pool of size three, but with different storage requirements: instead of +requiring 3TB to store 1TB, it requires only 2TB to store 1TB. The default profile can be displayed with this command: .. prompt:: bash $ @@ -68,26 +69,27 @@ profile can be displayed with this command: technique=reed_sol_van .. note:: - The default erasure-coded pool, the profile of which is displayed here, is - not the same as the simplest erasure-coded pool. - - The default erasure-coded pool has two data chunks (k) and two coding chunks - (m). The profile of the default erasure-coded pool is "k=2 m=2". + The profile just displayed is for the *default* erasure-coded pool, not the + *simplest* erasure-coded pool. These two pools are not the same: + + The default erasure-coded pool has two data chunks (K) and two coding chunks + (M). The profile of the default erasure-coded pool is "k=2 m=2". - The simplest erasure-coded pool has two data chunks (k) and one coding chunk - (m). The profile of the simplest erasure-coded pool is "k=2 m=1". + The simplest erasure-coded pool has two data chunks (K) and one coding chunk + (M). The profile of the simplest erasure-coded pool is "k=2 m=1". Choosing the right profile is important because the profile cannot be modified after the pool is created. If you find that you need an erasure-coded pool with a profile different than the one you have created, you must create a new pool -with a different (and presumably more carefully-considered) profile. When the -new pool is created, all objects from the wrongly-configured pool must be moved -to the newly-created pool. There is no way to alter the profile of a pool after its creation. 
+with a different (and presumably more carefully considered) profile. When the +new pool is created, all objects from the wrongly configured pool must be moved +to the newly created pool. There is no way to alter the profile of a pool after +the pool has been created. -The most important parameters of the profile are *K*, *M* and +The most important parameters of the profile are *K*, *M*, and *crush-failure-domain* because they define the storage overhead and the data durability. For example, if the desired architecture must -sustain the loss of two racks with a storage overhead of 67% overhead, +sustain the loss of two racks with a storage overhead of 67%, the following profile can be defined: .. prompt:: bash $ @@ -106,7 +108,7 @@ the following profile can be defined: The *NYAN* object will be divided in three (*K=3*) and two additional *chunks* will be created (*M=2*). The value of *M* defines how many -OSD can be lost simultaneously without losing any data. The +OSDs can be lost simultaneously without losing any data. The *crush-failure-domain=rack* will create a CRUSH rule that ensures no two *chunks* are stored in the same rack. @@ -155,19 +157,19 @@ no two *chunks* are stored in the same rack. +------+ -More information can be found in the `erasure code profiles +More information can be found in the `erasure-code profiles <../erasure-code-profile>`_ documentation. Erasure Coding with Overwrites ------------------------------ -By default, erasure coded pools only work with uses like RGW that -perform full object writes and appends. +By default, erasure-coded pools work only with operations that +perform full object writes and appends (for example, RGW). -Since Luminous, partial writes for an erasure coded pool may be +Since Luminous, partial writes for an erasure-coded pool may be enabled with a per-pool setting. This lets RBD and CephFS store their -data in an erasure coded pool: +data in an erasure-coded pool: .. 
prompt:: bash $ @@ -175,31 +177,33 @@ data in an erasure coded pool: This can be enabled only on a pool residing on BlueStore OSDs, since BlueStore's checksumming is used during deep scrubs to detect bitrot -or other corruption. In addition to being unsafe, using Filestore with -EC overwrites results in lower performance compared to BlueStore. +or other corruption. Using Filestore with EC overwrites is not only +unsafe, but it also results in lower performance compared to BlueStore. -Erasure coded pools do not support omap, so to use them with RBD and -CephFS you must instruct them to store their data in an EC pool, and +Erasure-coded pools do not support omap, so to use them with RBD and +CephFS you must instruct them to store their data in an EC pool and their metadata in a replicated pool. For RBD, this means using the -erasure coded pool as the ``--data-pool`` during image creation: +erasure-coded pool as the ``--data-pool`` during image creation: .. prompt:: bash $ rbd create --size 1G --data-pool ec_pool replicated_pool/image_name -For CephFS, an erasure coded pool can be set as the default data pool during +For CephFS, an erasure-coded pool can be set as the default data pool during file system creation or via `file layouts <../../../cephfs/file-layouts>`_. -Erasure coded pool and cache tiering ------------------------------------- +Erasure-coded pools and cache tiering +------------------------------------- -Erasure coded pools require more resources than replicated pools and -lack some functionality such as omap. To overcome these -limitations, one can set up a `cache tier <../cache-tiering>`_ -before the erasure coded pool. +Erasure-coded pools require more resources than replicated pools and +lack some of the functionality supported by replicated pools (for example, omap). +To overcome these limitations, one can set up a `cache tier <../cache-tiering>`_ +before setting up the erasure-coded pool. 
-For instance, if the pool *hot-storage* is made of fast storage: +For example, if the pool *hot-storage* is made of fast storage, the following commands +will place the *hot-storage* pool as a tier of *ecpool* in *writeback* +mode: .. prompt:: bash $ @@ -207,58 +211,60 @@ For instance, if the pool *hot-storage* is made of fast storage: ceph osd tier cache-mode hot-storage writeback ceph osd tier set-overlay ecpool hot-storage -will place the *hot-storage* pool as tier of *ecpool* in *writeback* -mode so that every write and read to the *ecpool* are actually using -the *hot-storage* and benefit from its flexibility and speed. +The result is that every write and read to the *ecpool* actually uses +the *hot-storage* pool and benefits from its flexibility and speed. More information can be found in the `cache tiering -<../cache-tiering>`_ documentation. Note however that cache tiering +<../cache-tiering>`_ documentation. Note, however, that cache tiering is deprecated and may be removed completely in a future release. -Erasure coded pool recovery +Erasure-coded pool recovery --------------------------- -If an erasure coded pool loses some data shards, it must recover them from others. -This involves reading from the remaining shards, reconstructing the data, and +If an erasure-coded pool loses any data shards, it must recover them from others. +This recovery involves reading from the remaining shards, reconstructing the data, and writing new shards. + In Octopus and later releases, erasure-coded pools can recover as long as there are at least *K* shards available. (With fewer than *K* shards, you have actually lost data!) -Prior to Octopus, erasure coded pools required at least ``min_size`` shards to be -available, even if ``min_size`` is greater than ``K``. We recommend ``min_size`` -be ``K+2`` or more to prevent loss of writes and data. -This conservative decision was made out of an abundance of caution when -designing the new pool mode. 
As a result pools with lost OSDs but without
-complete loss of any data were unable to recover and go active
-without manual intervention to temporarily change the ``min_size`` setting.
+Prior to Octopus, erasure-coded pools required that at least ``min_size`` shards be
+available, even if ``min_size`` was greater than ``K``. This was a conservative
+decision made out of an abundance of caution when designing the new pool
+mode. As a result, however, pools with lost OSDs but without complete data loss were
+unable to recover and go active without manual intervention to temporarily change
+the ``min_size`` setting.
+
+We recommend that ``min_size`` be ``K+2`` or greater to prevent loss of writes and
+loss of data.
+
+
 
 Glossary
 --------
 
 *chunk*
-   when the encoding function is called, it returns chunks of the same
-   size. Data chunks which can be concatenated to reconstruct the original
-   object and coding chunks which can be used to rebuild a lost chunk.
+   When the encoding function is called, it returns chunks of the same size as each other. There are two
+   kinds of chunks: (1) *data chunks*, which can be concatenated to reconstruct the original object, and
+   (2) *coding chunks*, which can be used to rebuild a lost chunk.
 
 *K*
-   the number of data *chunks*, i.e. the number of *chunks* in which the
-   original object is divided. For instance if *K* = 2 a 10KB object
-   will be divided into *K* objects of 5KB each.
+   The number of data chunks into which an object is divided. For example, if *K* = 2, then a 10KB object
+   is divided into two chunks of 5KB each.
 
 *M*
-   the number of coding *chunks*, i.e. the number of additional *chunks*
-   computed by the encoding functions. If there are 2 coding *chunks*,
-   it means 2 OSDs can be out without losing data.
-
+   The number of coding chunks computed by the encoding function. *M* is equal to the number of OSDs that can
+   be missing from the cluster without the cluster suffering data loss. 
For example, if there are two coding + chunks, then two OSDs can be missing without data loss. -Table of content ----------------- +Table of contents +----------------- .. toctree:: - :maxdepth: 1 - - erasure-code-profile - erasure-code-jerasure - erasure-code-isa - erasure-code-lrc - erasure-code-shec - erasure-code-clay + :maxdepth: 1 + + erasure-code-profile + erasure-code-jerasure + erasure-code-isa + erasure-code-lrc + erasure-code-shec + erasure-code-clay
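The storage-overhead figures edited above (2TB of raw space per 1TB of data for the default ``k=2 m=2`` profile, versus 3TB for a replicated pool of size three, and a 67% overhead for the ``k=3 m=2`` rack profile) follow from one ratio. The sketch below is illustrative arithmetic only; ``raw_space`` is a hypothetical helper, not a Ceph API.

```python
def raw_space(usable_tb: float, k: int, m: int) -> float:
    """Raw capacity needed to store usable_tb of data in a K+M
    erasure-coded pool: each object is stored as K data chunks plus
    M coding chunks, so raw usage is (K + M) / K times the data size."""
    return usable_tb * (k + m) / k

# Default profile (k=2, m=2): 2TB of raw space per 1TB of data,
# versus 3TB for a replicated pool of size three.
print(raw_space(1, 2, 2))             # 2.0
# The k=3 m=2 rack profile: 5/3 of the data size, i.e. a 67% overhead.
print(round(raw_space(1, 3, 2), 2))   # 1.67
# The simplest (RAID5-like, k=2 m=1) pool: 1.5TB per 1TB.
print(raw_space(1, 2, 1))             # 1.5
```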
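The chunk and recovery behavior described in the glossary can be illustrated with a toy single-parity code. This is a minimal sketch of the idea only, assuming XOR parity as in the RAID5-like ``k=2 m=1`` case; Ceph's actual plugins (e.g. jerasure) use Reed-Solomon-style codes that tolerate *M* > 1 losses, and the ``encode``/``recover`` helpers here are hypothetical, not Ceph functions. It reuses the sample object *ABCDEFGHI* from the doc, split into *K* = 3 data chunks plus one coding chunk.

```python
def xor_chunks(chunks):
    """XOR same-sized chunks together byte by byte."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

def encode(obj: bytes, k: int):
    """Split obj into k equal data chunks and append one XOR coding chunk."""
    size = len(obj) // k
    data = [obj[i * size:(i + 1) * size] for i in range(k)]
    return data + [xor_chunks(data)]

def recover(chunks, lost: int):
    """Rebuild the lost chunk by XOR-ing every surviving chunk."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return xor_chunks(survivors)

chunks = encode(b"ABCDEFGHI", k=3)   # [b"ABC", b"DEF", b"GHI", parity]
rebuilt = recover(chunks, lost=1)    # pretend the OSD holding b"DEF" failed
print(rebuilt)                       # b'DEF'
```

Because a single XOR parity chunk can rebuild only one missing chunk, losing two chunks here loses data, which is the glossary's point that *M* bounds how many OSDs can be missing without data loss.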