Fast Dedup: “flat” DDT entry format #15893

Closed
wants to merge 7 commits into from

Commits on Aug 15, 2024

  1. ddt: add FDT feature and support for legacy and new on-disk formats

    This is the supporting infrastructure for the upcoming dedup features.
    
    Traditionally, dedup objects live directly in the MOS root. While their
    details vary (checksum, type and class), they are all the same "kind" of
    thing - a store of dedup entries.
    
    The new features are more varied than that, and are better thought of as
    a set of related stores for the overall state of a dedup table.
    
    This adds a new feature flag, SPA_FEATURE_FAST_DEDUP. Enabling this will
    cause new DDTs to be created as a ZAP in the MOS root, named
    DDT-<checksum>. This is used as the root object for the normal type/class
    store objects, but will also be a place for any storage required by new
    features.
    
    This commit adds two new fields to ddt_t, for version and flags. These
    are intended to describe the structure and features of the overall dedup
    table, and are stored as-is in the DDT root. In this commit, flags are
    always zero, but the intent is that they can be used to hang optional
    logic or state onto for new dedup features. Version is always 1.
    
    For a "legacy" dedup table, where no DDT root directory exists, the
    version will be 0.
    
    ddt_configure() is expected to determine the version and flags
    currently in operation based on whether or not the fast_dedup feature is
    enabled, and from what's available on disk. In this way, it's possible to
    support both old and new tables.
    
    This also provides a migration path. A legacy setup can be upgraded to
    FDT by creating the DDT root ZAP, moving the existing objects into it,
    and setting version and flags appropriately. There's no support for that
    here, but it would be straightforward to add later and allows the
    possibility that newer features could be applied to existing dedup
    tables.
    
    Co-authored-by: Allan Jude <[email protected]>
    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    robn and allanjude committed Aug 15, 2024
    4b2055e
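A minimal sketch of the version selection described in commit 1, under stated assumptions. Every name here (`sketch_ddt_configure`, the `SKETCH_*` constants, the boolean inputs) is hypothetical and is not the real ddt_configure() signature. It assumes that an existing legacy table stays legacy even when the feature is enabled, since the commit says no migration support is included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real OpenZFS definitions. */
enum {
	SKETCH_DDT_VERSION_LEGACY = 0,	/* no DDT root ZAP on disk */
	SKETCH_DDT_VERSION_FDT = 1	/* DDT-<checksum> root ZAP exists */
};

typedef struct sketch_ddt_config {
	uint64_t version;
	uint64_t flags;
} sketch_ddt_config_t;

/*
 * feature_enabled: the fast_dedup feature is enabled on the pool.
 * root_on_disk:    a DDT-<checksum> root ZAP was found in the MOS root.
 * legacy_on_disk:  legacy store objects were found directly in the MOS root.
 * stored_*:        values read from the root ZAP (only used if it exists).
 */
static sketch_ddt_config_t
sketch_ddt_configure(bool feature_enabled, bool root_on_disk,
    bool legacy_on_disk, uint64_t stored_version, uint64_t stored_flags)
{
	/* Default: legacy table, version 0, no flags. */
	sketch_ddt_config_t c = { SKETCH_DDT_VERSION_LEGACY, 0 };

	if (root_on_disk) {
		/* A rooted table never stores the legacy version. */
		assert(stored_version >= SKETCH_DDT_VERSION_FDT);
		c.version = stored_version;
		c.flags = stored_flags;
	} else if (!legacy_on_disk && feature_enabled) {
		/* Nothing on disk yet: a new table is created as FDT. */
		c.version = SKETCH_DDT_VERSION_FDT;
	}
	/* Otherwise an existing legacy table is left as-is (assumption). */
	return (c);
}
```

Deciding from what is on disk first, and only then from the feature flag, is what lets old and new tables coexist in one pool.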
  2. ZTS: tests for dedup legacy/FDT tables

    Very basic coverage to make sure things appear to work, have the right
    format on disk, and pool upgrades and mixed table types work as
    expected.
    
    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    robn committed Aug 15, 2024
    e957dc8
  3. zdb: rework DDT block count and leak check to just count the blocks

    The upcoming dedup features break the long-held assumption that all
    blocks on disk with a 'D' dedup bit will always be present in the DDT,
    or will have the same set of DVA allocations on disk as in the DDT.
    
    If the DDT is no longer a complete picture of all the dedup blocks that
    will be and should be on disk, then it does us no good to walk and prime
    it up front, since it won't necessarily match up with every block we'll
    see anyway.
    
    Instead, we rework things here to be more like the BRT checks. When we
    see a dedup'd block, we look it up in the DDT, consume a refcount, and
    for the second-or-later instances, count them as duplicates.
    
    The DDT and BRT are moved ahead of the space accounting. This will
    become important for the "flat" feature, which may need to count a
    modified version of the block.
    
    Co-authored-by: Allan Jude <[email protected]>
    Co-authored-by: Don Brady <[email protected]>
    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    3 people committed Aug 15, 2024
    c61047c
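The per-block counting described in commit 3 could look roughly like the sketch below. It is not the zdb code: the table is a plain array, the key is a single integer standing in for the block checksum, and all names are hypothetical. How zdb treats a dedup'd block that has no usable DDT entry is not stated in the commit message, so the `untracked` counter is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified DDT entry for illustration only. */
typedef struct sketch_dde {
	uint64_t key;		/* stand-in for the block checksum */
	uint64_t refcnt;	/* references recorded in the DDT */
	uint64_t seen;		/* references consumed during traversal */
} sketch_dde_t;

typedef struct sketch_counts {
	uint64_t first;		/* first instance of a dedup'd block */
	uint64_t duplicates;	/* second-or-later instances */
	uint64_t untracked;	/* no entry, or refcount used up (assumption) */
} sketch_counts_t;

static sketch_dde_t *
sketch_ddt_lookup(sketch_dde_t *tbl, size_t n, uint64_t key)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].key == key)
			return (&tbl[i]);
	return (NULL);
}

/* Called once for each block seen with the dedup bit set. */
static void
sketch_count_dedup_block(sketch_dde_t *tbl, size_t n, uint64_t key,
    sketch_counts_t *counts)
{
	assert(counts != NULL);
	sketch_dde_t *dde = sketch_ddt_lookup(tbl, n, key);

	if (dde == NULL || dde->seen == dde->refcnt) {
		/* The DDT is no longer a complete picture of dedup blocks. */
		counts->untracked++;
		return;
	}
	/* Consume one refcount; only the first instance is not a duplicate. */
	if (dde->seen++ == 0)
		counts->first++;
	else
		counts->duplicates++;
}
```

Because the lookup happens when a block is encountered, nothing has to be walked and primed up front.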
  4. ddt: rework access to phys array slots

    The "flat phys" feature will use only a single phys slot for all
    entries, which means the old "single", "double" etc naming now makes no
    sense, and more importantly, means that choosing the right slot for a
    given block pointer will depend on how many slots are in use for a given
    DDT.
    
    This removes the old names, and adds accessor macros to decouple
    specific phys array indexes from any particular meaning.
    
    (These macros look strange in isolation, mainly in the way they take the
    ddt_t* as an arg but don't use it. This is mostly a separate commit to
    introduce the concept to the reader before the "flat phys" commit
    extends it).
    
    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    robn committed Aug 15, 2024
    3019efc
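To illustrate why the accessor macros in commit 4 take a ddt_t* they do not yet use, here is a small sketch. The macro and type names are hypothetical, and the mapping of a copies count directly onto a slot index is an assumption about the traditional layout rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ddt_t, ddt_phys_t and ddt_entry_t. */
typedef struct sketch_ddt { uint64_t ddt_flags; } sketch_ddt_t;
typedef struct sketch_phys { uint64_t refcnt; } sketch_phys_t;

#define	SKETCH_NPHYS	4

typedef struct sketch_entry {
	sketch_phys_t phys[SKETCH_NPHYS];
} sketch_entry_t;

/*
 * Both macros accept the table but ignore it for now. Callers already
 * pass it, so a later change can make the result depend on the table's
 * flags without touching any call site.
 */
#define	SKETCH_DDT_NPHYS(ddt)			((void)(ddt), SKETCH_NPHYS)
#define	SKETCH_DDT_PHYS_FOR_COPIES(ddt, copies)	((void)(ddt), (int)(copies))

static sketch_phys_t *
sketch_entry_phys(sketch_ddt_t *ddt, sketch_entry_t *e, int copies)
{
	int slot = SKETCH_DDT_PHYS_FOR_COPIES(ddt, copies);

	assert(slot >= 0 && slot < SKETCH_DDT_NPHYS(ddt));
	return (&e->phys[slot]);
}
```

Call sites ask "which slot holds this many copies?" instead of naming an index, which is the decoupling the commit describes.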
  5. ddt: introduce lightweight entry

    The idea here is that sometimes you need the contents of an entry with
    no intent to modify it, and/or from a place where it's difficult to get
    hold of its originating ddt_t to know how to interpret it.
    
    A lightweight entry contains everything you might need to "read" an
    entry - its key, type and phys contents - but none of the extras for
    modifying it or using it in a larger context. It also has the full
    complement of phys slots, so it can represent any kind of dedup entry
    without having to know the specific configuration of the table it came
    from.
    
    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    robn committed Aug 15, 2024
    955018b
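The lightweight entry from commit 5 can be pictured with the sketch below. The struct layouts and function names are hypothetical and heavily simplified; in particular, zeroing the unused slots so that a reader can walk all of them is an assumption, not something the commit message specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define	SKETCH_NPHYS_MAX	4

typedef struct sketch_key { uint64_t cksum[4]; } sketch_key_t;
typedef struct sketch_phys { uint64_t refcnt; } sketch_phys_t;

/* Full in-memory entry: interpreting it requires table context (nphys). */
typedef struct sketch_entry {
	sketch_key_t key;
	int type;
	int nphys;		/* slots in use; depends on the table */
	void *io_state;		/* extras needed only when modifying */
	sketch_phys_t phys[SKETCH_NPHYS_MAX];
} sketch_entry_t;

/* Lightweight entry: key, type and a full complement of phys slots. */
typedef struct sketch_lw_entry {
	sketch_key_t key;
	int type;
	sketch_phys_t phys[SKETCH_NPHYS_MAX];
} sketch_lw_entry_t;

static void
sketch_to_lightweight(const sketch_entry_t *e, sketch_lw_entry_t *lw)
{
	assert(e->nphys >= 0 && e->nphys <= SKETCH_NPHYS_MAX);
	memset(lw, 0, sizeof (*lw));	/* unused slots read as empty */
	lw->key = e->key;
	lw->type = e->type;
	memcpy(lw->phys, e->phys, (size_t)e->nphys * sizeof (e->phys[0]));
}

/* A reader can walk every slot without knowing the table configuration. */
static uint64_t
sketch_lw_total_refcnt(const sketch_lw_entry_t *lw)
{
	uint64_t total = 0;

	for (int i = 0; i < SKETCH_NPHYS_MAX; i++)
		total += lw->phys[i].refcnt;
	return (total);
}
```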
  6. ddt: slim down ddt_entry_t

    This slims down the in-memory entry to as small as it can be. The
    IO-related parts are made into a separate entry, since they're
    relatively rarely needed.
    
    The variable allocation for dde_phys is to support the upcoming flat
    format.
    
    Signed-off-by: Rob Norris <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    robn committed Aug 15, 2024
    d15a5ef
  7. ddt: add "flat phys" feature

    Traditional dedup keeps a separate ddt_phys_t "type" for each possible
    count of DVAs (that is, each value of the copies= parameter). Each of
    these is tracked independently of the others, and has its own set of
    DVAs. This leads to an (admittedly rare) situation where you can create
    as many as six copies of the data, by changing the copies= parameter
    between copying. This wastes both storage on disk and space in the
    stored DDT entries, since there never needs to be more than three DVAs
    to handle all possible values of copies=.
    
    This commit adds a new FDT feature, DDT_FLAG_FLAT. When active, only the
    first ddt_phys_t is used. Each time a block is written with the dedup
    bit set, this single phys is checked to see if it has enough DVAs to
    fulfill the request. If it does, the block is filled with the saved DVAs
    as normal. If not, an adjusted write is issued to create as many extra
    copies as are needed to fulfill the request, which are then saved into
    the entry too.
    
    Because a single phys is no longer all-or-nothing, but can be
    transitioning from fewer to more DVAs, the write path now has to keep a
    copy of the previous "known good" DVA set so we can revert to it in case
    an error occurs. zio_ddt_write() has been restructured and heavily
    commented to make it much easier to see what's happening.
    
    Backwards compatibility is maintained simply by allocating four
    ddt_phys_t when the DDT_FLAG_FLAT flag is not set, and updating the phys
    selection macros to check the flag. In the old arrangement, each number
    of copies gets a whole phys, so it will always have either zero or all
    necessary DVAs filled, with no in-between, so the old behaviour
    naturally falls out of the new code.
    
    Signed-off-by: Rob Norris <[email protected]>
    Co-authored-by: Don Brady <[email protected]>
    Sponsored-by: Klara, Inc.
    Sponsored-by: iXsystems, Inc.
    robn and don-brady committed Aug 15, 2024
    eb0cb79
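The flat-phys write logic in commit 7 (check whether the single phys has enough DVAs, write only the missing copies, revert on error) can be sketched as follows. This is not zio_ddt_write(): DVAs are reduced to plain integers, the write result is passed in as a flag, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_MAX_DVAS	3	/* enough for every value of copies= */

/* Hypothetical single "flat" phys: one DVA set shared by all copies=. */
typedef struct sketch_flat_phys {
	uint64_t dva[SKETCH_MAX_DVAS];	/* stand-ins for real DVAs */
	int ndvas;			/* how many are filled in */
} sketch_flat_phys_t;

/* How many extra copies must be written to satisfy a request. */
static int
sketch_missing_copies(const sketch_flat_phys_t *p, int want)
{
	assert(want >= 1 && want <= SKETCH_MAX_DVAS);
	return (want > p->ndvas ? want - p->ndvas : 0);
}

/*
 * Extend the phys with newly written DVAs. write_ok models the outcome
 * of the adjusted write; on failure the previous "known good" DVA set
 * is restored. Returns true if the phys now satisfies the request.
 */
static bool
sketch_extend_phys(sketch_flat_phys_t *p, int want,
    const uint64_t *new_dvas, bool write_ok)
{
	sketch_flat_phys_t known_good = *p;
	int need = sketch_missing_copies(p, want);

	for (int i = 0; i < need; i++)
		p->dva[p->ndvas++] = new_dvas[i];

	if (!write_ok) {
		*p = known_good;	/* revert the partial transition */
		return (false);
	}
	return (true);
}
```

With a phys that already holds enough DVAs, `sketch_missing_copies()` returns 0 and nothing is written, which mirrors the "filled with the saved DVAs as normal" case.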