[WIP] Error Correction Code #235

JeromeMartinez
Member

Error Correction Code feature: a file can be corrected if it is slightly damaged.
This makes it possible, e.g., to avoid a complete retransmission if a file is slightly damaged during transport, or to retrieve the complete file even if several blocks on our LTO tape are damaged.

The principle (the numbers are the defaults; they will be tweakable):

  • a (file) shard is 1 MiB long
  • a shard is considered damaged if any of its bytes is wrong (anywhere from 1 byte to the full 1 MiB)
  • for every 248 data shards (so 248 MiB), 8 parity shards are encoded, along with a hash for each shard (so 256 shard hashes)
  • parity shards and shard hashes are appended to the end of the file, after the A/V data (note: we could put them in a sidecar file too if someone needs that)
  • for every 248 MiB, if no more than 8 shards are damaged, we can fix the file!
  • to keep the maths simple and the performance good for the moment, we use the idea from Backblaze and limit ourselves to erasure coding (a block is considered completely bad even if only 1 byte is wrong).
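As an illustration of the erasure model above, here is a minimal Python sketch of how per-shard hashes can locate damaged shards. It is not the code from this PR: the hash algorithm (SHA-256), the zero padding of the last shard, the toy sizes, and all function names are assumptions made for the example. The fixability check assumes the code tolerates as many bad shards as there are parity shards, which is the "up to 8 bad shards" figure given in this description.

```python
import hashlib

SHARD_SIZE = 16     # toy value; the default described here is 1 MiB
PARITY_SHARDS = 2   # toy value; the default described here is 8


def split_shards(data, shard_size=SHARD_SIZE):
    """Cut data into fixed-size shards; zero-padding the last one is an assumption."""
    shards = [data[i:i + shard_size] for i in range(0, len(data), shard_size)]
    if shards and len(shards[-1]) < shard_size:
        shards[-1] = shards[-1].ljust(shard_size, b"\0")
    return shards


def shard_hashes(shards):
    """One hash per shard (SHA-256 is an assumption, the PR does not name the hash)."""
    return [hashlib.sha256(shard).digest() for shard in shards]


def damaged_indices(shards, hashes):
    """A shard counts as damaged as soon as its hash differs, even for 1 wrong byte."""
    return [i for i, (shard, expected) in enumerate(zip(shards, hashes))
            if hashlib.sha256(shard).digest() != expected]


def is_fixable(damaged, parity_shards=PARITY_SHARDS):
    """Erasure coding can rebuild at most as many shards as there are parity shards."""
    return len(damaged) <= parity_shards


def overhead_ratio(data_shards, parity_shards):
    """Size overhead of the parity shards alone (shard hashes are ignored)."""
    return parity_shards / data_shards


original = bytes(range(64))                 # 64 bytes -> 4 shards of 16 bytes
shards = split_shards(original)
hashes = shard_hashes(shards)

corrupted = list(shards)
corrupted[2] = b"\xff" + corrupted[2][1:]   # a single wrong byte in shard 2

damaged = damaged_indices(corrupted, hashes)
print(damaged, is_fixable(damaged))
print(f"{overhead_ratio(248, 8):.1%}")
```

With the default 248 data shards and 8 parity shards, the parity overhead is 8 / 248, i.e. about 3.2%, in line with the "+3% file size" figure.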

Reasons for the default choices (denoted 248x8x1M):

  • shard size of 1 MiB: common damage is a few consecutive disk sectors (512 bytes or 4 KiB each) or one or a few consecutive LTO blocks (64 KiB or 256 KiB, maybe 1.55 MiB?); the goal is to handle the bad scenario of a few consecutive bad sectors/blocks.
  • 256 data + parity shards: the maximum with the algorithm used on 8-bit bytes
  • 8 parity shards: trying to find a balance between performance (+15% time when testing FFV1 at the same time), overhead (+3% file size), and correction ability (up to 8 bad shards every 248 shards).
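To make the 256-shard limit concrete, here is a toy erasure code over 8-bit bytes, written as a Python sketch. It is not the encoder from this PR, nor the matrix-based construction Backblaze describes: it uses the polynomial-interpolation form of Reed-Solomon, where each shard is the value of a polynomial at a distinct point of GF(2^8). The field polynomial 0x11D, the choice of evaluation points, and all function names are assumptions for illustration. Since GF(2^8) has only 256 distinct points, data plus parity shards cannot exceed 256.

```python
# Log/antilog tables for GF(2^8), built with the primitive polynomial 0x11D.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def lagrange_weights(xs, x0):
    """Weights w_j with P(x0) = XOR_j w_j * P(xs[j]) for any P of degree < len(xs)."""
    weights = []
    for j, xj in enumerate(xs):
        num = den = 1
        for m, xm in enumerate(xs):
            if m != j:
                num = gf_mul(num, x0 ^ xm)   # subtraction in GF(2^8) is XOR
                den = gf_mul(den, xj ^ xm)
        weights.append(gf_div(num, den))
    return weights


def evaluate_shard(xs, known, x0):
    """Compute the shard located at point x0 from the known shards located at xs."""
    out = bytearray(len(known[0]))
    for weight, shard in zip(lagrange_weights(xs, x0), known):
        for i, byte in enumerate(shard):
            out[i] ^= gf_mul(weight, byte)
    return bytes(out)


def encode(data_shards, parity_count):
    """Data shard j sits at point j; parity shard p sits at point k + p."""
    k = len(data_shards)
    if k + parity_count > 256:
        raise ValueError("GF(2^8) offers only 256 distinct points")
    return [evaluate_shard(list(range(k)), data_shards, k + p)
            for p in range(parity_count)]


def recover(shards, k):
    """Rebuild erased shards (marked None) from any k surviving shards."""
    present = [i for i, shard in enumerate(shards) if shard is not None][:k]
    if len(present) < k:
        raise ValueError("too many erased shards")
    known = [shards[i] for i in present]
    return [shard if shard is not None else evaluate_shard(present, known, i)
            for i, shard in enumerate(shards)]


data = [b"RAWc", b"ooke", b"d EC", b"C!!!"]     # 4 toy data shards
stored = data + encode(data, 2)                  # plus 2 parity shards
damaged = [s if i not in (1, 3) else None for i, s in enumerate(stored)]
print(recover(damaged, 4) == stored)
```

The per-byte Python loops are far too slow for 1 MiB shards; the sketch only shows why any 4 of the 6 toy shards (or any 248 of 256 with the defaults) are enough to rebuild the rest.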

So, as a comparison:

  • with raw content, if you lose some bytes, you lose some equivalent content but you don't know where exactly
  • with compressed content + error correction code, you store more content on each LTO tape, and if you lose some bytes you can retrieve them, so the downside of compression (losing a complete slice if you lose some bytes) is mitigated while keeping nearly the same compression ratio.

Commands:

  • rawcooked --ecc YourDirectoryName (also included in rawcooked --all YourDirectoryName) for adding error correction codes to the file
  • rawcooked YourFileName.mkv fails if the file is damaged.
  • rawcooked --fix YourFileName.mkv for fixing it (needs write rights)
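The decode-then-fix workflow implied by these commands could be scripted roughly as follows. This is a sketch under assumptions: it supposes that rawcooked signals a rejected file with a non-zero exit code (not stated in this PR), that the --fix option keeps the form shown above once merged, and the function name decode_with_repair is hypothetical.

```python
import subprocess


def decode_with_repair(path, run=subprocess.run):
    """Try a normal decode; if it fails, attempt one --fix and decode again.

    Assumes a non-zero exit code means the file was rejected (an assumption,
    not documented behaviour). `run` is injectable so the logic can be tested
    without rawcooked installed.
    """
    if run(["rawcooked", path]).returncode == 0:
        return "ok"
    if run(["rawcooked", "--fix", path]).returncode != 0:
        return "unfixable"
    return "fixed" if run(["rawcooked", path]).returncode == 0 else "unfixable"
```

Calling decode_with_repair("YourFileName.mkv") needs write rights on the file, since the fix step modifies it in place.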

Still some stuff to do before merging IMO:

  • options for settings
  • as the Track element, which contains e.g. the FFV1 init bytes, is more important than other bytes, we could also store a copy of it at the end of the file, beside the copies of the Error Correction Code init; same for the Cues element (permitting a resync if we lose content in the file without having been able to detect that the content is cut, i.e. that file offsets have changed)
  • files whose first 59 bytes or last 1 MiB are corrupted are not handled yet (file rejected); this is not due to the format design, it just needs more code for handling such cases
  • Documentation of the format

In the long term:

  • better interface: currently, if the file is corrupted, you can't decode it at all (file rejected) without fixing it first (which needs write rights); we could permit decoding even if the file is read-only
  • I have a proof of concept of an AVX-512 enabled Error Correction Code encoder: performance is multiplied by ~20 when AVX-512 is available (and by ~12 when AVX2 is available, so on nearly all CPUs sold in the last few years), so the performance impact of the feature would be minor

Comments?

@digitensions
Contributor

This looks amazing Jérôme! I'd be happy to help with some testing if you need it. Something to watch develop for MACE's RAWcooked collection @alexhabgood

Base automatically changed from master to main March 22, 2021 08:08