Content Addressability and File Segmenting #17

Open
cryptoquick opened this issue Dec 22, 2022 · 0 comments

Files larger than 16 MB will be split into segments to allow encoding to be parallelized and to support streaming very large files.

Segments differ from chunks: the chunk count is always fixed at 4/8, whereas a file can be made up of any number of 16 MB segments.
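A minimal sketch of the segmenting step, assuming "16MB" means 16 MiB (2^24 bytes) and that each segment records its index and byte offset. `Segment` and `segments` are hypothetical names used for illustration, not part of Carbonado. Reading one segment at a time from a stream keeps memory bounded for very large files, and each yielded segment could be handed to a separate worker for encoding.

```python
import io
from typing import BinaryIO, Iterator, NamedTuple

# Assumption: "16MB" is taken to mean 16 MiB (2**24 bytes).
SEGMENT_SIZE = 16 * 1024 * 1024


class Segment(NamedTuple):
    index: int   # position of the segment within the file
    offset: int  # byte offset of the segment's first byte in the file
    data: bytes  # segment contents; only the last segment may be shorter


def segments(stream: BinaryIO, size: int = SEGMENT_SIZE) -> Iterator[Segment]:
    """Yield consecutive fixed-size segments read from a binary stream.

    Only one segment is held in memory at a time. A buffered stream, such as
    the one returned by open(path, "rb"), returns `size` bytes per read until
    the final, possibly shorter, read at end of file.
    """
    if size <= 0:
        raise ValueError("segment size must be positive")
    index = 0
    offset = 0
    while True:
        data = stream.read(size)
        if not data:  # end of stream
            return
        yield Segment(index, offset, data)
        index += 1
        offset += len(data)


if __name__ == "__main__":
    # A tiny segment size makes the split visible: offsets 0, 4 and 8.
    for seg in segments(io.BytesIO(b"abcdefghij"), size=4):
        print(seg.index, seg.offset, seg.data)
```

A file at or below the segment size simply yields a single segment, so small files need no special case.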

To support parallelization, a content catalog is needed that refers to the original content that was encoded. The catalog is specific to each storage frontend: for BitTorrent the content will be referenced by a SHA-2 hash, for IPFS by a Blake2b Multihash, and for the HTTP frontend by a Blake3 hash. In all cases, the client is encouraged to hash the received contents once more to verify that it has received the correct data. Content catalogs will be Carbonado-encoded on disk, with optional encryption to preserve privacy at rest.
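The frontend-specific identifiers and the client-side re-hash check can be sketched with Python's standard `hashlib`. This assumes SHA-256 as the SHA-2 variant for BitTorrent and a 256-bit Blake2b digest for IPFS, wrapped in the multihash layout (unsigned-varint code `0xb220` for `blake2b-256`, then the digest length, then the digest). The HTTP frontend's Blake3 hash is omitted because Blake3 is not in the standard library. All function names are hypothetical.

```python
import hashlib
from typing import Callable

# Multicodec code for blake2b-256 in the multihash table.
BLAKE2B_256 = 0xB220


def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint, as multihash does."""
    if n < 0:
        raise ValueError("uvarint requires a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def bittorrent_id(data: bytes) -> bytes:
    """SHA-2 identifier; SHA-256 is an assumed choice of SHA-2 variant."""
    return hashlib.sha256(data).digest()


def ipfs_id(data: bytes) -> bytes:
    """Blake2b-256 digest wrapped as a multihash: <code><length><digest>."""
    digest = hashlib.blake2b(data, digest_size=32).digest()
    return uvarint(BLAKE2B_256) + uvarint(len(digest)) + digest


def verify(data: bytes, expected: bytes, content_id: Callable[[bytes], bytes]) -> bool:
    """Re-hash the received data and compare it with the catalogued identifier."""
    return content_id(data) == expected


if __name__ == "__main__":
    mh = ipfs_id(b"hello")
    print(mh.hex(), verify(b"hello", mh, ipfs_id))
```

Passing the identifier function into `verify` keeps the check the same across frontends; only the hash behind each catalog changes.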

For each supported frontend, a YAML file is used to keep the catalog easy to inspect. It contains a list of segments indexed by the Bao hash used to encode them. Additional metadata, such as each segment's offset and index within the file, can also be included to align the contents with IPLD DAGs or BitTorrent chunks.

For the rsync frontend, the original file metadata can be stored, and files are indexed by a hash of their path. Blake3 hashes will be keyed with the file's public key to improve privacy: keyed hashes defeat authoritative content hash tables, such as rainbow-table-style indexes of known files maintained by state actors.
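A sketch of what one such YAML file might contain for a single file, written by hand because Python's standard library has no YAML emitter. Two substitutions are assumptions: keyed BLAKE2b (`hashlib.blake2b(key=...)`) stands in for both the Bao hash and keyed Blake3, neither of which is in the standard library, and the public key is used directly as the key, which BLAKE2b allows for keys up to 64 bytes (keyed Blake3 would instead need exactly 32 bytes, for example derived from the public key). Keying the rsync-style path hash is also an assumption, and the field names (`path_hash`, `segments`, `hash`, `index`, `offset`, `length`) are hypothetical, not a defined Carbonado schema.

```python
import hashlib
from typing import Iterable


def keyed_hash(data: bytes, public_key: bytes) -> str:
    """Keyed digest as lowercase hex.

    Stand-in: keyed BLAKE2b replaces keyed Blake3 (and the Bao hash) here.
    BLAKE2b accepts keys of 1 to 64 bytes, so the public key is used as-is.
    """
    if not 0 < len(public_key) <= 64:
        raise ValueError("BLAKE2b keys must be 1 to 64 bytes long")
    return hashlib.blake2b(data, digest_size=32, key=public_key).hexdigest()


def catalog_yaml(path: str, segments: Iterable[bytes], public_key: bytes) -> str:
    """Render a hypothetical catalog for one file as YAML text.

    Each list entry carries the segment's (stand-in) hash plus the index and
    byte offset needed to align it with other layouts. Hex digests are quoted
    so a YAML parser always reads them as strings.
    """
    path_hash = keyed_hash(path.encode("utf-8"), public_key)
    entries = []
    offset = 0
    for index, data in enumerate(segments):
        entries.append(f'  - hash: "{keyed_hash(data, public_key)}"')
        entries.append(f"    index: {index}")
        entries.append(f"    offset: {offset}")
        entries.append(f"    length: {len(data)}")
        offset += len(data)
    lines = [f'path_hash: "{path_hash}"']
    lines.append("segments:" if entries else "segments: []")
    lines.extend(entries)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 33-byte public key (the size of a compressed secp256k1 key).
    print(catalog_yaml("docs/a.txt", [b"abcd", b"ef"], bytes(range(33))))
```

Because every digest is keyed, the same content catalogued under two different public keys yields unrelated hashes, so an observer holding a table of known-content hashes cannot match entries against it.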
