unixfs: suggest, but do not strictly mandate unixfs<>dagpb #294
The strict coupling poses some problems for IPLD implementations that:

1. do not retain codec information between the serialization and UnixFS (ADL) reification; and/or
2. do not have a mechanism to strictly mandate that UnixFS data be _only_ encoded in a particular codec.

A mandate in the specification that strictly defines the layering on top of the codec makes it difficult to implement it as an ADL, which also presents difficulty for using tooling that builds on IPLD lens-style layering.

Ref: ipfs/go-unixfsnode#27
Ref: #271
Ref: ipld/go-car#304
As in #271, I would prefer that the ADL works as an ADL, so I would be in favor of this spec change.
@@ -78,7 +78,7 @@ This `Data` object is used for all non-leaf nodes in Unixfs.
 
 For files that are comprised of more than a single block, the 'Type' field will be set to 'File', the 'filesize' field will be set to the total number of bytes in the file (not the graph structure) represented by this node, and 'blocksizes' will contain a list of the filesizes of each child node.
 
-This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. Typically this is encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementations of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.
+This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block must conform to the [DAG-PB Logical Format](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.
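As a rough illustration of what conforming to the DAG-PB Logical Format could mean for an already-decoded node, the sketch below checks the shape that schema describes: a required `Links` list whose entries carry a required `Hash` link plus optional `Name` and `Tsize`, and an optional `Data` bytes field. The `Link` class and `conforms_to_dagpb_logical_format` function are hypothetical names made up for this example and are not part of any IPLD library; the check deliberately ignores which codec produced the node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for an IPLD Link (a CID); illustration only."""
    cid: str


def conforms_to_dagpb_logical_format(node) -> bool:
    """Return True if a decoded node has the DAG-PB Logical Format shape."""
    if not isinstance(node, dict):
        return False
    # Only the two top-level fields are allowed; Links is required.
    if not set(node) <= {"Links", "Data"} or "Links" not in node:
        return False
    if "Data" in node and not isinstance(node["Data"], bytes):
        return False
    if not isinstance(node["Links"], list):
        return False
    for link in node["Links"]:
        if not isinstance(link, dict):
            return False
        if not set(link) <= {"Hash", "Name", "Tsize"}:
            return False
        # Hash is the only required field of a link.
        if not isinstance(link.get("Hash"), Link):
            return False
        if "Name" in link and not isinstance(link["Name"], str):
            return False
        # bool is a subclass of int in Python, so exclude it explicitly.
        tsize = link.get("Tsize", 0)
        if isinstance(tsize, bool) or not isinstance(tsize, int):
            return False
    return True
```

A node that passes this check could then be handed to UnixFS interpretation whether it arrived via DAG-PB or some other codec, which is the separation the revised wording leaves room for.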
-This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block must conform to the [DAG-PB Logical Format](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.
+This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block **MUST** conform to the [DAG-PB **Logical Format**](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.
@rvagg this works! Adding a little more screamers to make sure people do not miss it, but good enough. Thank you!
I would normally highlight SHOULD, MUST, MAY, etc. in a spec but this one doesn't already have that where it uses those words and I didn't want to make this section too different, nor do I want to increase the scope of the PR.
Happy to change it if others feel strongly enough though.
My 2c:
- this is a desirable change, directionally (same reasoning as Will)
- I think this reads pretty great -- the parlance updates are good and clear, and the cautions in all directions about compatibility seem accurate and appropriate.
Well written, @rvagg <3
What else is needed to merge this?
> open the door slightly, but not too far that this complicates life for existing implementations
I am late to the party, so apologies if I miss the point here, but how confident are we the below won't happen?
- something like relaxed reification go-unixfsnode#27 allows a dag-pb node to have been serialized to cbor or json, and still be interpreted as unixfs data – it gets merged, every go app supports it now
- we relax spec here
- people see both spec and go library allow cbor, start producing UnixFS dags with cbor instead of dag-pb because "it works in Kubo and Gateways already, and this way we use cbor/json everywhere"
- complicated life begins:
  - someone tries to implement a minimal IPFS node in a new language from scratch, and they now need to debug things because their implementation can't correctly read "valid UnixFSv1"
  - a legacy node is unable to read new "valid UnixFSv1" created by some alternative IPFS implementation that decided to use cbor instead of pb by default because both the specs and the most common go library allow it, and they don't care about "legacy" dag-pb
In my mind, this PR should bump the version to UnixFS v2 because it allows for a new type of UnixFS DAGs to be created, that are no longer interoperable with existing implementations that support v1 and v1.5.
Am I missing something?
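The interoperability scenario in the list above can be sketched as two reader policies applied to the same decoded node. This is only a toy model: `accepts_as_unixfs`, `has_unixfs_shape`, and the `strict` flag are hypothetical names for this example and do not correspond to any existing implementation's API. The numeric codes are the multicodec values for dag-pb (`0x70`) and dag-cbor (`0x71`).

```python
DAG_PB = 0x70    # multicodec code for dag-pb
DAG_CBOR = 0x71  # multicodec code for dag-cbor


def has_unixfs_shape(node) -> bool:
    """Loose shape check: a map with a Links list and a Data bytes field."""
    return (
        isinstance(node, dict)
        and isinstance(node.get("Links"), list)
        and isinstance(node.get("Data"), bytes)
    )


def accepts_as_unixfs(codec: int, node, strict: bool) -> bool:
    """Decide whether a reader treats a decoded block as UnixFS.

    strict=True models a reader that couples UnixFS to DAG-PB;
    strict=False models a relaxed reader that only inspects the shape.
    """
    if strict and codec != DAG_PB:
        return False
    return has_unixfs_shape(node)


# Placeholder node; the Data bytes are illustrative only.
node = {"Data": b"\x08\x02", "Links": []}
for codec in (DAG_PB, DAG_CBOR):
    print(
        hex(codec),
        accepts_as_unixfs(codec, node, strict=True),
        accepts_as_unixfs(codec, node, strict=False),
    )
```

Under these assumptions the strict reader rejects the dag-cbor block that the relaxed reader accepts, which is the divergence the comment is worried about.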
So, this is basically a take-two on #271, but with an extra helping of nuance in an attempt to open the door slightly, but not too far that this complicates life for existing implementations. The main aim here is to allow for the kind of layering approach that we have with go-ipld-prime + ADLs. As per ipld/go-car#304 we already have situations where we have enough distance between block decode and ADL interpret that we can't properly apply a type check. This gives us wriggle-room to pass the data through the generic go-ipld-prime "lens" paradigm and let it come out as UnixFS; even if in practice we tightly couple it when encoding the data and may impose an expectation when decoding, where possible.
But also, it'd be good to have at least the potential for separation so we can treat the UnixFS ADL as a full ADL and not a special-cased beast.
At the same time this PR updates the language of this section to current IPLD parlance.
Any stomach for nuance?
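To make the "distance between block decode and ADL interpret" concrete, here is a minimal sketch of a two-layer pipeline in which the codec is consumed at decode time and never reaches the reification step. Everything in it is an assumption for illustration: the two toy codecs are invented stand-ins (they are not DAG-PB, DAG-CBOR, or DAG-JSON), and `load` and `reify` are hypothetical names rather than go-ipld-prime functions.

```python
import base64
import json


def decode_toy_json(block: bytes) -> dict:
    """Hypothetical toy codec: JSON with base64-encoded Data."""
    raw = json.loads(block)
    return {"Data": base64.b64decode(raw["Data"]), "Links": raw["Links"]}


def decode_toy_binary(block: bytes) -> dict:
    """Hypothetical toy codec: the whole block is Data and there are no links."""
    return {"Data": block, "Links": []}


DECODERS = {"toy-json": decode_toy_json, "toy-binary": decode_toy_binary}


def load(block: bytes, codec: str) -> dict:
    """Layer 1: the codec name is consumed here and not carried forward."""
    return DECODERS[codec](block)


def reify(node: dict) -> dict:
    """Layer 2 (ADL): sees only the decoded node, so it cannot check the codec."""
    if not isinstance(node.get("Data"), bytes) or not isinstance(node.get("Links"), list):
        raise ValueError("node does not have the expected Data/Links shape")
    # A real UnixFS ADL would parse the Data bytes; this sketch just passes them on.
    return {"payload": node["Data"], "children": len(node["Links"])}
```

Because `reify` receives only the decoded node, two blocks encoded with different toy codecs produce the same result, and any codec restriction would have to be enforced in `load` or by the caller.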