unixfs: suggest, but do not strictly mandate unixfs<>dagpb #294

rvagg · 2022-06-27T11:31:01Z

The strict coupling poses some problems for IPLD implementations that:

1. do not retain codec information between the serialization and UnixFS (ADL) reification; and/or
2. do not have a mechanism to strictly mandate that UnixFS data be _only_ encoded in a particular codec.

A mandate in the specification that strictly defines the layering on top of the codec makes it difficult to implement UnixFS as an ADL, which in turn makes it difficult to use tooling that builds on IPLD lens-style layering.

Ref: ipfs/go-unixfsnode#27
Ref: #271
Ref: ipld/go-car#304

So, this is basically a take-two on #271, but with an extra helping of nuance in an attempt to open the door slightly, but not so far that it complicates life for existing implementations. The main aim here is to allow for the kind of layering approach that we have with go-ipld-prime + ADLs. As per ipld/go-car#304, we already have situations where there is enough distance between block decode and ADL interpretation that we can't properly apply a type check. This gives us wriggle-room to pass the data through the generic go-ipld-prime "lens" paradigm and let it come out as UnixFS, even if in practice we tightly couple it when encoding the data and may impose an expectation when decoding, where possible.

But also, it'd be good to have at least the potential for separation so we can treat the UnixFS ADL as a full ADL and not a special-cased beast.
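To make the "distance between block decode and ADL interpret" point concrete, here is a minimal sketch in Python. It does not use go-ipld-prime or any real IPLD library: `Block`, `load`, `looks_like_unixfs`, and `reify_unixfs` are hypothetical names invented for illustration, and codec decoding itself is elided. The only facts borrowed from outside are the multicodec codes for dag-pb (`0x70`) and dag-cbor (`0x71`).

```python
from dataclasses import dataclass
from typing import Any, Optional

DAG_PB = 0x70    # multicodec code for dag-pb
DAG_CBOR = 0x71  # multicodec code for dag-cbor


@dataclass
class Block:
    codec: int  # codec code that would come from the block's CID
    node: Any   # already-decoded data-model value (decoding is elided)


def load(block: Block) -> Any:
    """Generic 'lens' step: return only the decoded node; the codec is dropped."""
    return block.node


def looks_like_unixfs(node: Any) -> bool:
    """Shape check only: a map with a 'Links' list and a bytes 'Data' field."""
    return (
        isinstance(node, dict)
        and isinstance(node.get("Links"), list)
        and isinstance(node.get("Data"), bytes)
    )


def reify_unixfs(node: Any, codec: Optional[int] = None, strict: bool = False) -> dict:
    """Hypothetical ADL reification with an optional strict dag-pb coupling."""
    if not looks_like_unixfs(node):
        raise ValueError("node does not have the expected Links/Data shape")
    if strict:
        if codec is None:
            raise ValueError("codec unknown at this layer; cannot enforce dag-pb")
        if codec != DAG_PB:
            raise ValueError(f"codec 0x{codec:x} is not dag-pb")
    return {"kind": "unixfs", "links": node["Links"], "data": node["Data"]}
```

With this shape, a caller that only receives the output of `load` can still call `reify_unixfs(node)` in relaxed mode, while a caller that kept the codec around can pass `codec=block.codec, strict=True` to retain the original tight coupling.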

At the same time this PR updates the language of this section to current IPLD parlance.

Any stomach for nuance?

willscott · 2022-06-27T11:49:00Z

As in #271, I would prefer that the ADL works as an ADL, so I would be in favor of this spec change.

ribasushi · 2022-06-27T12:21:54Z

UNIXFS.md

@@ -78,7 +78,7 @@ This `Data` object is used for all non-leaf nodes in Unixfs.

 For files that are comprised of more than a single block, the 'Type' field will be set to 'File', the 'filesize' field will be set to the total number of bytes in the file (not the graph structure) represented by this node, and 'blocksizes' will contain a list of the filesizes of each child node.

-This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. Typically this is encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementations of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.
+This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block must conform to the [DAG-PB Logical Format](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.


Suggested change

This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block must conform to the [DAG-PB Logical Format](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.

This data is serialized and placed inside the 'Data' _Bytes_ node of the containing IPLD block, which also contains the actual links to the child nodes of this object in a 'Links' _List_ node. As such, a serialized UnixFS block **MUST** conform to the [DAG-PB **Logical Format**](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format) schema. Typically this is then encoded using the [DAG-PB](https://ipld.io/specs/codecs/dag-pb/spec/) codec. An implementation of this UnixFS specification may opt to strictly link DAG-PB to UnixFS for encoding and/or decoding as this is the originally intended layering of this format. For this reason, producers of UnixFS data that do not use DAG-PB as its codec should not expect other implementations of UnixFS to be able to interpret the data.

@rvagg this works! Adding a few more screamers to make sure people do not miss it, but good enough. Thank you!

rvagg · 2022-06-27

I would normally highlight SHOULD, MUST, MAY, etc. in a spec, but this one doesn't already do that where it uses those words, and I didn't want to make this section too different, nor do I want to increase the scope of the PR.
Happy to change it if others feel strongly enough though.
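For reference, the "must conform to the DAG-PB Logical Format" requirement in the suggested text could be checked on an already-decoded node roughly as follows. This is a sketch under stated assumptions, not any library's API: the `Link` class and both function names are hypothetical, and the field rules are taken from the PBNode/PBLink schema in the linked DAG-PB spec (`Links` required, `Data` optional bytes; per link, `Hash` required, `Name` optional string, `Tsize` optional int).

```python
from typing import Any


class Link:
    """Hypothetical stand-in for an IPLD link (CID) in the data model."""

    def __init__(self, cid: str) -> None:
        self.cid = cid


def is_pb_link(value: Any) -> bool:
    """PBLink: required Hash (link), optional Name (string), optional Tsize (int)."""
    if not isinstance(value, dict) or not set(value) <= {"Hash", "Name", "Tsize"}:
        return False
    if not isinstance(value.get("Hash"), Link):
        return False
    if "Name" in value and not isinstance(value["Name"], str):
        return False
    if "Tsize" in value:
        tsize = value["Tsize"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(tsize, int) or isinstance(tsize, bool):
            return False
    return True


def conforms_to_dag_pb_logical_format(node: Any) -> bool:
    """PBNode: required Links (list of PBLink), optional Data (bytes)."""
    if not isinstance(node, dict) or not set(node) <= {"Links", "Data"}:
        return False
    links = node.get("Links")
    if not isinstance(links, list) or not all(is_pb_link(l) for l in links):
        return False
    if "Data" in node and not isinstance(node["Data"], bytes):
        return False
    return True
```

Note that this check is independent of the codec the block was serialized with, which is the distinction the suggested wording draws between the Logical Format (required) and the DAG-PB codec (typical).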

warpfork

My 2c:

- this is a desirable change, directionally (same reasoning as Will)
- I think this reads pretty great -- the parlance updates are good and clear, and the cautions in all directions about compatibility seem accurate and appropriate.

Well written, @rvagg <3

willscott · 2022-07-21T07:39:15Z

What else is needed to merge this?

lidel

> open the door slightly, but not too far that this complicates life for existing implementations

I am late to the party, so apologies if I miss the point here, but how confident are we that the scenario below won't happen?

1. something like "relaxed reification" (ipfs/go-unixfsnode#27), which allows a dag-pb node to have been serialized to cbor or json and still be interpreted as unixfs data, gets merged; every Go app supports it now
2. we relax the spec here
3. people see that both the spec and the Go library allow cbor, and start producing UnixFS DAGs with cbor instead of dag-pb because "it works in Kubo and Gateways already, and this way we use cbor/json everywhere"
4. complicated life begins:
   - someone tries to implement a minimal IPFS node in a new language from scratch, and they now need to debug things because their implementation can't correctly read "valid UnixFSv1"
   - a legacy node is unable to read new "valid UnixFSv1" created by some alternative IPFS implementation that decided to use cbor instead of pb by default because both the spec and the most common Go library allow it, and they don't care about "legacy" dag-pb

In my mind, this PR should bump the version to UnixFS v2, because it allows a new type of UnixFS DAG to be created that is no longer interoperable with existing implementations that support v1 and v1.5.
Am I missing something?
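The "legacy node is unable to read" case can be illustrated with a small sketch of how a strictly coupled reader might gate on the codec carried in a CID. This is an assumption about how such a reader could behave, not a description of any existing implementation; `read_uvarint`, `cid_codec`, and `legacy_can_read` are hypothetical helpers. It relies only on the binary CIDv1 layout (varint version, varint codec, then multihash) and handles CIDv1 only; CIDv0, which is implicitly dag-pb, is out of scope.

```python
DAG_PB = 0x70  # multicodec code for dag-pb


def read_uvarint(buf: bytes, pos: int = 0) -> tuple:
    """Decode an unsigned LEB128 varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def cid_codec(cid_bytes: bytes) -> int:
    """Return the content codec of a binary CIDv1."""
    version, pos = read_uvarint(cid_bytes)
    if version != 1:
        raise ValueError("only binary CIDv1 is handled in this sketch")
    codec, _ = read_uvarint(cid_bytes, pos)
    return codec


def legacy_can_read(cid_bytes: bytes) -> bool:
    """A strictly coupled UnixFS reader accepts only dag-pb blocks."""
    return cid_codec(cid_bytes) == DAG_PB


# Example CIDs with a sha2-256 multihash (0x12, length 0x20) over a dummy digest.
digest = bytes(32)
pb_cid = bytes([0x01, 0x70, 0x12, 0x20]) + digest
cbor_cid = bytes([0x01, 0x71, 0x12, 0x20]) + digest
```

Under these assumptions, `legacy_can_read(pb_cid)` is true and `legacy_can_read(cbor_cid)` is false, even if both blocks decode to the same Links/Data shape.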

rvagg requested review from willscott and aschmahmann June 27, 2022 11:31

rvagg mentioned this pull request Jun 27, 2022

relaxed reification ipfs/go-unixfsnode#27

Open

ribasushi requested changes Jun 27, 2022

fixup! unixfs: suggest, but do not strictly mandate unixfs<>dagpb

17a8cb6

ribasushi approved these changes Jun 27, 2022

warpfork approved these changes Jun 27, 2022

lidel requested changes Jul 21, 2022
