[wip] Web Pathing Specification: initial outline with TODOs #453
Pushing an extremely early draft of the scope to get early feedback from the stakeholders who requested this specification.
The resulting specification should be detailed enough to allow competing,
interoperable implementations.

### TODO: things to cover
cc @Stebalien @dignifiedquire @hacdias @aschmahmann @Jorropo @rvagg @ribasushi @alanshaw @2color @autonome @darobin for visibility and sourcing early feedback on the scope of this spec.
Feel free to drop a comment about any tricky/painful pathing edge cases you've encountered over the years that we should clarify web behavior for by including them in this spec 🙏
I think it's important to clarify how this differs, overlaps, or varies from pathing defined in https://specs.ipfs.tech/http-gateways/path-gateway, https://specs.ipfs.tech/http-gateways/trustless-gateway, & https://specs.ipfs.tech/http-gateways/subdomain-gateway.
- `sha2-384` (`0x20`, aka SHA-384; as specified by [FIPS 180-4](https://csrc.nist.gov/pubs/fips/180-4/upd1/final)) TODO: where is this used? why is this on the list?
- sha3-512 TODO: code for such label does not exist, a typo in prior notes? follow up required
@John-LittleBearLabs these two were included in your draft for WICG proposal, do you remember the reason/source?
I've found the code for the second one in https://github.com/multiformats/multicodec/blob/master/table.csv but not sure if we intended sha3 (0x14) or should switch to sha2 (0x13) here.
What is meant by 'label'?
I don't recall, no. sha2-384 doesn't ring a bell; perhaps it came from one of the comments that's now deleted (hackmd doesn't seem to let me mark things as resolved/hidden). As for sha3-512... it was probably not a good source. I think I found someone somewhere talking about future-proofing hashes, and I picked one of the recommendations that was also marked as permanent in the table.
I'm definitely open to this list being altered.

### TODO: things to cover

- TODO: why it's called "web pathing": ensuring pathing is interoperable with how existing http and web platform works; covers both /ipfs and /ipns namespace semantics; defines logical content root CID that can be mapped to URL / root which enables subdomain/dnslink gateways and ipfs:// and ipns:// protocol handlers to load existing datasets, websites, and assets with relative pathing without the need for modifying them;
why it's called "web pathing"
I'm curious about this myself. It doesn't strike me as being particularly web-specific, at least not immediately.
would "URL-safe pathing" or "web-deterministic pathing" or "web-compatible pathing" be more precise? it isn't pathing FOR or OF the web, but rather a web-compatible subset of the pathing currently possible with the tech to date, right?
- TODO make it clear if both DAG variants of CBOR and JSON are a MUST, or if JSON is a SHOULD (right now conformance tests require both as a MUST).

- TODO: MUST what happens when we can't traverse part of the path
- TODO: separate errors for traversal errors due to missing codec vs missing content
👍
TODO: List relevant CIDs. Describe how implementations can use them to determine
specification compliance.

TODO: [gateway-conformance](https://github.com/ipfs/gateway-conformance) tests for all MUSTs in this spec
👍 👍

- TODO: MUST support UnixFS pathing
- TODO: traversing HAMTs
- TODO: traversing symlinks
I have questions, and not sure if I should be instead commenting here ?
There's a few basic forms I could imagine this working in, and they're not necessarily incompatible:
- /ipfs/cid1/a = "/ipfs/cid2/c" : /ipfs/cid1/a/b -> /ipfs/cid2/c/b
  - Replace all left of and including current path element with link contents.
  - IIRC I believe the gateway conformance test has this, so I'm guessing this is the real thing.
  - Are we allowed to link to /ipns/ namespace?
    - If so... even DNSLink? The link would still be immutable, but fully resolved what looks like part of your tree now depends on your local DNS setup?
- /ipfs/cid1/a = "c" : /ipfs/cid1/a/b -> /ipfs/cid1/c/b (i.e. not starting with /)
  - Replace current path element with link contents.
  - I read someone talking about converting tar to car and if so there's an important special case...
    - "../b" : If allowed we might need rules about this.
- /ipfs/cid1/a/b = "/c" : /ipfs/cid1/a/b -> /ipfs/cid1/c
  - Replace current path element and everything between the root and current element.
  - Need rules about DAGs that contain a directory under root named either /ipfs/ or /ipns/ etc.
  - I don't love features that break DAG symmetry, but others seem to 🤷
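To make the three candidate forms concrete, they can be written as plain string rewrites. This is only an illustration of the options being asked about, not behavior defined by the spec; the function names are hypothetical and no real IPFS library is involved.

```python
def replace_prefix(target: str, remainder: str) -> str:
    """Form 1: an absolute /ipfs/... target replaces the link element
    and everything to the left of it."""
    return target.rstrip("/") + remainder


def replace_element(link_path: str, target: str, remainder: str) -> str:
    """Form 2: a relative target replaces only the link's own path element."""
    parent = link_path.rsplit("/", 1)[0]
    return f"{parent}/{target}{remainder}"


def replace_below_root(link_path: str, target: str, remainder: str) -> str:
    """Form 3: a target starting with "/" replaces everything between
    the content root (/ipfs/<cid>) and the link element, inclusive."""
    parts = link_path.split("/")   # ['', 'ipfs', 'cid1', 'a', 'b']
    root = "/".join(parts[:3])     # '/ipfs/cid1'
    return root + target + remainder


print(replace_prefix("/ipfs/cid2/c", "/b"))
print(replace_element("/ipfs/cid1/a", "c", "/b"))
print(replace_below_root("/ipfs/cid1/a/b", "/c", ""))
```

Only the first form crosses from one content root to another; the other two stay under `/ipfs/cid1`.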
Also, how does this interact with _redirects (since it both has to be in the root and its redirects can be relative to the root)?
Site A has a _redirects with a splat to /a.html
Site B has a symlink (called link) to A's root, and its own _redirects splat to /b.html
ipfs://B/link/notfound.html becomes what exactly?
In my current PR it would redirect to ipfs://B/link/a.html (i.e. it respects A's _redirects file, and does so relative to A's root). But if A did not have a redirect, it would be not found (i.e. B's _redirects is ignored).
Feels weird.
@John-LittleBearLabs (I've realized we've discussed this during one of sync calls but did not reply here)
- symlinks are generally underspecified and not used much. I would mark this as unspecified behavior in this spec until we land "Publish UnixFS specifications at specs.ipfs.tech" (#331)
- that being said, if you already implemented symlink support, it's ok; the only caveat is that following a symlink should not allow going beyond the content root (`/ipfs/cid`), so `/ipfs/cid1/a` pointing at `/ipfs/cid2/b` or `../cid2/b` must error
- rules from `_redirects` are executed only when the requested content path is missing within the same origin (based on root CID). In the scenario you described, you operate under origin B and it is not aware of `_redirects` from origin A (so `_redirects` is not executed)
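A minimal sketch of the containment rule described above, assuming symlink targets are POSIX-style strings. The names `resolve_symlink` and `SymlinkEscapesRoot` are hypothetical, and rejecting every absolute target (not only `/ipfs/...` ones) is a conservative assumption of this sketch rather than something the thread settles.

```python
import posixpath


class SymlinkEscapesRoot(Exception):
    """Raised when a symlink target would leave the content root."""


def resolve_symlink(content_root: str, link_path: str, target: str) -> str:
    """Resolve a symlink found at link_path (relative to content_root).

    content_root: e.g. "/ipfs/cid1"
    link_path:    path of the symlink below the root, e.g. "a" or "dir/a"
    target:       the symlink's contents
    """
    if target.startswith("/"):
        # Assumption: absolute targets are treated as leaving the root.
        raise SymlinkEscapesRoot(target)
    parent = posixpath.dirname(link_path)
    resolved = posixpath.normpath(posixpath.join(parent, target))
    if resolved == ".." or resolved.startswith("../"):
        raise SymlinkEscapesRoot(target)
    return content_root if resolved == "." else f"{content_root}/{resolved}"


print(resolve_symlink("/ipfs/cid1", "dir/a", "../b"))
```

With this check, `/ipfs/cid1/a` pointing at `/ipfs/cid2/b` or `../cid2/b` raises, while a target such as `../b` inside a subdirectory still resolves under the same root.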
- TODO: multicodecs that are required to facilitate path traversal
  - DAG-PB
  - RAW
  - libp2p-key (for IPNS names)
Good question. My initial idea is to refer to IPNS spec which states that only Ed25519 is a MUST (RSA is SHOULD, other key types are MAY).

- TODO: MUSTs, SHOULDs and MAYs in relation to

- TODO: multihash functions
Is the intention of this section to clarify baseline multihash & codecs that must be supported to provide content for libraries such as @helia/verified-fetch?

- TODO: cid versions
  - MUST:
    - CIDv1 (`0x01`)
if we MUST support CIDv1, we should call out the multibase/hash/codecs that aren't guaranteed to be supported by web-pathing spec implementers.
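As a rough illustration of what "calling out" unsupported combinations could look like in an implementation, the sketch below parses a base32 CIDv1 and checks its codec and hash function against allow-lists. The codec set mirrors the multicodecs listed in this draft (codes taken from the multicodec table); the hash set containing only sha2-256, and all function names, are assumptions made for the example.

```python
import base64
import hashlib

SUPPORTED_CODECS = {0x70: "dag-pb", 0x55: "raw", 0x72: "libp2p-key",
                    0x71: "dag-cbor", 0x0129: "dag-json"}
SUPPORTED_HASHES = {0x12: "sha2-256"}  # assumption: not taken from the draft


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def make_cidv1(codec: int, payload: bytes) -> str:
    """Build a base32 CIDv1 with a sha2-256 multihash (for the example only)."""
    digest = hashlib.sha256(payload).digest()
    raw = encode_varint(1) + encode_varint(codec) + bytes([0x12, len(digest)]) + digest
    return "b" + base64.b32encode(raw).decode().lower().rstrip("=")


def parse_cidv1(cid: str) -> dict:
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 (multibase prefix 'b')")
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    if version != 1 or len(raw) - pos != digest_len:
        raise ValueError("not a well-formed CIDv1")
    return {"codec": codec, "hash": hash_code, "digest": raw[pos:]}


def is_supported(cid: str) -> bool:
    info = parse_cidv1(cid)
    return info["codec"] in SUPPORTED_CODECS and info["hash"] in SUPPORTED_HASHES


print(is_supported(make_cidv1(0x55, b"hello")))
```

Any multibase other than base32 is rejected here purely to keep the example short; which multibases are required is exactly the kind of thing the spec would need to state.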
- RAW
- libp2p-key (for IPNS names)
- DAG-CBOR
- DAG-JSON
we should MUST raw JSON as well, or is the intent to use RAW for that?

TODO: Explain the security implications/considerations relevant to the spec.

TODO: length limit for entire path
Are we limiting to browser URLs, or do we want to support longer lengths? https://stackoverflow.com/a/417184/592760 is a really thorough answer talking about variants.
TODO: length limit for a path segment
I don't think we should limit path segment lengths, but we should prevent `/` in path segments, the opposite of IPLD pathing
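One way to picture that rule: split the URL path on `/` first, percent-decode each segment afterwards, and reject any segment that ends up containing a slash (for example from `%2F`). This is a sketch under assumptions; `split_content_path` is a hypothetical helper, and skipping empty segments is a choice made only for the example.

```python
from urllib.parse import unquote


def split_content_path(url_path: str) -> list[str]:
    """Split a URL path into decoded segments, refusing embedded slashes."""
    segments = []
    for raw in url_path.split("/"):
        if raw == "":
            continue  # assumption: leading, trailing, and doubled slashes are ignored
        segment = unquote(raw)
        if "/" in segment:
            raise ValueError(f"path segment contains '/': {raw!r}")
        segments.append(segment)
    return segments


print(split_content_path("/ipfs/cid/a%20b/c"))
```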

- TODO: MUST what happens when we can't traverse part of the path
- TODO: separate errors for traversal errors due to missing codec vs missing content
- TODO: `/ipfs/valid-cid-dag-pb/invalid-path` (logical "not found", translates to HTTP 404 to indicate content does not exist, mention implicit http caching of 404 vs 500 – )
TODO: a browser/HTTP specific section with additional behaviors that are possible when HTTP redirects can be executed:
- checking NFC and NFC normalized filenames if not found (gateway: run Unicode Normalisation Forms on path gateway inputs #457)
- checking `_redirects` if not found (https://specs.ipfs.tech/http-gateways/web-redirects-file/)
- `index.html` when requesting a directory with `Accept` that has `text/html` (this one does not belong here; it should live in gateway specs instead, but writing it down here so we don't forget to document it)
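To illustrate the "separate errors" TODO quoted above, here is a toy traversal over an in-memory DAG that distinguishes a missing path segment from an unsupported codec. The DAG shape, class names, and codec allow-list are all hypothetical. Mapping "not found" to HTTP 404 follows the TODO; the 500 for a missing codec is only a placeholder assumption, since the draft does not pick a status for that case.

```python
class TraversalError(Exception):
    pass


class ContentNotFound(TraversalError):
    """The path segment does not exist under a node we could decode."""


class CodecNotSupported(TraversalError):
    """A node on the path uses a codec this implementation cannot decode."""


SUPPORTED = {"dag-pb", "raw"}  # assumption for the example


def traverse(dag: dict, root: str, segments: list[str]) -> str:
    """Walk named links from root; dag maps node id -> {"codec", "links"}."""
    node_id = root
    for segment in segments:
        node = dag[node_id]
        if node["codec"] not in SUPPORTED:
            raise CodecNotSupported(node["codec"])
        if segment not in node["links"]:
            raise ContentNotFound(segment)
        node_id = node["links"][segment]
    return node_id


def http_status(err: TraversalError) -> int:
    if isinstance(err, ContentNotFound):
        return 404  # logical "not found", per the TODO above
    return 500      # placeholder assumption, not defined by the draft


dag = {
    "root": {"codec": "dag-pb", "links": {"docs": "d1", "blob": "x1"}},
    "d1": {"codec": "dag-pb", "links": {"index.html": "f1"}},
    "x1": {"codec": "unknown-codec", "links": {}},
    "f1": {"codec": "raw", "links": {}},
}
print(traverse(dag, "root", ["docs", "index.html"]))
```

Keeping the two error types distinct lets a caller decide separately how each should be surfaced and cached.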
The goal of this specification is to close #432 and define a subset of possible content paths that ensures compatibility with existing HTTP and Web Platform standards, and have clear MUSTs and SHOULDs that we can use when discussing implementation details of projects like ipfs-chromium's Intent to Prototype: Verifying IPFS client.
Pushing an extremely early draft of the scope to get early feedback.
Everyone is invited to comment on the PR, focusing on TODOs, MUSTs, and SHOULDs, and to suggest improvements, especially if something is missing 🙏