IPIP-293: Add /ipld Gateway Specs #293

Draft
wants to merge 15 commits into
base: main

Conversation

RangerMauve

WIP: Drafting up specs for /ipld/ support in gateways

@rvagg
Member

rvagg commented Jun 27, 2022

nice

@lidel where do we stand with writable gateways? I don't believe we have writability enabled anywhere (yet) and there's been some debate about pulling the trigger on this. If we put POST and PATCH in the spec, what does that mean for implementations? Does that just become an optional part, and we could handle it in code by allowing the config to turn on writability?

@RangerMauve
Author

I'm actually gonna be working on the writable gateway spec this week based on the stuff we did in the Agregore IPFS Daemon. AgregoreWeb/agregore-ipfs-daemon#10

@RangerMauve
Author

I've been signaling writability support by returning different HTTP method names for HEAD requests (e.g. only show PUT if it's a writable gateway)
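A minimal sketch of how a client might detect this, assuming the gateway lists its supported methods in an `Allow` response header on `HEAD` (the thread does not say which header carries the method names). `parseAllow` and `looksWritable` are hypothetical helper names.

```typescript
// Methods treated as "write" methods for this sketch (an assumption).
const WRITE_METHODS = ["POST", "PUT", "PATCH", "DELETE"];

// Split an Allow header value such as "GET, HEAD, PUT" into method names.
function parseAllow(allow: string | null): string[] {
  if (!allow) return [];
  return allow
    .split(",")
    .map((m) => m.trim().toUpperCase())
    .filter((m) => m.length > 0);
}

// A gateway "looks writable" if it advertises at least one write method.
function looksWritable(allow: string | null): boolean {
  const methods = parseAllow(allow);
  return methods.some((m) => WRITE_METHODS.indexOf(m) !== -1);
}
```

A caller would pass the header value from its own `HEAD` response, for example `looksWritable(response.headers.get("Allow"))` when using `fetch`.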

@RangerMauve
Author

Err, should I submit a new PR along with the required doc once this is more thoroughly fleshed out?

@lidel
Member

lidel commented Jul 2, 2022

@RangerMauve no, this was a bug in GitHub (it closed PRs against my branch instead of rebasing them against main).
I'll fix it manually.

@lidel lidel reopened this Jul 2, 2022
@lidel lidel changed the base branch from feat/gateway-specs to main July 2, 2022 15:05
@lidel
Member

lidel commented Jul 2, 2022

@rvagg The idea is to keep writable gateways optional. We just want to specify the behavior if someone wants to implement it.

Potential users:

  • Public services such as Gateways or Pinning Services could decide to allow writes only if requests include Authorization: Bearer <token>
  • localhost gateway provided by things like Kubo, Brave, Agregore could have it enabled by default (tbd)
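The two deployment styles in the list above could be modeled as a per-gateway write policy. This is a sketch under stated assumptions: the `WritePolicy` type, the `mayWrite` function, and the idea of a static token list are all hypothetical, not part of the draft spec.

```typescript
// Hypothetical write policies: off, open (e.g. a localhost gateway),
// or gated behind `Authorization: Bearer <token>` (e.g. a public service).
type WritePolicy =
  | { kind: "disabled" }
  | { kind: "open" }
  | { kind: "bearer"; tokens: string[] };

// Decide whether a write request may proceed, given the raw value of
// the Authorization header (undefined when the header is absent).
function mayWrite(policy: WritePolicy, authorization: string | undefined): boolean {
  if (policy.kind === "disabled") return false;
  if (policy.kind === "open") return true;
  if (!authorization) return false;
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  if (!match) return false;
  return policy.tokens.indexOf(match[1]) !== -1;
}
```

A real service would compare tokens in constant time and load them from its own auth system; the list lookup here only illustrates the control flow.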

@lidel lidel changed the title Add IPLD Gateway Specs IPIP: Add IPLD Gateway Specs Jul 2, 2022
@BigLep
Contributor

BigLep commented Jul 19, 2022

2022-07-19 conversation: there are things from IPFS Thing that influence this:

  1. Demonstrations by @aschmahmann
  2. Discussions that were had with @mikeal

@RangerMauve wants to understand the WASM story more and how that would impact this.

@softwareplumber

A couple of comments:

I think the parameters block (enclosed by square brackets) is essentially doing what Timbl's old Matrix URI idea on w3.org was trying to do. Matrix URIs are sparsely supported in some parts of the Web2 ecosystem (the Java JAX-RS API, I believe). Getting implementers of web toolkits (express, etc) to provide native support for IPFS URIs may be easier if we can 'invoke Timbl' rather than attempting to convince them a priori of the rightness of our approach.

More practically, there is an awful lot of infrastructure out there (proxies, reverse proxies, kubernetes ingress controllers, etc) which depends on cruddy regular-expression based parsing of URIs. Re-using '&' as a separator is an admirable idea from the point of view of code re-use but it risks breaking any bad regex that is relying on '&' appearing first in a query string. Moreover using '[' as a meaningful separator in the URI will make any implementer of said cruddy regex parsers cry into their beer as they try to navigate thickets of escape characters.

For these reasons I'd suggest that ';' as a separator, per the Matrix URI spec, is preferable.

@aschmahmann
Contributor

@softwareplumber IIUC the Matrix URI puts ; only at the end of a path and not in the middle and as a result looks an awful lot like query parameters ?a=b&c=d.... Part of the idea here is that including the escaped path information in the middle allows the paths to be easier to understand.

For example: /ipld/bafyroot/[ADL = HAMT]/entry1/field2/[ADL = FBL] can render the bytes for a picture, video, etc. Where HAMT and FBL are the abstract data layouts described in https://ipld.io/specs/advanced-data-layouts/.

@aschmahmann
Contributor

Noting that there have been proposals to use other signaling mechanisms than out-of-band including it in the path. I'd recommend those interested take a look at https://ipld.io/docs/advanced-data-layouts/signalling/ for some background (including following through to the naming and dynamic loading sections if you're interested). While none of the opinions on that website are "law" they may provide some useful context in either forming your own opinions or understanding those of others. If folks have other useful resources I'd drop them here as well.

@RangerMauve
Author

Btw, I did a talk last week at the IPFS thing and here are the slides from it: https://blog.mauve.moe/slides/ipld-gateway/#1

You can press p to open up the speaker notes for some of the stuff I said (will link to the recording here once it is published)

@softwareplumber

softwareplumber commented Jul 21, 2022 via email

@RangerMauve
Author

RangerMauve commented Jul 27, 2022

I'm personally all for reusing an existing standard. One thing I like about this JAX thing is that it disambiguates which "side" the metadata goes on. e.g. ipld://cid/[foo=bar]example/ and ipld://cid/example[foo=bar]/ could serialize to the same thing, and could also lead to confusing situations like ipld://cid/[foo=bar]example[foo=baz]/.

With this syntax, we know for sure that the extra data goes *after* the segment name.

Using semicolons to separate bits means that we can't just dump the segment into URLSearchParams, but I think that's easy enough to work around.

We can then say that just ; needs to be escaped rather than [ and ] in path segments.

Also, the thing about proxies seems like a good call since those things have caused issues in the past.

I'm not particularly married to [ ] so I'm down to spec paths with this JAX syntax instead. I'll reference the Matrix URI spec here: https://www.w3.org/DesignIssues/MatrixURIs.html

(I'll do some tests with existing URL parsers to see if they complain about it)
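To make the matrix-parameter idea concrete, here is a minimal sketch of parsing path segments where parameters follow the segment name and are separated by `;`. It assumes a literal `;` inside a name or value is percent-encoded as `%3B`; `parseSegment` and `parsePath` are hypothetical names and not the API of any existing library.

```typescript
interface Segment {
  name: string;
  params: Record<string, string>;
}

// Parse one path segment such as "entry1;ADL=HAMT" into a name plus parameters.
function parseSegment(raw: string): Segment {
  const parts = raw.split(";");
  const name = decodeURIComponent(parts[0]);
  const params: Record<string, string> = {};
  for (let i = 1; i < parts.length; i++) {
    const part = parts[i];
    if (part.length === 0) continue; // tolerate "a;;b" and trailing ";"
    const eq = part.indexOf("=");
    const key = eq === -1 ? part : part.slice(0, eq);
    const value = eq === -1 ? "" : part.slice(eq + 1);
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return { name, params };
}

// Parse a whole path, skipping empty segments produced by leading or trailing "/".
function parsePath(path: string): Segment[] {
  return path
    .split("/")
    .filter((s) => s.length > 0)
    .map(parseSegment);
}
```

Because the parameters always trail the name, a segment like `example;foo=bar` has exactly one reading, which is the disambiguation described above.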

@softwareplumber

softwareplumber commented Jul 31, 2022

@RangerMauve that's awesome; even if the idea doesn't work out I'm really pleased it's being considered.

If I can make one other suggestion, for future-proofing it might be an idea to somehow 'namespace' keywords like 'ADL' (perhaps prefixing with '$') and, maybe, reserve some kind of wildcard character in the spec. I have a gut feeling that eventually we're going to want a path-like syntax to represent selectors (or something that replaces selectors) and providing upward compatibility in the Gateway URI spec so that Gateway URIs are a subset of Selector URIs would be a good thing.

For example, I'm thinking that in the fullness of time a path like <CID>/folder/*;owner=alice might represent all the descendants of the 'folder' node with an attribute 'owner' equal to 'alice'. The path after the CID is of course a human readable representation of a simple selector; if we ever want a gateway to have this functionality it would make sense to reserve '*' and ensure that our operators (such as ADL) can be distinguished from attribute names.

@RangerMauve
Author

Hmm. Extra keywords in the path seem interesting, I wonder if it'd be stepping over some of the use cases of selectors, however.

One of the things I was thinking would be important is that the result for these IPLD URLs / operations should be either a new IPLD data model node, or a URL pointing to such a node. Would using extra wildcards have it return a list node?

Might be good to talk about it on the call.

@softwareplumber

softwareplumber commented Aug 1, 2022 via email

@RangerMauve
Author

Recording from the discussion we had about this at the IPLD thing last month is up here: https://www.youtube.com/watch?v=_uXKIEmJh3g

@RangerMauve
Author

I got some initial ipld:// protocol support into [email protected] 😁 So far it supports a basic GET/POST which can do conversion between codecs at the protocol handler layer. https://github.com/RangerMauve/js-ipfs-fetch/#await-fetchipldcidexample-method-get-headers-accept-applicationjson

I've also put together a JS library for parsing and serializing ipld:// URLs with the new matrix parameters syntax for adding extra signaling for path segments

https://github.com/RangerMauve/js-ipld-url

I'm also gonna release it in the Agregore Browser for desktop to make it a bit easier to mess around with.

I got POST more figured out, along with some uses for the ?format parameter.

Next up, I wanna look into sketching up what the schema parameter could mean for path segments.

I'll also update the gateway spec with these new changes as they come. 😁

@RangerMauve
Author

I've got some code going in JavaScript which supports IPLD Schemas in path segment parameters.

RangerMauve/js-ipld-url-resolve#1

I'm feeling pretty comfortable with this one where I've got schema CIDs within the parameters as well as which type to interpret a node as. It will also apply types to any fields that get traversed via linking.

I'll likely need more tests for nested structs that contain Links, but so far so good. 😁

@BigLep
Contributor

BigLep commented Sep 13, 2022

2022-09-13 IPLD triage conversation on next steps:

  1. Add test fixtures
  2. Adding jsdoc typescript hints
  3. Move relevant libraries into the ipld github org

Member

@lidel lidel left a comment


Thank you @RangerMauve, did a quick pass with initial feedback.

The `body` of the request shall be parsed according to the `Content-Type` as IPLD data via standard encodings.
`/localhost/` is used to support `POST ipld://localhost/` for uploading IPLD data to local nodes in web browsers that support it.

The response will contain an `ipld://{cid}/` URL pointing at your data.
Member


Spec should remove any ambiguity:

  • Contain it where? (A) Plain text in the response body? (B) A Location header?
  • What will the Content-Type of the response be? text/plain?

Author


This is something I'd like to clarify with @fabricedesre since we had a bit of a disagreement.

Right now the precedent within Kubo and Agregore's protocol handlers is that there will be a 201 response with a Location header containing the URL as well as an empty body.

Fabrice was into having a 200 response with the URL inside the response body, which is something I was originally doing in Agregore, but I switched when we started extending the writable gateway functionality in Kubo.

Ideally we should settle on the best course of action here during Lisbon. 😅
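Until one convention is chosen, a client could accept both. The sketch below assumes exactly the two shapes described in this comment: `201` with the URL in a `Location` header, or `200` with the URL as the response body. `SimpleResponse` and `createdUrl` are hypothetical names used only for illustration.

```typescript
// A simplified response shape, standing in for whatever HTTP client is in use.
interface SimpleResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Return the URL of the newly created data, or null if neither convention matches.
function createdUrl(res: SimpleResponse): string | null {
  if (res.status === 201) {
    // Header names are case-insensitive, so compare in lower case.
    for (const key of Object.keys(res.headers)) {
      if (key.toLowerCase() === "location") return res.headers[key];
    }
    return null;
  }
  if (res.status === 200) {
    const text = res.body.trim();
    return text.length > 0 ? text : null;
  }
  return null;
}
```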

Author


Ideally, I'd like to use this to inform all the other protocol handlers too.

The response will contain an `ipld://{cid}/` URL pointing at your data.

<!--
TODO: Only allow `/localhost/`? Get rid of `/localhost` from the spec if light clients with protocol handlers don't matter/
Member


Do we have use cases where things other than localhost could be used in the future?
e.g. do we want to support POST to IPNS identifier?

Author


I've been using POST to ipfs://localhost, or a PUT ipfs://cid/ as well as POST ipns://key to update CIDs, or PUT ipns://key in the Agregore IPFS Daemon Spec

http-gateways/IPLD_GATEWAY.md (outdated)
For `/ipld/{cid}/*` paths, the `Accept` header is used to indicate the encoding that should be used to return the data.
This means that data initially encoded as `dag-json` will be transcoded to `dag-cbor` if the `application/vnd.ipld.dag-cbor` Accept header is used.

- `application/json`: Interpret in the same way as `application/vnd.ipld.dag-json`.
Member


What if data is a valid JSON (and not DAG-JSON) added to ipfs with json codec (and not dag-json)?
Parsing it as dag-json will error, even tho it is a valid JSON.

@hacdias and I discussed this edge case and ended up with a requirement to check the codec from the CID, and if it is json, use the generic JSON codec instead of dag-json.

Author


Interesting. Do you have text written up somewhere that I can copy paste here?

Member


@RangerMauve we have some wording here, but it may not be definitive:

- [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/)
- [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/)
- [application/json](https://www.iana.org/assignments/media-types/application/json) – same as `application/vnd.ipld.dag-json`, unless the CID's codec is JSON. Then, the raw JSON block can be returned
- [application/cbor](https://www.iana.org/assignments/media-types/application/cbor) – same as `application/vnd.ipld.dag-cbor`, unless the CID's codec is CBOR. Then, the raw CBOR block can be returned
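The four rules above can be sketched as a small decision function. This assumes the request carries a single media type in `Accept` (no q-values or lists) and that the CID's codec is available as its multicodec code; `planResponse` and the `ResponsePlan` shape are hypothetical.

```typescript
// Multicodec codes for the codecs involved.
const CODEC_CBOR = 0x51;
const CODEC_DAG_CBOR = 0x71;
const CODEC_DAG_JSON = 0x0129;
const CODEC_JSON = 0x0200;

interface ResponsePlan {
  // "raw": return the stored block bytes unchanged.
  // "transcode": decode to the IPLD Data Model and re-encode in `format`.
  mode: "raw" | "transcode";
  format: "json" | "cbor" | "dag-json" | "dag-cbor";
}

function planResponse(accept: string, cidCodec: number): ResponsePlan | null {
  switch (accept.trim().toLowerCase()) {
    case "application/vnd.ipld.dag-json":
      return { mode: "transcode", format: "dag-json" };
    case "application/vnd.ipld.dag-cbor":
      return { mode: "transcode", format: "dag-cbor" };
    case "application/json":
      // Plain JSON blocks may not be valid DAG-JSON, so hand them back as-is.
      return cidCodec === CODEC_JSON
        ? { mode: "raw", format: "json" }
        : { mode: "transcode", format: "dag-json" };
    case "application/cbor":
      return cidCodec === CODEC_CBOR
        ? { mode: "raw", format: "cbor" }
        : { mode: "transcode", format: "dag-cbor" };
    default:
      return null; // not covered by these four rules
  }
}
```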

http-gateways/IPLD_GATEWAY.md
} representation tuple
```

The CID for the DMT of this schema is `bafyreibvheoym4avfsjfw63yhsymovm7o54ftcnxwxovqf5xxcbjddanze`
Member


nit: was unable to inspect this via ipfs dag get --output-codec=dag-json bafyreibvheoym4avfsjfw63yhsymovm7o54ftcnxwxovqf5xxcbjddanze | jq

As a rule of thumb, CIDs used in IPIP should be publicly available and pinned (e.g. to https://estuary.tech and https://web3.storage, do not use Pinata as afaik it does not announce CIDs on DHT).

We will have automation for this, but for now it is up to the IPIP author to handle.

Author


Should I maybe include some CBOR files with the fixtures that are relevant to the spec?

For example, given the following schema (note it is written in DSL form, but must be converted to the DMT in order to be referenced):

```ipldschema
type Example struct {
Member


nit: I've read this section and tbh have no idea what is the value to end user – CBOR traversal and field resolution with extra steps so the output looks a certain way?

In ADL section we have good use case "ADL that's used to represent large maps" – we need similar real world example for schemas.

What would a schema be useful for irl? I feel the spec here needs better Example, so the value is obvious.

Author


@rvagg @warpfork would you be able to comment on real world uses of IPLD Schema that would be relevant here?

Author


One use case for schemas is to use the representation functionality to render data in more human/application readable formats from formats that are more size efficient.

e.g. some things might be using a listpairs representation which would look like an array of arrays by default. But with a schema you can transform the representation to be more human readable.

I'm gonna be doing stuff along this line for the Prolly Tree work where we'll be encoding tree nodes more efficiently, but having a way to put them through a schema before the application code starts working with them.
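As an illustration of the listpairs case mentioned above, here is a hand-rolled sketch of the transformation: the stored form is an array of `[key, value]` arrays, and the readable form is a keyed map. A real gateway would drive this from an IPLD Schema rather than hard-coding it; `fromListPairs` and `toListPairs` are hypothetical names.

```typescript
type Pair = [string, unknown];

// Stored form -> readable form: [["a", 1], ["b", 2]] becomes { a: 1, b: 2 }.
function fromListPairs(pairs: Pair[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (let i = 0; i < pairs.length; i++) {
    const key = pairs[i][0];
    if (Object.prototype.hasOwnProperty.call(out, key)) {
      throw new Error("duplicate key in listpairs data: " + key);
    }
    out[key] = pairs[i][1];
  }
  return out;
}

// Readable form -> stored form, preserving the map's key order.
function toListPairs(map: Record<string, unknown>): Pair[] {
  return Object.keys(map).map((key): Pair => [key, map[key]]);
}
```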

@lidel lidel changed the title IPIP: Add IPLD Gateway Specs IPIP-293: Add /ipld Gateway Specs Oct 26, 2022
@RangerMauve
Author

@darobin would also appreciate help on this spec since it's relevant to a bunch of IPLD stuff I'd like to surface. Probably lower priority than the writable gateway stuff.

@BigLep
Contributor

BigLep commented Nov 15, 2022

2022-11-15 IPLD Triage conversation: We're still a ways off on this. This likely wouldn't be a candidate for merge until 2023Q1.

@RangerMauve wants to:

  1. spend more time on the resolver
  2. spec out traversing into paths vs. to CIDs
  3. generate more fixtures
  4. spec out more of the URL spec with IPLD schemas
  5. review the divergence of "patch" implementation across Go and JS

@hannahhoward
Contributor

@RangerMauve just curious what's the deal with trustlessness and multi block retrievals here?

It seems like all of the Accept formats are dag-json/dag-cbor, but that implies a single block response. So that means this is a trust based protocol I think? (since anything other than root cid is not verifiable)

@RangerMauve
Author

@hannahhoward Yes, this spec currently relies on the same trustful semantics as the IPFS gateway for loading content.

I think if somebody wants to have trustlessness, then downloading CARs and doing traversal / verification at the application level would be the way to go.

I could see there being some more complex API endpoints which could yield CARs with proofs of everything, but tbh I think it'd be overkill unless there's a specific use case folks had in mind.

One goal of specifying stuff in terms of URLs and methods is that this could be abstracted over, where the "backend" could literally be a backend that runs this gateway, or a library could implement these things using a light IPFS node which just does bitswap with a trustless gateway and does the validation and traversal client-side, or it could be running alongside a full local node that does all the p2p bits as well.

@RangerMauve
Author

RangerMauve commented Dec 7, 2022

So, I've been messing with this some more.

Some updates: I've been thinking that instead of using querystring parameters for applying parameters to the root, we could store all that in the hostname. The main reason is that it keeps the parameters being applied to a node beside the node itself, and applying different ADLs/Schemas on something feels like it should modify the "origin" due to the added transformations. It should also make it easier to create relative URLs. For example, from `example://whatever?something`, navigating to `/else` would yield `example://whatever/else` and lose the `?something`. When the parameters are in the hostname they will stay there during relative navigations.

This means that before a URL might have looked like

ipld://cid/path/here?schema=cidhere&type=Example

Now it'd look like

ipld://cid;schema=cidhere;type=Example/path/here
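The relative-navigation argument can be checked with a small string-based sketch. It deliberately avoids the built-in `URL` class, since how existing parsers treat `;` in the host of a custom scheme is exactly what still needs testing. `splitIpldUrl` and `navigateAbsolute` are hypothetical helpers.

```typescript
interface IpldUrlParts {
  authority: string; // e.g. "cid;schema=cidhere;type=Example"
  path: string;      // e.g. "/path/here"
  query: string;     // e.g. "schema=cidhere&type=Example", or ""
}

function splitIpldUrl(url: string): IpldUrlParts {
  const prefix = "ipld://";
  if (url.slice(0, prefix.length) !== prefix) throw new Error("not an ipld:// URL");
  const rest = url.slice(prefix.length);
  const q = rest.indexOf("?");
  const beforeQuery = q === -1 ? rest : rest.slice(0, q);
  const query = q === -1 ? "" : rest.slice(q + 1);
  const slash = beforeQuery.indexOf("/");
  const authority = slash === -1 ? beforeQuery : beforeQuery.slice(0, slash);
  const path = slash === -1 ? "" : beforeQuery.slice(slash);
  return { authority, path, query };
}

// Navigate to an absolute path such as "/else": the authority is kept,
// while the old path and query string are dropped.
function navigateAbsolute(url: string, absolutePath: string): string {
  return "ipld://" + splitIpldUrl(url).authority + absolutePath;
}
```

With the querystring form, `navigateAbsolute` loses `schema` and `type`; with the hostname form they survive because they are part of the authority.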

As well, I'm thinking of giving the ending / in URLs special meaning to account for traversing to a CID vs traversing into a CID. As encapsulated here: ipld/ipld#250

My proposal is to treat a lack of a trailing / as being to the CID without resolving, but having a trailing / means traversing into the CID.

So ipld://cid/foo would yield {"/": "bafywhatever"} but ipld://cid/foo/ would yield the thing that foo points to, which is {"hello": "world"}.
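A toy resolver makes the proposed trailing-slash rule concrete. It assumes an in-memory block store keyed by CID string and links written in the DAG-JSON `{"/": "<cid>"}` form; `resolve`, `isLink`, and the store contents are hypothetical and only mirror the example above.

```typescript
type Store = Record<string, unknown>;

function isLink(value: unknown): value is { "/": string } {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>)["/"] === "string"
  );
}

// Resolve `path` under `cid`. Links met in the middle of the path are always
// followed; a link at the end is followed only if the path ends with "/".
function resolve(store: Store, cid: string, path: string): unknown {
  const into = path.length > 0 && path.charAt(path.length - 1) === "/";
  const names = path.split("/").filter((s) => s.length > 0);
  let current: unknown = store[cid];
  for (let i = 0; i < names.length; i++) {
    if (isLink(current)) current = store[current["/"]];
    if (typeof current !== "object" || current === null) {
      throw new Error("cannot traverse into a non-map at segment: " + names[i]);
    }
    current = (current as Record<string, unknown>)[names[i]];
  }
  if (into && isLink(current)) current = store[current["/"]];
  return current;
}

// Example data matching the comment: `foo` links to a node {"hello": "world"}.
const exampleStore: Store = {
  cid: { foo: { "/": "bafywhatever" } },
  bafywhatever: { hello: "world" },
};
```

Here `resolve(exampleStore, "cid", "/foo")` returns the link itself, while `resolve(exampleStore, "cid", "/foo/")` returns the linked node.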

@RangerMauve
Author

Sadly I'm not working for the IPLD team anymore, but I recently added the new IPLD Patch stuff and advanced pathing into the latest version of Agregore.

Overall I'm really happy with the ergonomics of applying lenses and patching over them using a simple declarative (and deterministic!) interface.

https://github.com/AgregoreWeb/agregore-browser/releases/tag/v2.1.0

Labels
None yet
Projects
Status: 🏃 In Progress
8 participants