IPIP-293: Add /ipld Gateway Specs #293
nice @lidel where do we stand with writable gateways? I don't believe we have writability enabled anywhere (yet) and there's been some debate about pulling the trigger on this. If we put POST and PATCH in the spec, what does that mean for implementations? Does that just become an optional part, and we could handle it in code by allowing the config to turn on writability? |
I'm actually gonna be working on the writable gateway spec this week based on the stuff we did in the Agregore IPFS Daemon. AgregoreWeb/agregore-ipfs-daemon#10 |
I've been signaling writability support by returning different HTTP method names for |
Err, should I submit a new PR along with the required doc once this is more thoroughly fleshed out? |
@RangerMauve no, this was a bug in GitHub (it closed PRs against my branch instead of rebasing them against main). |
@rvagg The idea is to make the writable gateways optional. We just want to specify the behavior in case someone wants to implement it. Potential user:
|
2022-07-19 conversation: there are things from IPFS Thing that influence this:
@RangerMauve wants to understand the WASM story more and how that would impact this. |
A couple of comments: I think the parameters block (enclosed in square brackets) is essentially doing what Timbl's old Matrix URI idea on w3.org was trying to do. Matrix URIs are sparsely supported in some parts of the Web2 ecosystem (the Java JAX-RS API, I believe). Getting implementers of web toolkits (express, etc) to provide native support for IPFS URIs may be easier if we can 'invoke Timbl' rather than attempting to convince them a priori of the rightness of our approach. More practically, there is an awful lot of infrastructure out there (proxies, reverse proxies, kubernetes ingress controllers, etc) which depends on cruddy regular-expression based parsing of URIs. Re-using '&' as a separator is an admirable idea from the point of view of code re-use, but it risks breaking any bad regex that relies on '&' appearing first in a query string. Moreover, using '[' as a meaningful separator in the URI will make any implementer of said cruddy regex parsers cry into their beer as they try to navigate thickets of escape characters. For these reasons I'd suggest that ';' as a separator, per the Matrix URI spec, is preferable. |
@softwareplumber IIUC the Matrix URI puts For example: |
Noting that there have been proposals to use other signaling mechanisms than out-of-band including it in the path. I'd recommend those interested take a look at https://ipld.io/docs/advanced-data-layouts/signalling/ for some background (including following through to the naming and dynamic loading sections if you're interested). While none of the opinions on that website are "law" they may provide some useful context in either forming your own opinions or understanding those of others. If folks have other useful resources I'd drop them here as well. |
Btw, I did a talk last week at the IPFS thing and here are the slides from it: https://blog.mauve.moe/slides/ipld-gateway/#1 You can press |
Well, the JAX-RS spec *definitely* provides for matrix params within the
path. I admit it's not so clear from the original Timbl musing, but the
Java world is shot through with examples of matrix params used in the
middle of the path (e.g.
https://www.logicbig.com/tutorials/java-ee-tutorial/jax-rs/jaxrs-matrix-param.html).
So the example would become:
/ipld/bafyroot;ADL=HAMT/entry1/field2;ADL=FBL
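
For illustration, here is a minimal sketch of how a path in that form might be split into segments with per-segment matrix parameters. The function name `parseMatrixPath` and the returned shape are hypothetical and not taken from the spec or from any existing library.

```javascript
// Hypothetical helper: split an /ipld/ path into segments, each carrying
// its own matrix-style parameters (";key=value" after the segment name).
function parseMatrixPath(path) {
  // "/ipld/a;x=1/b" splits into ["", "ipld", "a;x=1", "b"]
  const [, namespace, ...rawSegments] = path.split("/");
  const segments = rawSegments
    .filter((raw) => raw.length > 0) // ignore empty pieces such as a trailing slash
    .map((raw) => {
      const [name, ...pairs] = raw.split(";");
      const params = {};
      for (const pair of pairs) {
        const eq = pair.indexOf("=");
        const key = eq === -1 ? pair : pair.slice(0, eq);
        const value = eq === -1 ? "" : pair.slice(eq + 1);
        params[decodeURIComponent(key)] = decodeURIComponent(value);
      }
      return { name: decodeURIComponent(name), params };
    });
  return { namespace, segments };
}

const parsed = parseMatrixPath("/ipld/bafyroot;ADL=HAMT/entry1/field2;ADL=FBL");
console.log(JSON.stringify(parsed, null, 2));
```

Because the parameters stay attached to the segment they follow, a resolver could apply `ADL=HAMT` to the root and `ADL=FBL` to `field2` without consulting the query string.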
|
I'm personally all for reusing an existing standard. One thing I like about this JAX thing is that it disambiguates which "side" the metadata goes on. e.g. With this syntax, we know for sure that the extra data goes *after* the segment name. Using semicolons to separate bits means that we can't just dump the segment into URLSearchParams, but I think that's easy enough to work around. We can then say that just Also, the thing about proxies seems like a good call since those things have caused issues in the past. I'm not particularly married to (I'll do some tests with existing URL parsers to see if they complain about it) |
@RangerMauve that's awesome; even if the idea doesn't work out I'm really pleased it's being considered. If I can make one other suggestion, for future-proofing it might be an idea to somehow 'namespace' keywords like 'ADL' (perhaps prefixing with '$') and, maybe, reserve some kind of wildcard character in the spec. I have a gut feeling that eventually we're going to want a path-like syntax to represent selectors (or something that replaces selectors) and providing upward compatibility in the Gateway URI spec so that Gateway URIs are a subset of Selector URIs would be a good thing. For example, I'm thinking that in the fullness of time a path like |
Hmm. Extra keywords in the path seem interesting, I wonder if it'd be stepping over some of the use cases of selectors, however. One of the things I was thinking would be important is that the result for these IPLD URLs / operations should be either a new IPLD data model node, or a URL pointing to such a node. Would using extra wildcards have it return a list node? Might be good to talk about it on the call. |
Yes, there is a crossover with selector use cases, but that's fine. What
I'm basically saying is that building space in the URI spec so that it
could eventually handle some of those use cases without having to be
re-written can only be a good thing. ( :-J And, in the meantime, this
maybe gives us a way to write down simple selectors that doesn't involve
mind-destroying numbers of curly braces )
I agree that the question of exactly what a path with wildcards in it
would return is a vexed one. The simple answer ('a list of nodes') may
be wrong. But I think that's a design bridge that could be crossed if
anyone ever wants to implement the feature. What I'm suggesting is more
leaving space (literal namespace) for someone to implement it if they
want to.
What call? I'm just a newb.
|
Recording from the discussion we had about this at the IPLD thing last month is up here: https://www.youtube.com/watch?v=_uXKIEmJh3g |
I got some initial I've also put together a JS library for parsing and serializing https://github.com/RangerMauve/js-ipld-url I'm also gonna release it in the Agregore Browser for desktop to make it a bit easier to mess around with. I got Next up, I wanna look into sketching up what the I'll also update the gateway spec with these new changes as they come. 😁 |
I've got some code going in JavaScript which supports IPLD Schemas in path segment parameters. RangerMauve/js-ipld-url-resolve#1 I'm feeling pretty comfortable with this one where I've got schema CIDs within the parameters as well as which type to interpret a node as. It will also apply types to any fields that get traversed via linking. I'll likely need more tests for nested structs that contain Links, but so far so good. 😁 |
2022-09-13 IPLD triage conversation on next steps:
|
Thank you @RangerMauve, did a quick pass with initial feedback.
The `body` of the request shall be parsed according to the `Content-Type` as IPLD data via standard encodings.
`/localhost/` is used to support `POST ipld://localhost/` for uploading IPLD data to local nodes in web browsers that support it.

The response will contain an `ipld://{cid}/` URL pointing at your data.
Spec should remove any ambiguity:
- Contain it where? (A) plain text in the response body? (B) a `Location` header?
- What will be the content-type of the response? `text/plain`?
This is something I'd like to clarify with @fabricedesre since we had a bit of a disagreement.
Right now the precedent within Kubo and Agregore's protocol handlers is that there will be a 201 response with a `Location` header containing the URL, as well as an empty body.
Fabrice was into having a 200 response and the URL inside the response body, which is something I was originally doing in Agregore, but I switched when we started extending the writable gateway functionality in Kubo.
Ideally we should settle on the best course of action here during Lisbon. 😅
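
To make the two options concrete, here is a rough sketch of the response shapes being compared. The function names and the plain-object response format are hypothetical, and the `text/plain` content type for the second option is an assumption, since the thread leaves that question open.

```javascript
// Option A (the Kubo / Agregore precedent described above):
// 201 with a Location header and an empty body.
function createdWithLocation(cid) {
  return {
    status: 201,
    headers: { Location: `ipld://${cid}/` },
    body: "",
  };
}

// Option B (the alternative under discussion): 200 with the URL in the body.
// The text/plain content type is an assumption, not something the thread settles.
function okWithBody(cid) {
  return {
    status: 200,
    headers: { "Content-Type": "text/plain" },
    body: `ipld://${cid}/`,
  };
}

// "bafyexamplecid" is a placeholder, not a real CID.
console.log(createdWithLocation("bafyexamplecid"));
console.log(okWithBody("bafyexamplecid"));
```

With option A a client reads the new URL from the `Location` header; with option B it reads the response body instead.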
Ideally, I'd like to use this to inform all the other protocol handlers too.
The response will contain an `ipld://{cid}/` URL pointing at your data.

<!--
TODO: Only allow `/localhost/`? Get rid of `/localhost` from the spec if light clients with protocol handlers don't matter/
Do we have use cases where things other than `localhost` could be used in the future?
e.g. do we want to support POST to IPNS identifier?
I've been using `POST` to `ipfs://localhost`, or a `PUT ipfs://cid/`, as well as `POST ipns://key` to update CIDs, or `PUT ipns://key` in the Agregore IPFS Daemon Spec
For `/ipld/{cid}/*` paths, the `Accept` header is used to indicate the encoding that should be used to return the data.
This means that data initially encoded as `dag-json` will be transcoded to `dag-cbor` if the `application/vnd.ipld.dag-cbor` Accept header is used.

- `application/json`: Interpret in the same way as `application/vnd.ipld.dag-json`.
What if the data is valid JSON (and not DAG-JSON) added to IPFS with the `json` codec (and not `dag-json`)?
Parsing it as dag-json will error, even though it is valid JSON.
@hacdias and I discussed this edge case and ended up with a requirement to check the codec from the CID and, if it is `json`, use the generic JSON codec instead of `dag-json`.
Interesting. Do you have text written up somewhere that I can copy paste here?
@RangerMauve we have some wording here, but it may not be definitive:
specs/http-gateways/PATH_GATEWAY.md
Lines 184 to 187 in 67fab21
- [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/)
- [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/)
- [application/json](https://www.iana.org/assignments/media-types/application/json) – same as `application/vnd.ipld.dag-json`, unless the CID's codec is JSON. Then, the raw JSON block can be returned
- [application/cbor](https://www.iana.org/assignments/media-types/application/cbor) – same as `application/vnd.ipld.dag-cbor`, unless the CID's codec is CBOR. Then, the raw CBOR block can be returned |
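
As a rough sketch of the rule in that wording, the function below picks a response encoding from the `Accept` value and the codec named by the CID. `selectResponseEncoding` is a hypothetical name, the codec is passed in as a plain string for simplicity, and real `Accept` parsing (multiple types, `q` weights) is deliberately left out.

```javascript
// cidCodec: codec name taken from the requested CID,
// e.g. "json", "cbor", "dag-json" or "dag-cbor" (plain strings for this sketch).
function selectResponseEncoding(accept, cidCodec) {
  switch (accept) {
    case "application/vnd.ipld.dag-json":
      return { encoding: "dag-json", raw: false };
    case "application/vnd.ipld.dag-cbor":
      return { encoding: "dag-cbor", raw: false };
    case "application/json":
      // Plain JSON blocks may be returned as-is instead of being parsed as DAG-JSON.
      return cidCodec === "json"
        ? { encoding: "json", raw: true }
        : { encoding: "dag-json", raw: false };
    case "application/cbor":
      // Same exception for plain CBOR blocks.
      return cidCodec === "cbor"
        ? { encoding: "cbor", raw: true }
        : { encoding: "dag-cbor", raw: false };
    default:
      return null; // not one of the four types listed above
  }
}

console.log(selectResponseEncoding("application/json", "json"));
console.log(selectResponseEncoding("application/json", "dag-cbor"));
```

The quoted wording says the raw block "can" be returned, so the `raw: true` branch is one permitted behavior rather than a requirement.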
} representation tuple
```

The CID for the DMT of this schema is `bafyreibvheoym4avfsjfw63yhsymovm7o54ftcnxwxovqf5xxcbjddanze` |
nit: was unable to inspect this via `ipfs dag get --output-codec=dag-json bafyreibvheoym4avfsjfw63yhsymovm7o54ftcnxwxovqf5xxcbjddanze | jq`
As a rule of thumb, CIDs used in an IPIP should be publicly available and pinned (e.g. to https://estuary.tech and https://web3.storage; do not use Pinata as afaik it does not announce CIDs on the DHT).
We will have automation for this, but for now it is up to the IPIP author to handle.
Should I maybe include some CBOR files with the fixtures that are relevant to the spec?
For example, given the following schema (note it is written in DSL form, but must be converted to the DMT in order to be referenced):

```ipldschema
type Example struct { |
nit: I've read this section and tbh have no idea what the value is to the end user – CBOR traversal and field resolution with extra steps so the output looks a certain way?
In the ADL section we have a good use case, "ADL that's used to represent large maps" – we need a similar real-world example for schemas.
What would a schema be useful for irl? I feel the spec here needs a better `Example`, so the value is obvious.
One use case for schemas is to use the `representation` functionality to render data in more human/application-readable formats from formats that are more size-efficient.
e.g. some things might be using a `listpairs` representation, which would look like an array of arrays by default. But with a schema you can transform the representation to be more human readable.
I'm gonna be doing stuff along this line for the Prolly Tree work where we'll be encoding tree nodes more efficiently, but having a way to put them through a schema before the application code starts working with them.
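
A small sketch of that idea in plain JavaScript, without any schema library: data stored under a `listpairs` representation looks like an array of `[key, value]` arrays, and the schema layer would present it as a map. The helper names `fromListPairs` and `toListPairs` and the sample data are made up for illustration.

```javascript
// Stored (representation) form: a list of [key, value] pairs.
const stored = [["alice", 1], ["bob", 2]];

// What a schema-aware layer would hand to application code: a map keyed by name.
function fromListPairs(pairs) {
  const out = {};
  for (const entry of pairs) {
    if (!Array.isArray(entry) || entry.length !== 2) {
      throw new TypeError("listpairs entries must be [key, value] arrays");
    }
    const [key, value] = entry;
    out[key] = value;
  }
  return out;
}

// The reverse direction, for writing the map back in its compact form.
function toListPairs(map) {
  return Object.entries(map);
}

const readable = fromListPairs(stored);
console.log(readable);
console.log(toListPairs(readable));
```

An actual implementation would drive this from the schema's DMT rather than hard-coding the transformation; the sketch only shows the shape change between the two forms.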
Co-authored-by: Marcin Rataj <[email protected]>
@darobin would also appreciate help on this spec since it's relevant to a bunch of IPLD stuff I'd like to surface. Probably lower priority than the writable gateway stuff. |
2022-11-15 IPLD Triage conversation: We're still a ways off on this. This likely wouldn't be a candidate for merge until 2023Q1. @RangerMauve wants to:
|
@RangerMauve just curious what's the deal with trustlessness and multi block retrievals here? It seems like all of Accept formats are dag-json/dag-cbor, but that implies a single block response. So that means this is a trust based protocol I think? (since anything other than root cid is not verifiable) |
@hannahhoward Yes, this spec currently relies on the same trustful semantics as the IPFS gateway for loading content. I think if somebody wants to have trustlessness, then downloading CARs and doing traversal / verification at the application level would be the way to go. I could see there being some more complex API endpoints which could yield CARs with proofs of everything, but tbh I think it'd be overkill unless there's a specific use case folks had in mind. One goal of specifying stuff in terms of URLs and methods is that this could be abstracted over, where the "backend" could literally be a backend that runs this gateway, or a library could implement these things using a light IPFS node which just does bitswap with a trustless gateway and does the validation and traversal client-side, or it could be running alongside a full local node that does all the p2p bits as well. |
So, I've been messing with this some more. Some updates: I've been thinking that instead of using querystring parameters for applying parameters to the root, we could store all that in the hostname. The main reason is that it keeps the parameters being applied to a node beside the node itself, and applying different ADLs/Schemas on something feels like it should modify the "origin" due to the added transformations. It should also make it easier to create relative URLs. This means that before a URL might have looked like
Now it'd look like
As well, I'm thinking of giving the ending My proposal is to treat a lack of a trailing So |
Sadly I'm not working for the IPLD team anymore, but I recently added the new IPLD Patch stuff and advanced pathing into the latest version of Agregore. Overall I'm really happy with the ergonomics of applying lenses and patching over them using a simple declarative (and deterministic!) interface. https://github.com/AgregoreWeb/agregore-browser/releases/tag/v2.1.0 |
WIP: Drafting up specs for `/ipld/` support in gateways