# W3 IPNI Protocol

![status:wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square)

## Authors

- [olizilla], [Protocol Labs]

## Abstract

For [IPNI], we assert that we can provide batches of multihashes by signing "Advertisements".

With an [inclusion claim], a user asserts that a CAR contains a given set of multihashes via a CAR index.

This spec describes how to merge these two concepts by adding an `ipni/offer` capability that publishes an inclusion claim as [IPNI Advertisements].

## Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119).

## Introduction

We publish ad-hoc batches of multihashes to IPNI today. This proposal aims to align our usage of IPNI with [content-claims] by publishing an advert per [inclusion claim] and including the source claim in the IPNI Advertisement.

### Motivation

- Align IPNI advert entries with CAR block sets, setting the `ContextID` to the CAR CID.
  - This exposes our block-to-CAR indexes. Anyone could use IPNI to find which CAR a block is in. The `ContextID` bytes provide the CAR CID for any block lookup. The CAR CID can then be used to find the CAR index via our content-claims API.
  - We could delete the IPNI records by CAR CID if the CAR is deleted.
- Make IPNI advertising an explicit UCAN capability that clients can invoke, rather than a side-effect of bucket events.
  - With this we are free to write CARs anywhere. The user's agent invokes an `ipni/offer` capability to ask us to publish an IPNI ad for the blocks in their CAR.
  - This empowers the user to opt in or out as they need, and allows us to bill for the (small) cost of running that service.
- Put the source inclusion claim in the IPNI advert metadata.
  - We have to sign IPNI Adverts as the provider. Providing a signed source claim allows more nuanced reputation decisions.

### Quick IPNI primer

IPNI ingests and replicates billions of signed provider claims that state where individual block CIDs can be retrieved from.

Users can query IPNI servers for any CID and receive a set of provider addresses and transport info, along with a provider-specific `ContextID` and optional metadata.

For example, <http://cid.contact> hosts an IPNI server that Protocol Labs maintains.

_Query IPNI for a CID:_

```bash
curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aapwa2twnc4 -sS | jq
```

```json
{
  "MultihashResults": [
    {
      "Multihash": "EiBAsLcLTWQSx+WydZNGhjs75z18AfE0r1OPYAPsDU7NFw==",
      "ProviderResults": [
        {
          "ContextID": "YmFndXFlZXJheTJ2ZWJsZGNhY2JjM3Z0em94bXBvM2NiYmFsNzV3d3R0aHRyamhuaDdvN2o2c2J0d2xmcQ==",
          "Metadata": "gBI=",
          "Provider": {
            "ID": "QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC",
            "Addrs": [
              "/dns4/elastic.dag.house/tcp/443/wss"
            ]
          }
        },
        {
          "ContextID": "YmFndXFlZXJheTJ2ZWJsZGNhY2JjM3Z0em94bXBvM2NiYmFsNzV3d3R0aHRyamhuaDdvN2o2c2J0d2xmcQ==",
          "Metadata": "oBIA",
          "Provider": {
            "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
            "Addrs": [
              "/dns4/dag.w3s.link/tcp/443/https"
            ]
          }
        }
      ]
    }
  ]
}
```

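The `Metadata` values in the response above are base64 bytes that begin with an unsigned varint multicodec code identifying the retrieval transport. A minimal Node.js sketch of decoding them (the function names are illustrative):

```javascript
// Decode the unsigned varint at the start of a byte array (LEB128-style,
// 7 bits per byte, high bit is the continuation flag).
function decodeVarint(bytes) {
  let value = 0, shift = 0;
  for (const b of bytes) {
    value |= (b & 0x7f) << shift;
    if ((b & 0x80) === 0) return value;
    shift += 7;
  }
  throw new Error('truncated varint');
}

// Read the transport multicodec code from a base64 Metadata field.
function transportCode(metadataB64) {
  return decodeVarint(Buffer.from(metadataB64, 'base64'));
}

// transportCode('gBI=')  → 0x900 (transport-bitswap)
// transportCode('oBIA')  → 0x920 (transport-ipfs-gateway-http)
```

Decoding the two example results this way shows one provider offering bitswap and the other offering trustless-gateway HTTP retrieval.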
[web3.storage] publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to a bucket as an `Advertisement`, addressed by its CID.

An `Advertisement` includes `Provider` info claiming that the batch of multihashes is available via bitswap or HTTP, and is signed by the provider's PeerID private key. Each advert is a claim that this peer will provide that batch of multihashes.

Advertisements also include a CID link to the previous advert from the same provider, forming a hash-linked list.

The latest `head` CID of the advert list can be broadcast over [gossipsub], to be replicated and indexed by all listeners, or sent via HTTP to specific IPNI servers as a notification to pull and index the latest ads from you at their earliest convenience.

The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. You can update or remove multiple adverts by specifying the same `ContextID`. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response.

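As an illustration, the `ContextID` in the query response above is just bytes; treating them as UTF-8 recovers the provider's grouping key, which in that example is a CID string. A sketch with illustrative helper names:

```javascript
// ContextID is opaque bytes to IPNI. A provider that uses the bytes of a CID
// string as its grouping key can round-trip it like this.
function toContextID(key) {
  return Buffer.from(key, 'utf8');       // grouping key → opaque bytes
}

function fromContextID(b64) {
  return Buffer.from(b64, 'base64').toString('utf8'); // response field → key
}
```

Running `fromContextID` on the `ContextID` from the example response yields a `baguqe…`-prefixed CID string.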
A `Metadata` field is also available for provider-specific retrieval hints that a user should send to the provider when making a request for the block, though the mechanism here is unclear _(HTTP headers? bitswap?)_.

Regardless, it is a field we can use to include the portable cryptographic proof of the content claim that an end-user made that a set of blocks are included in a CAR. The provider has to sign the IPNI advert with the PeerID key that should be used to secure the libp2p connection when retrieving the block.

### How web3.storage integrates IPNI today

web3.storage publishes IPNI advertisements as a side-effect of the E-IPFS CAR [indexer-lambda].

Each multihash in a CAR is sent to an SQS queue. The `publisher-lambda` takes batches from the queue, encodes and signs `Advertisement`s, and writes them to a bucket as JSON.

The lambda makes an HTTP request to the IPNI server at `cid.contact` to inform it when the head CID of the Advertisement linked list changes.

The IPNI server fetches the new head Advertisement from our bucket, along with any others in the chain it hasn't read yet, and updates its indexes.

Our `Advertisement`s contain arbitrary batches of multihashes defined by SQS queue batching config. The `ContextID` is set to opaque bytes (a custom hash of the hashes).

#### Diagram

```mermaid
flowchart TD
    A[(dotstorage\nbucket)] -->|ObjectCreated fa:fa-car| B(bucket-to-indexer ƛ)
    B -->|region/bucket/cid/cid.car| C[/indexer queue/]
    C --> indexer(Indexer ƛ)
    indexer --> |zQmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn| E[/multihash queue/]
    E --> F(ipni Advertisement content ƛ)
    F --> |PUT /advertCid|I
    F --> |advert CID| G[/Advertisement queue/]
    G --> H(ipni publish ƛ)
    H --> |PUT /head|I[(Advert Bucket)]
    H --> |POST head|IPNI[["`**IPNI**`"]]

    carpark[(carpark\nbucket)] --> |ObjectCreated fa:fa-car|w3infra-carpark-consumer(carpark-consumer ƛ)
    w3infra-carpark-consumer -->|region/bucket/cid/cid.car| C[/indexer queue/]

    indexer ---> dynamo[Dynamo\nblocks index]
```

## Proposal

Provide an `ipni/offer` UCAN ability to sign and publish an IPNI Advertisement for the set of multihashes in a CAR a user has stored with w3s, making them discoverable via IPFS implementations and other IPNI consumers.

```mermaid
sequenceDiagram
    actor Alice
    Alice->>w3s: ipni/offer (inclusion proof)
    activate w3s
    w3s-->>w3s: fetch & verify index
    w3s-->>w3s: write advert
    w3s-->>Alice: OK (advertisement CID)
    w3s-->>ipni: publish head (CID)
    deactivate w3s
    ipni-->>w3s: fetch advert
    activate ipni
    ipni-->>ipni: index entries
    deactivate ipni
    Alice->>ipni: query (CID)
```

Invoke it with the CID of an [inclusion claim] that associates a CAR CID with a [MultihashIndexSorted CARv2 Index] CID.

:::info
Other CAR index forms may be supported in the future. A more convenient external CAR index format would provide the byte offset and block byte length for a given multihash from the start of the CAR file.
:::

**UCAN invocation** example

```json
{
  "iss": "did:key:zAlice",
  "aud": "did:web:web3.storage",
  "att": [{
    "can": "ipni/offer",
    "with": "did:key:space", // user's space DID
    "nb": {
      "inclusion": CID // inclusion claim CID
    }
  }]
}
```

**Inclusion claim** example

```json
{
  "content": CID,  // CAR CID
  "includes": CID  // CARv2 Index CID
}
```

When `ipni/offer` is invoked, the service must fetch the inclusion claim. The encoded claim block may be sent with the invocation.

The service must fetch the CARv2 index and parse it to find the set of multihashes included in the CAR. See: [Verifying the CARv2 Index](#verifying-the-carv2-index)

The set of multihashes must be encoded as 1 or more [IPNI Advertisements] per the IPLD schema:

```ipldsch
type Advertisement struct {
    PreviousID optional Link
    Provider String
    Addresses [String]
    Signature Bytes
    Entries Link
    ContextID Bytes
    Metadata Bytes
    IsRm Bool
    ExtendedProvider optional ExtendedProvider
}
```

- `Entries` must be the CID of an `EntryChunk` for a subset (or all) of the multihashes in the CAR.
- `ContextID` must be the byte-encoded form of the CAR CID.
- `Metadata` must be the bytes of the inclusion claim.

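A minimal sketch of assembling these fields (Node.js; the function and parameter names are hypothetical, and signing plus IPLD encoding of the advert are omitted):

```javascript
// Hypothetical helper: build an Advertisement object per the schema above.
// `entriesCid` is the CID of an EntryChunk, `carCid` the CAR's CID string,
// and `claimBytes` the encoded inclusion claim. Signing happens afterwards.
function createAdvertisement({ previousId, provider, addresses, entriesCid, carCid, claimBytes }) {
  return {
    PreviousID: previousId,                       // link to prior advert, if any
    Provider: provider,                           // provider peer ID
    Addresses: addresses,                         // provider multiaddrs
    Signature: new Uint8Array(),                  // filled in by the signing step
    Entries: entriesCid,                          // EntryChunk link
    ContextID: new TextEncoder().encode(carCid),  // CAR CID bytes as grouping key
    Metadata: claimBytes,                         // inclusion claim bytes
    IsRm: false,                                  // not a removal advert
  };
}
```
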
See: [Encoding the IPNI Advertisement](#encoding-the-ipni-advertisement)

The Advertisement should then be available for consumption by indexer nodes per the [Advertisement Transfer](https://github.com/ipni/specs/blob/main/IPNI.md#advertisement-transfer) section of the IPNI spec.

### Verifying the CARv2 Index

The service must fetch the CARv2 index and may verify that 1 or more multihashes from the index exist at the specified offsets in the associated CAR.

The verifier should pick a set of multihashes at random, fetch the bytes from the CAR identified by each index entry, and verify the multihash. The invocation must return an error if any entry is found to be invalid.

Random validation of a number of blocks allows us to detect invalid indexes and lets us tune how much work we are willing to do per CAR index.

Full validation of every block is not recommended, as it opens us up to performing unbounded work. _We have seen CAR files with millions of tiny blocks._

### Encoding the IPNI Advertisement

> The set of multihashes must be encoded as 1 or more [IPNI Advertisements].

In IPNI, batches of multihashes are encoded as `EntryChunk` blocks; each batch includes an array of multihashes.

A `MultihashIndexSorted` index encodes a set of multihashes. Mapping from an index to an `EntryChunk` requires parsing the index and encoding the multihashes it contains with the `EntryChunk` IPLD schema:

```ipldsch
type EntryChunk struct {
    Entries [Bytes]
    Next optional Link
}
```

Where the IPLD-encoded size of an `EntryChunk` with the set of multihashes would exceed 4MiB (the upper limit for a block that can be transferred by libp2p), the set of multihashes must be split into multiple `EntryChunk` blocks.

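The splitting step can be sketched as a greedy batcher. This uses a rough per-entry overhead allowance rather than exact IPLD encoding sizes, so the constants are illustrative:

```javascript
const MAX_CHUNK_BYTES = 4 * 1024 * 1024; // 4MiB libp2p block transfer limit
const PER_ENTRY_OVERHEAD = 4;            // rough IPLD encoding allowance per entry

// Greedily pack multihash byte arrays into batches that stay under maxBytes.
function chunkEntries(multihashes, maxBytes = MAX_CHUNK_BYTES) {
  const chunks = [];
  let current = [], size = 0;
  for (const mh of multihashes) {
    const cost = mh.length + PER_ENTRY_OVERHEAD;
    if (current.length > 0 && size + cost > maxBytes) {
      chunks.push(current); // current batch is full; start a new one
      current = [];
      size = 0;
    }
    current.push(mh);
    size += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each resulting batch would become one `EntryChunk` (its `Entries` field), leaving the `Next` or multiple-advert question to the discussion below.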
It is possible to create long chains of `EntryChunk` blocks by setting the `Next` field to the CID of another `EntryChunk`, but this requires an entire `EntryChunk` to be fetched and decoded before the IPNI server can determine the next chunk to fetch.

The containing CAR CID provides a useful `ContextID` for grouping multiple (lightweight) Advertisement blocks, so it is recommended to split the set across multiple `Advertisement` blocks, each pointing to an `EntryChunk` with a partition of the set of multihashes and the `ContextID` set to the CAR CID.

[IPNI]: https://github.com/ipni/specs/blob/main/IPNI.md
[MultihashIndexSorted CARv2 Index]: https://ipld.io/specs/transport/car/carv2/#format-0x0401-multihashindexsorted
[content-claims]: https://github.com/web3-storage/content-claims
[inclusion claim]: https://github.com/web3-storage/content-claims?tab=readme-ov-file#inclusion-claim
[IPNI Advertisements]: https://github.com/ipni/specs/blob/main/IPNI.md#advertisements
[gossipsub]: https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md
[indexer-lambda]: https://github.com/elastic-ipfs/indexer-lambda/blob/a38d8074424d3f02845bac303a0d3fb3719dad82/src/lib/block.js#L22-L32
[olizilla]: https://github.com/olizilla
[Protocol Labs]: https://protocol.ai
[web3.storage]: https://web3.storage