IPIP-378: Delegated Routing HTTP POST API #378

Open
wants to merge 19 commits into
base: main
106 changes: 103 additions & 3 deletions routing/DELEGATED_CONTENT_ROUTING_HTTP.md
@@ -66,13 +66,13 @@ Specifications for some transfer protocols are provided in the "Transfer Protoco

### `GET /routing/v1/providers/{CID}`

#### Response codes
#### `GET` Response codes

- `200` (OK): the response body contains 0 or more records
- `404` (Not Found): must be returned if no matching records are found
- `422` (Unprocessable Entity): request does not conform to schema or semantic constraints

#### Response Body
#### `GET` Response Body

```json
{
@@ -90,6 +90,48 @@ Response limit: 100 providers

Each object in the `Providers` list is a *read provider record*.

### `PUT /routing/v1/providers`

#### `PUT` Response codes

- `200` (OK): the server processed the full list of provider records (possibly unsuccessfully, depending on the semantics of the particular records)
- `400` (Bad Request): the server deems the request to be invalid and cannot process it
- `422` (Unprocessable Entity): request does not conform to schema or semantic constraints
- `501` (Not Implemented): the server does not support providing records

#### `PUT` Request Body

```json
{
"Providers": [
{
"Protocol": "<protocol_name>",
"Schema": "bitswap",
...
}
]
}
```

Each object in the `Providers` list is a *write provider record*.

#### `PUT` Response Body

```json
{
"ProvideResults": [
{ ... }
]
}
```

- `ProvideResults` is a list of results in the same order as the `Providers` in the request, and the schema of each object is determined by the `Protocol` of the corresponding write object (called "Write Provider Records Response" in the Known Transfer Protocols section)
- This may contain output information such as TTLs, errors, etc.
- It is undefined whether the server will allow partial results
- The work for processing each provider record should be idempotent so that it can be retried without excessive cost in the case of full or partial failure of the request
- Default limit of 100 keys per request
- Implements pagination according to the Pagination section
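As an illustration of the ordering rule above, the following is a minimal sketch of how a client might wrap write provider records in the request envelope and pair each one with its entry in `ProvideResults`. The helper names `build_put_body` and `pair_results` are hypothetical, and rejecting a length mismatch is an assumption: the text leaves partial results undefined.

```python
import json


def build_put_body(records):
    """Wrap write provider records in the {"Providers": [...]} envelope."""
    return json.dumps({"Providers": list(records)})


def pair_results(request_body, response_body):
    """Pair each submitted record with its result by position.

    Relies on ProvideResults being in the same order as Providers.
    """
    providers = json.loads(request_body)["Providers"]
    results = json.loads(response_body)["ProvideResults"]
    if len(providers) != len(results):
        # Assumption: partial results are undefined, so treat a
        # mismatch as an error rather than guessing the alignment.
        raise ValueError("result count does not match record count")
    return list(zip(providers, results))
```

Because processing is meant to be idempotent, a client that hits the mismatch error could simply retry the whole request.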
masih marked this conversation as resolved.

## Pagination

This API does not support pagination, but optional pagination can be added in a backwards-compatible spec update.
@@ -118,7 +160,7 @@ limits, allowing every site to query the API for results:

```plaintext
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Methods: GET, PUT, OPTIONS
```

## Known Transfer Protocols
Member

💭 this section grows way faster than the rest of the spec.

Would it be ok to move "Known Transfer Protocols" to a separate file (KNOWN_TRANSFER_PROTOCOLS.md)?

Member Author

I recommend doing this in a separate PR to reduce line changes that are irrelevant to what this PR is trying to introduce.

If this is not a blocker for this PR, I'd be happy to take this as a TODO and get it done once this PR is merged.

@@ -148,6 +190,60 @@ Specification: [ipfs/specs/BITSWAP.md](https://github.com/ipfs/specs/blob/main/B

The server should respect a passed `transport` query parameter by filtering against the `Addrs` list.

#### Bitswap Write Provider Records
lidel marked this conversation as resolved.

```json
{
"Protocol": "transport-bitswap",
"Schema": "bitswap",
"Signature": "<signature>",
"Payload": "<payload>"
Member

@lidel lidel May 31, 2023

💭 this creates one bitswap schema for GET and the other bitswap for PUT. Both under the same name, but with different fields. Bit nasty and error prone.

@masih is this ossified, or is there still time to fix this and use different schema for PUTs?

Suggested change
"Schema": "bitswap",
"Signature": "<signature>",
"Payload": "<payload>"
"Schema": "bitswap-announce",
"Signature": "<signature>",
"Payload": "<payload>"

Member Author

There is still time. Changing, thanks.

}
```

- `Signature`: a multibase-encoded signature of the sha256 hash of the `Payload` field, signed using the private key of the Peer ID specified in the `Payload` JSON. Signing details for specific key types should follow [libp2p/peerid specs](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#key-types), unless stated otherwise.
lidel marked this conversation as resolved.

- Servers may ignore this field if they do not require signature verification.
Member

@masih is cid.contact doing verification of this?

Member Author

cid.contact does not support any PUT over HTTP delegated routing.

- `Payload`: a string containing a serialized JSON object which conforms with the following schema:

```json
{
"Keys": ["cid1", "cid2"],
Member

@masih what is the max length of this list?

Member Author

My thinking is to derive the maximum length from whatever the server supports as its maximum request body size. A user should usually be able to find this out via an OPTIONS request?

If that makes sense I am happy to document this in the spec. Unless you think we need some explicit max length for that array?

"Timestamp": 0,
"AdvisoryTTL": 0,
"ID": "12D3K...",
"Addrs": ["/ip4/..."]
}
```

- `Keys` is a list of the CIDs being provided
- `Timestamp` is the current time
Member

What is the format? Unix epoch? What resolution (ms? ns?)

Would an ASCII string that follows the notation from [rfc3339] (1970-01-01T00:00:00.000000001Z) be less ambiguous?

Contributor

Agreed on following a standardized format that is easily human-parseable and unambiguous.

- `AdvisoryTTL` is the time by which the caller expects the server to keep the record available
Member

Is this a timestamp or a duration? If a duration, is it milliseconds?

Contributor

If a duration, maybe get that into the name.
Also embed the units in the parameter name so there is no ambiguity?

- If this value is unknown, the caller may use a value of 0
- `ID` is the peer ID that was used to sign the record
Member

💭 there are 3 peerid types, two of them are legacy (see the end of this spec).

I'd like to avoid legacy from /routing/v1 and use CIDv1 with libp2p-key codec everywhere. @masih any concerns around this?

Member Author

No concerns 👍

- `Addrs` is a list of string-encoded multiaddrs

A [400 Bad Request](https://httpwg.org/specs/rfc9110.html#status.400) response code should be returned if the signature check fails.
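The record construction described above can be sketched roughly as follows. This is an illustration only: `make_write_record` and the `sign` callable are hypothetical, the `m` multibase prefix (unpadded base64) is one possible encoding chosen for the example, and real signing must follow the libp2p key-type rules referenced above, some of which apply their own hashing.

```python
import base64
import hashlib
import json


def make_write_record(payload_obj, sign):
    """Build a Bitswap write provider record from a payload dict.

    `sign` is a hypothetical callable (bytes -> bytes) standing in for
    signing with the private key of the peer named in the payload.
    """
    # Serialize exactly once; these bytes are what gets hashed and sent.
    payload = json.dumps(payload_obj, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    signature = sign(digest)
    # Multibase prefix "m" denotes base64 without padding.
    encoded = "m" + base64.b64encode(signature).decode("ascii").rstrip("=")
    return {
        "Protocol": "transport-bitswap",
        "Schema": "bitswap",
        "Signature": encoded,
        "Payload": payload,  # kept as a string, not a nested object
    }
```

A server that verifies signatures would hash the received `Payload` string as-is and answer `400` when the check fails.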

Note that this only supports Peer IDs expressed as identity multihashes. Peer IDs with older key types that exceed 42 bytes are not verifiable since they only contain a hash of the key, not the key itself.
Normally, if the Peer ID contains only a hash of the key, then the key is obtained out-of-band (e.g. by fetching the block via IPFS).
If support for these Peer IDs is needed in the future, this spec can be updated to allow the client to provide the key and key type out-of-band by adding optional `PublicKey` and `PublicKeyType` fields, and if the Peer ID is a CID, then the server can verify the public key's authenticity against the CID, and then proceed with the rest of the verification scheme.
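To make the distinction concrete, here is a small sketch of how a server might tell whether a Peer ID's multihash inlines the public key. It operates on raw multihash bytes (decoding the string form is out of scope), the function name `key_is_inlined` is hypothetical, and it assumes the length fits in a single varint byte, which holds for digests shorter than 128 bytes.

```python
IDENTITY_CODE = 0x00  # multihash function code for "identity"


def key_is_inlined(multihash: bytes) -> bool:
    """Return True if the multihash embeds the key bytes directly.

    Layout assumed: <code byte><length byte><digest>. With the identity
    code, the "digest" is the serialized public key itself, so the
    signature can be verified without fetching the key out-of-band.
    """
    if len(multihash) < 2:
        return False
    code, length = multihash[0], multihash[1]
    return code == IDENTITY_CODE and length == len(multihash) - 2
```

A Peer ID built from a sha2-256 hash of the key (code `0x12`) would return `False` here, which corresponds to the unverifiable case described above.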
Member

Any reason why we don't include a PublicKey field from the start?
If we make it an optional, opaque bytes field and say it should be deserialized as the protobuf from the libp2p peerid specs, we both solve the RSA problem and avoid maintaining a PublicKeyType registry.

Member

Alternatively, we could include PublicKey in the peer schema on out-of-band GET to /routing/v1/peers/{peerid} (IPIP-417).

Contributor

I'm okay with either of these paths.
@masih are you okay with this suggestion?

Member Author

Seems fine to me 👍


The `Payload` field is a string, not a proper JSON object, to prevent its contents from being accidentally parsed and re-encoded by intermediaries, which may change the order of JSON fields and thus cause the record to fail validation.
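The failure mode this guards against can be shown in a few lines. In the sketch below, an intermediary parses and re-encodes the payload with a different key order; the data is unchanged but the sha256 digest is not, so a signature over the original bytes would no longer verify. The sample payload and the `digest` helper are illustrative only.

```python
import hashlib
import json


def digest(text: str) -> str:
    """Hex sha256 of the exact payload string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Payload string as the client produced and signed it.
original = '{"Keys":["cid1","cid2"],"Timestamp":0,"AdvisoryTTL":0}'

# An intermediary that parses and re-serializes may reorder the keys.
reencoded = json.dumps(json.loads(original), sort_keys=True, separators=(",", ":"))

same_data = json.loads(original) == json.loads(reencoded)
same_bytes = digest(original) == digest(reencoded)
```

Here `same_data` is true while `same_bytes` is false, which is why the payload travels as an opaque string.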
Member

If we have to mangle Payload JSON anyway, what is the benefit of introducing it as a new struct?
It was probably discussed before elsewhere, but why can't we use exactly the same protobuf and signature DHT uses?

Reusing DHT record payload would save CPU time on IPFS nodes that publish to both DHT and IPNI (signature would be generated only once, not twice).

Member Author

@masih masih Feb 13, 2023

> what is the benefit of introducing it as a new struct?

I suspect the reason for this is better human readability? The beginning of this document (merged as part of IPIP-337) states "human-readable encodings of types are preferred".

> save CPU time on IPFS nodes

We can make the same argument about HTTP delegated routing itself. Unless there is evidence to suggest that we should be heavily optimising this, my vote would be to favour human readability.

@guseggert what are your thoughts?

Member Author

> Reusing DHT record payload

@lidel can I ask where I can find the DHT record payload specification? I think we can add this to the spec via a specific request Content-Type media type, similar to #379.

Member

@guillaumemichel this is related to #345 – can you point us to where the up-to-date protobufs related to CID announcements on the DHT live these days? Unsure how feasible it is to reuse the announcement's wire format here.

Could we write down and publish the basic wire format information, like the protobuf section about the IPNS record, so this IPIP can refer to the spec website, and not some Go code 🙏


@masih I think Content-Type reuse similar to what we did in #379 would help in some cases, but it is not a silver bullet here. There are far more CIDs than IPNS records, so some way of doing a bulk PUT is still desired, but it should not be tied to a single protocol.

When an IPFS node speaks more than bitswap (we already plan to do HTTP next to bitswap in many places), doing a PUT of the same CIDs once for every protocol duplicates the cost on the client.

I don't think there is anything special about Schema: bitswap-announce. Having a dedicated PUT for this one protocol feels odd; a PUT for HTTP will look exactly the same.

Maybe instead of having Schema: bitswap-announce we should have Schema: provider-announce with a Protocols list, which could then have both bitswap and HTTP?

Member Author

> having dedicated PUT for this one protocol feels odd, PUT for HTTP will look exactly the same

I agree 👍

Member

After IPIP-417 we are in a better place now.

I think we should avoid a protocol-specific schema for bitswap and use the peer schema from IPIP-417. It allows for passing optional protocol-specific metadata for things other than bitswap, allowing the client to do a single PUT with all the info.


#### Write Provider Records Response

```json
{
"AdvisoryTTL": 0
}
```

- `AdvisoryTTL` is the time at which the server expects itself to drop the record
- If less than the `AdvisoryTTL` in the request, then the client should re-issue the request by that point
- If greater than the `AdvisoryTTL` in the request, then the server expects the client to be responsible for the content for up to that amount of time (TODO: this is ambiguous)
- If 0, the server makes no claims about the lifetime of the record
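One way a client could act on these rules is sketched below. The function `interpret_advisory_ttl` and its return labels are hypothetical, both values are assumed to be in the same (currently unspecified) unit, and the "greater than" case inherits the ambiguity flagged in the TODO above.

```python
def interpret_advisory_ttl(requested: int, returned: int) -> str:
    """Classify the server's AdvisoryTTL relative to the one requested.

    Only compares the two numbers; it does not assume a particular
    unit or whether the values are durations or timestamps.
    """
    if returned == 0:
        # The server makes no claim about the record's lifetime.
        return "no-claim"
    if returned < requested:
        # The record may be dropped earlier than asked for, so the
        # client should re-issue the request by the returned point.
        return "reissue-by-server-ttl"
    if returned > requested:
        # The server expects the client to stay responsible for longer.
        return "server-expects-longer"
    return "as-requested"
```

Note that a request that used `0` for an unknown TTL lands in the `server-expects-longer` branch whenever the server returns a non-zero value; whether that is the intended reading is not settled by the text.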

### Filecoin Graphsync

Multicodec name: `transport-graphsync-filecoinv1`
@@ -173,3 +269,7 @@ Specification: [ipfs/go-graphsync/blob/main/docs/architecture.md](https://github
- `PieceCID`: the CID of the [piece](https://spec.filecoin.io/systems/filecoin_files/piece/#section-systems.filecoin_files.piece) within which the data is stored
- `VerifiedDeal`: whether the deal corresponding to the data is verified
- `FastRetrieval`: whether the provider claims there is an unsealed copy of the data available for fast retrieval

#### Filecoin Graphsync Write Provider Records

There is currently no specified schema.