-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: align aggregation namings from commp learnings #62
fix: align aggregation namings from commp learnings #62
Conversation
w3-aggregation.md
Outdated
@@ -127,20 +127,23 @@ type AggregateOffer struct { | |||
|
|||
type AggregateOfferDetail struct { | |||
offer OfferCBOR | |||
commitmentProof Proof | |||
segment Proof |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference betrween segment and piece?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
segment is commD (aggregate), in other words it is computed by a tree of commP (pieces)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided bunch of feedback. I do however feel a gap in my understanding of frc-0058, which in turn might be leading to invalid suggestions here.
That said I think it is worth doing these changes now and iterating again once we gain better understanding.
w3-aggregation.md
Outdated
@@ -127,20 +127,23 @@ type AggregateOffer struct { | |||
|
|||
type AggregateOfferDetail struct { | |||
offer OfferCBOR | |||
commitmentProof Proof | |||
segment Link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm misunderstanding something, we also need a size, which can be derived from the pieces in the offer but seems like it would be a good idea to have it captured in the separate field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea to define a type alias for the Segment or SegmentLink
with some comments on what is it a link for etc... I'm thinking something along the lines of
# Segment a set of Piece CIDs
# @see https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0058.md
type Segment [&Piece]
# ....
type SegmentLink = &Segment
w3-aggregation.md
Outdated
@@ -127,20 +127,23 @@ type AggregateOffer struct { | |||
|
|||
type AggregateOfferDetail struct { | |||
offer OfferCBOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offer OfferCBOR | |
offer &Offer |
I think it's actually a link to a Offer block isn't it ?
w3-aggregation.md
Outdated
link Link | ||
src [URL] | ||
commitmentProof Proof | ||
type Piece { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should model this after PieceInfo where size
is in number of leafs as opposed to bytes. It would be more aligned and use less space
We derive byte size from leaf size anyway https://github.com/web3-storage/data-segment/blob/509bb3d82aaf31ddc31a03acc6305f7c42310c47/src/commp.js#L74-L76
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good yes! Good suggestion
w3-aggregation.md
Outdated
|
||
type struct ContentPiece { | ||
piece Piece | ||
link Link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not saying we should remove this, but I'm gettin an impression that Spade does not care about this. In a sense piece.link
is the a link to the content archive but in another (CAR derived) format.
I think it would be good to add a code comment here clarifying what is this field used for, because I no longer can recall that detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the CAR CID. With pieceCid
we do NOT need it, nor spade as far as I can tell. However, being able to have CAR cid available in the invocation can likely help us rather than needing to query every single pieceCid to get which CAR it comes from. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the same make sense for the commD
part?
w3-aggregation.md
Outdated
type struct ContentPiece { | ||
piece Piece | ||
link Link | ||
src? [URL] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean optional field, if so below is how you'd spell it in IPLD schema syntax
src? [URL] | |
src optional [URL] |
If you did intend to make it optional, I would highly encourage capturing rational in the code comment, specifically outlining what does omitting it entails.
I personally dislike double optionality as it creates opportunity for divergence. That is to say empty array/list already implies optionality as it can be empty.
🤔 I have also been wondering if we putting a URL to the content is a good idea at all. From my recollection, of the conversation we had at IPFS Thing, we may want to retain flexibility to generate short lived URLs when providers are trying to get this content and if so it may be better to leave it out.
I have also had been wondering if mapping between commP <-> carCID
should be more of content claim, not to mention that we are already gearing up to map carCID <-> URL[]
as content location claim
To be more concrete, I think it would really help to write down purpose of each field and how it's intended use that will help us evaluate it against content claims etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have it described in the text for that invocation, more specifically
Note that src field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.
we decided to start by going this way to avoid the need to a new interaction for MVP. Also, considering we will have a redirect service for signed URLs, that will be usable here.
I think keeping the flexibility here of providing short lived URLs rather than relying on content location claims makes more sense. Otherwise, we will need to get SPs hooked up with our content location claims, which will take quite a while...
w3-aggregation.md
Outdated
@@ -179,7 +184,7 @@ Invoking `aggregate/offer` capability submits an aggregate to a broker service f | |||
] | |||
``` | |||
|
|||
Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `commitmentProof` field has the required `proof` bytes by Storage Providers (for example, `commP`). The `size` field MUST be set to the byte size of the CAR file. The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API. | |||
Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `piece` field has the `proof` required by Storage Providers (for example, `commP`). The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer accurate as proof
field is gone. I think whole thing should be reframed as mapping between content source
(in CAR format) and a corresponding filecoin piece
info.
I am less sure about source location (src) at this point, and it starts to feel like a bad idea. Seems to me that putting a routing/redirecting endpoint would buy storefront a lot more flexibility yet not cost too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am less sure about source location (src) at this point, and it starts to feel like a bad idea. Seems to me that putting a routing/redirecting endpoint would buy storefront a lot more flexibility yet not cost too much.
The main motivation to have it there is still the case by what we decided with Mikeal. To decrease need to do any IO operations other than reading the CAR. Anyway, we can discuss that maybe in a separate issue outside of the commp/commd context to avoid noise here?
w3-aggregation.md
Outdated
@@ -353,25 +358,27 @@ type AggregateOffer struct { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be a good idea to reference types from the prior IPLD schema instead of trying to keeping two in sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep the full schema below and reference on the upper part? of the other way around?
I think it makes more sense to me to have the full schema in the end, but I added its view above based on previous reviews for initial PR
9b4657f
to
8072c05
Compare
@Gozala did a new iteration here based on our sync yesterday. More specifically, moved away from segment name as it does not map with spade schema (and also by the call felt like segment naming should be avoided). Spade ends up calling Ended up with
@Gozala would love your opinion on just using Based on our discussion + review, looks like we:
|
8072c05
to
060181a
Compare
a89c3d6
to
903894e
Compare
@Gozala as we talked yesterday changed:
Now we just need to flush out the Car CID. Either w3s has responsability to expose an API that by individual |
As talked with @Gozala out of band, we removed car CID from invocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should land this (as it better reflects our current thinking) and iterate more in followups.
|
||
type Offer [OfferDetails] | ||
``` | ||
A Storefront principal can invoke a capabilty to offer an aggregate that is ready to be included in Filecoin deal(s). See [schema](#aggregateoffer-schema). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give that filecoin spec refers to these actors as Aggregators
it may be better idea to use the same term as them as opposed to Storefront
, which I suggested before I was familiar with the preexisting terminology.
"nb": { | ||
"offer": { "/": "bafy...many-cars" }, /* dag-cbor CID */ | ||
"commitmentProof": { "/": "commitment...cars-proof" } /* commitment proof */ | ||
"offer": { "/": "bafy...many-cars" }, /* dag-cbor CID with offer content */ | ||
"piece": { | ||
"link": { "/": "commitment...aggregate-proof" }, | ||
"size": 10102020203013342343 | ||
} /* commitment proof for aggregate */ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given all the fields that we've dropped I wonder if we should just flatten things up and have this more like
"nb": { | |
"offer": { "/": "bafy...many-cars" }, /* dag-cbor CID */ | |
"commitmentProof": { "/": "commitment...cars-proof" } /* commitment proof */ | |
"offer": { "/": "bafy...many-cars" }, /* dag-cbor CID with offer content */ | |
"piece": { | |
"link": { "/": "commitment...aggregate-proof" }, | |
"size": 10102020203013342343 | |
} /* commitment proof for aggregate */ | |
} | |
"nb": { | |
"offer": { "/": "bafy...offer" }, /* embedded with invocation */ | |
} |
Where bafy...offer
is CBOR of structure like:
{
"link": { "/": "offer...commp" }
"size": 1010101
"pieces": [
{
"link": { "/": "car...commp" },
"size": 1010
},
// ....
]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant to voice this opinion, yet I wonder if we should omit the size
which I believe could be determinstically derived from the pieces
. Unless I'm mistaken existing libraries require to specify size
(on aggregates specifically) because the remaining space could be filled with 0
s. However it creates space for an error where you may have two different aggregates for exact same pieces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that this is not directly compatible with UCAN LOG. When we want to create metrics or to run consumers based on invocations/receipts of these invocations we would need to do extra ops to read block from CAR file. We could make the block as you suggest, which would probably be the exact content we send to Spade in the end, but I would still keep link
and size
in the nb
field so that we can easily track things from the log without a layer of indirection
}, | ||
{ | ||
/* ... */ | ||
} | ||
] | ||
``` | ||
|
||
Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `commitmentProof` field has the required `proof` bytes by Storage Providers (for example, `commP`). The `size` field MUST be set to the byte size of the CAR file. The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API. | ||
Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. It includes an array of Filecoin `piece` info required by Storage Providers. Out of band, Storefront will provide to Storage Providers a `src` HTTP URL to each CAR file in the offer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think "Out of band, ...." is somewhat misleading here. Perhaps document should say HTTP URLs will be issued during deal signing.
Aside: I have also been thinking that we may want to delegate UCAN that can be used for fetching aggregate pieces to the actor we sign with a deal with. That would prevent other actors from using that URL without sharing a private key or a UCAN delegation. It could also allow us to defer the pre-signed URL creation up until actor decides to fetch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering if aggreate/offer
should be delegating capability to a "broker" to issue signed URLs for the aggregate pieces. That would allow broker to do it as needed and perhaps avoid back & forth. I realize we would still need to sign the deal with our key, but if delegated signatures were an option we could probably even get away from that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have started a thread on this line of thought here https://filecoinproject.slack.com/archives/C05C7CUPEKX/p1687366781115129
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the discussion in the linked thread. I think we should reframe "spade-proxy" as an "agency" to which "aggregators" can submit (offer) aggregates along with all the UCAN delegations that would allow it to:
- Create HTTP read URLs for the aggregate pieces.
- Sign deals with "storage providers" on behalf of the aggregator.
That would I think simplify things for both "aggregator" and "agency" (spade-proxy) reducing coordination between them as "agency" would be able to take the aggregate and either get it into a filecoin.
@vasco-santos I think this also could remove for mapping CAR CID <-> Piece CID, instead pieces could come with associated UCANs, allow agency to create HTTP URLs for them. Aggregator would still need to map CARs to pieces to surface corresponding filecoin state, which I think makes sense, it's like a reference to filecoin state.
I think maybe we should embrace new UCAN representation and also embrace upcoming invocation spec. With the above here is my attempt at the schema with all the things I've mentioned
|
Co-authored-by: Irakli Gozalishvili <[email protected]>
@Gozala adopted the easies changes here and going to merge and iterate on the remaining discussions in issues / new PR |
Context storacha/w3up#787 (comment)
cc @ribasushi