Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: align aggregation namings from commp learnings #62

Merged
merged 9 commits into from
Jun 22, 2023

Conversation

vasco-santos
Copy link
Contributor

@vasco-santos vasco-santos commented Jun 7, 2023

w3-aggregation.md Show resolved Hide resolved
@@ -127,20 +127,23 @@ type AggregateOffer struct {

type AggregateOfferDetail struct {
offer OfferCBOR
commitmentProof Proof
segment Proof
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference betrween segment and piece?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

segment is commD (aggregate), in other words it is computed by a tree of commP (pieces)

Copy link
Collaborator

@Gozala Gozala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provided bunch of feedback. I do however feel a gap in my understanding of frc-0058, which in turn might be leading to invalid suggestions here.

That said I think it is worth doing these changes now and iterating again once we gain better understanding.

@@ -127,20 +127,23 @@ type AggregateOffer struct {

type AggregateOfferDetail struct {
offer OfferCBOR
commitmentProof Proof
segment Link
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm misunderstanding something, we also need a size, which can be derived from the pieces in the offer but seems like it would be a good idea to have it captured in the separate field.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a good idea to define a type alias for the Segment or SegmentLink with some comments on what is it a link for etc... I'm thinking something along the lines of

# Segment a set of Piece CIDs
# @see https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0058.md
type Segment [&Piece]

# ....
type SegmentLink = &Segment

@@ -127,20 +127,23 @@ type AggregateOffer struct {

type AggregateOfferDetail struct {
offer OfferCBOR
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
offer OfferCBOR
offer &Offer

I think it's actually a link to a Offer block isn't it ?

link Link
src [URL]
commitmentProof Proof
type Piece {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should model this after PieceInfo where size is in number of leafs as opposed to bytes. It would be more aligned and use less space

We derive byte size from leaf size anyway https://github.com/web3-storage/data-segment/blob/509bb3d82aaf31ddc31a03acc6305f7c42310c47/src/commp.js#L74-L76

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good yes! Good suggestion


type struct ContentPiece {
piece Piece
link Link
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying we should remove this, but I'm gettin an impression that Spade does not care about this. In a sense piece.link is the a link to the content archive but in another (CAR derived) format.

I think it would be good to add a code comment here clarifying what is this field used for, because I no longer can recall that detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the CAR CID. With pieceCid we do NOT need it, nor spade as far as I can tell. However, being able to have CAR cid available in the invocation can likely help us rather than needing to query every single pieceCid to get which CAR it comes from. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the same make sense for the commD part?

type struct ContentPiece {
piece Piece
link Link
src? [URL]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean optional field, if so below is how you'd spell it in IPLD schema syntax

Suggested change
src? [URL]
src optional [URL]

If you did intend to make it optional, I would highly encourage capturing rational in the code comment, specifically outlining what does omitting it entails.

I personally dislike double optionality as it creates opportunity for divergence. That is to say empty array/list already implies optionality as it can be empty.

🤔 I have also been wondering if we putting a URL to the content is a good idea at all. From my recollection, of the conversation we had at IPFS Thing, we may want to retain flexibility to generate short lived URLs when providers are trying to get this content and if so it may be better to leave it out.

I have also had been wondering if mapping between commP <-> carCID should be more of content claim, not to mention that we are already gearing up to map carCID <-> URL[] as content location claim

To be more concrete, I think it would really help to write down purpose of each field and how it's intended use that will help us evaluate it against content claims etc...

Copy link
Contributor Author

@vasco-santos vasco-santos Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have it described in the text for that invocation, more specifically

 Note that src field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.

we decided to start by going this way to avoid the need to a new interaction for MVP. Also, considering we will have a redirect service for signed URLs, that will be usable here.

I think keeping the flexibility here of providing short lived URLs rather than relying on content location claims makes more sense. Otherwise, we will need to get SPs hooked up with our content location claims, which will take quite a while...

@@ -179,7 +184,7 @@ Invoking `aggregate/offer` capability submits an aggregate to a broker service f
]
```

Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `commitmentProof` field has the required `proof` bytes by Storage Providers (for example, `commP`). The `size` field MUST be set to the byte size of the CAR file. The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.
Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `piece` field has the `proof` required by Storage Providers (for example, `commP`). The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer accurate as proof field is gone. I think whole thing should be reframed as mapping between content source (in CAR format) and a corresponding filecoin piece info.

I am less sure about source location (src) at this point, and it starts to feel like a bad idea. Seems to me that putting a routing/redirecting endpoint would buy storefront a lot more flexibility yet not cost too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am less sure about source location (src) at this point, and it starts to feel like a bad idea. Seems to me that putting a routing/redirecting endpoint would buy storefront a lot more flexibility yet not cost too much.

The main motivation to have it there is still the case by what we decided with Mikeal. To decrease need to do any IO operations other than reading the CAR. Anyway, we can discuss that maybe in a separate issue outside of the commp/commd context to avoid noise here?

@@ -353,25 +358,27 @@ type AggregateOffer struct {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be a good idea to reference types from the prior IPLD schema instead of trying to keeping two in sync

Copy link
Contributor Author

@vasco-santos vasco-santos Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the full schema below and reference on the upper part? of the other way around?

I think it makes more sense to me to have the full schema in the end, but I added its view above based on previous reviews for initial PR

@vasco-santos vasco-santos changed the title fix: align aggregation namings from commp and commd fix: align aggregation namings from commp learnings Jun 14, 2023
@vasco-santos vasco-santos force-pushed the fix/align-aggregation-namings-from-commp-and-commd branch 2 times, most recently from 9b4657f to 8072c05 Compare June 14, 2023 14:25
@vasco-santos
Copy link
Contributor Author

@Gozala did a new iteration here based on our sync yesterday. More specifically, moved away from segment name as it does not map with spade schema (and also by the call felt like segment naming should be avoided). Spade ends up calling piece to a CAR file in an aggregate, but also to the aggregate as a whole, which makes sense given by the end both are a bag of bytes (even though I would still prefer 2 names to be more clear on what is what).

Ended up with AggregateOfferDetail with offer (link to inline block with CARs that compose the aggregate) and a piece which has the whole aggregate CID (commP of commPs) and its size , or as a schema:

type Offer [ContentPiece]

type struct ContentPiece {
  piece PieceInfo
  # CAR Cid for convenience usage to get CAR details as needed (e.g. source URL)
  link Link
  src optional [URL]
}

# https://github.com/filecoin-project/go-state-types/blob/1e6cf0d47cdda75383ef036fc2725d1cf51dbde8/abi/piece.go#L47-L50
type PieceInfo {
  # Size in nodes. For BLS12-381 (capacity 254 bits), must be >= 16. (16 * 8 = 128)
  size Int
  link Link
}

type AggregateOffer struct {
  with StorefrontDID
  nb AggregateOfferDetail
}

type AggregateOfferDetail struct {
  # Contains each individual piece within Aggregate piece
  offer &Offer
  # Piece as Aggregate of CARs with padding
  piece PieceInfo
}

@Gozala would love your opinion on just using

Based on our discussion + review, looks like we:

  1. should consider also dropping src. Given timelines that we talked yesterday, I agree that we can potentially drop it.
  2. should we consider Offer CBOR block schema to just be an array of PieceInfo? By dropping src we currently only have CAR CID. I still see value on it for interactions with other services (like get presigned URL to read CAR), so that we do not need in-between map index all the time (including for debugging). So, in short my opinion would be to keep as in current PR, but happy to reconsider.
  3. what should we do with schema? just have it complete in the end?

@vasco-santos vasco-santos force-pushed the fix/align-aggregation-namings-from-commp-and-commd branch from 8072c05 to 060181a Compare June 14, 2023 14:33
@vasco-santos vasco-santos requested review from Gozala and alanshaw June 14, 2023 14:35
@vasco-santos vasco-santos force-pushed the fix/align-aggregation-namings-from-commp-and-commd branch from a89c3d6 to 903894e Compare June 15, 2023 13:30
@vasco-santos
Copy link
Contributor Author

@Gozala as we talked yesterday changed:

  • drop src
  • put schema in the end with references from sections accordingly

Now we just need to flush out the Car CID. Either w3s has responsability to expose an API that by individual commP provides URL (and we accept the extra layer of indirection for basically reporting failures of commPs by SPs, debugging, etc).

@vasco-santos
Copy link
Contributor Author

As talked with @Gozala out of band, we removed car CID from invocation

@vasco-santos vasco-santos self-assigned this Jun 21, 2023
@vasco-santos vasco-santos added the P0 Critical: Tackled by core team ASAP label Jun 21, 2023
Copy link
Collaborator

@Gozala Gozala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should land this (as it better reflects our current thinking) and iterate more in followups.


type Offer [OfferDetails]
```
A Storefront principal can invoke a capabilty to offer an aggregate that is ready to be included in Filecoin deal(s). See [schema](#aggregateoffer-schema).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give that filecoin spec refers to these actors as Aggregators it may be better idea to use the same term as them as opposed to Storefront, which I suggested before I was familiar with the preexisting terminology.

Comment on lines 131 to 137
"nb": {
"offer": { "/": "bafy...many-cars" }, /* dag-cbor CID */
"commitmentProof": { "/": "commitment...cars-proof" } /* commitment proof */
"offer": { "/": "bafy...many-cars" }, /* dag-cbor CID with offer content */
"piece": {
"link": { "/": "commitment...aggregate-proof" },
"size": 10102020203013342343
} /* commitment proof for aggregate */
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given all the fields that we've dropped I wonder if we should just flatten things up and have this more like

Suggested change
"nb": {
"offer": { "/": "bafy...many-cars" }, /* dag-cbor CID */
"commitmentProof": { "/": "commitment...cars-proof" } /* commitment proof */
"offer": { "/": "bafy...many-cars" }, /* dag-cbor CID with offer content */
"piece": {
"link": { "/": "commitment...aggregate-proof" },
"size": 10102020203013342343
} /* commitment proof for aggregate */
}
"nb": {
"offer": { "/": "bafy...offer" }, /* embedded with invocation */
}

Where bafy...offer is CBOR of structure like:

{
   "link": { "/": "offer...commp" }
   "size": 1010101
   "pieces": [
       {
           "link": { "/": "car...commp" },
           "size": 1010
       },
       // ....
   ]
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to voice this opinion, yet I wonder if we should omit the size which I believe could be determinstically derived from the pieces. Unless I'm mistaken existing libraries require to specify size (on aggregates specifically) because the remaining space could be filled with 0s. However it creates space for an error where you may have two different aggregates for exact same pieces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that this is not directly compatible with UCAN LOG. When we want to create metrics or to run consumers based on invocations/receipts of these invocations we would need to do extra ops to read block from CAR file. We could make the block as you suggest, which would probably be the exact content we send to Spade in the end, but I would still keep link and size in the nb field so that we can easily track things from the log without a layer of indirection

},
{
/* ... */
}
]
```

Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `commitmentProof` field has the required `proof` bytes by Storage Providers (for example, `commP`). The `size` field MUST be set to the byte size of the CAR file. The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.
Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. It includes an array of Filecoin `piece` info required by Storage Providers. Out of band, Storefront will provide to Storage Providers a `src` HTTP URL to each CAR file in the offer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think "Out of band, ...." is somewhat misleading here. Perhaps document should say HTTP URLs will be issued during deal signing.

Aside: I have also been thinking that we may want to delegate UCAN that can be used for fetching aggregate pieces to the actor we sign with a deal with. That would prevent other actors from using that URL without sharing a private key or a UCAN delegation. It could also allow us to defer the pre-signed URL creation up until actor decides to fetch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering if aggreate/offer should be delegating capability to a "broker" to issue signed URLs for the aggregate pieces. That would allow broker to do it as needed and perhaps avoid back & forth. I realize we would still need to sign the deal with our key, but if delegated signatures were an option we could probably even get away from that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have started a thread on this line of thought here https://filecoinproject.slack.com/archives/C05C7CUPEKX/p1687366781115129

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the discussion in the linked thread. I think we should reframe "spade-proxy" as an "agency" to which "aggregators" can submit (offer) aggregates along with all the UCAN delegations that would allow it to:

  1. Create HTTP read URLs for the aggregate pieces.
  2. Sign deals with "storage providers" on behalf of the aggregator.

That would I think simplify things for both "aggregator" and "agency" (spade-proxy) reducing coordination between them as "agency" would be able to take the aggregate and either get it into a filecoin.

@vasco-santos I think this also could remove for mapping CAR CID <-> Piece CID, instead pieces could come with associated UCANs, allow agency to create HTTP URLs for them. Aggregator would still need to map CARs to pieces to surface corresponding filecoin state, which I think makes sense, it's like a reference to filecoin state.

w3-aggregation.md Outdated Show resolved Hide resolved
w3-aggregation.md Outdated Show resolved Hide resolved
w3-aggregation.md Outdated Show resolved Hide resolved
@Gozala
Copy link
Collaborator

Gozala commented Jun 21, 2023

I think maybe we should embrace new UCAN representation and also embrace upcoming invocation spec. With the above here is my attempt at the schema with all the things I've mentioned

# Agency namespaces aggregate APIs by DID of the aggregator
type AgencyAPI {[AggregatorDID] AggregateAPI }

type AggregateAPI union {
  AggregateOffer      "aggregate/offer"
  AggregateGet         "aggregate/get"
  DealArrange            "deal/arrange" 
} representation representation inline {
  discriminantKey "op"
}

type AggregateOffer struct {
   # in
   rsc        AggregatorDID
   input     Offer
   # out
   out        OfferState
   # kicks off "deal/arrange" and makes aggregate state "queued"
   join        &DealArrange
}

type OfferState union {
  Unit "ok"
  Any  "error"
} type keyed

type Offer struct {
   offer &AggregateInfo
   # delegation allowing to `publish/piece` contained pieces and `deal/sign` offered aggregate
   ucan &UCAN
}

type AggregateInfo struct {
   link        &CommP
   size        Int
   pieces   [PieceInfo]
}

type AggregateGet struct {
    # in
   rsc        AggregatorDID
   input     AggregateRef
   # out
   out        AggregateGetResult
}

type DealArrange struct {
   # in
   rsc        AgencyDID
   input     DealInfo
   # out
   out         DealResult
}

type DealInfo struct {
  aggregate   &CommP
}

type DealResult union {
   | Unit   "ok"
   | Any    "error"
} representation keyed

type AggregateGetResult union {
  AggregateState  "ok"
} keyed

type AggregateState union {
  | &QueuedAggregate     "queued"
  | &AcceptedAggregate   "accepted"
  | &RejectedAggregate    "rejected"
} representation keyed


type AggregatorAPI union {
  | PublishPiece            "piece/publish"
  | DealSign                   "deal/offer"
}

type PiecePublish {
   # in
   rsc      AggregatorDID
   input   ContentPieceInfo      
   # out
   out      ContentLocation
}

type ContentPieceInfo {
  piece         &CommP
  content     &CAR 
}


type ContentLocation {
  url URL
}

type DealOffer {
   rsc      AggregatorDID
   input   DealInfo
   # out
   out      DealResult
}

type Deal {
   aggregate      &CommP
   // .... not sure what else goes in here
}

type DealResult union {
  | DealSignature   "ok"
  | Any                     "error"
} representation keyed

type DealSignature {
   iss         AggregatorDID
   sig         bytes
}

type URL string
type CAR bytes
type CommP bytes


type # https://github.com/filecoin-project/go-state-types/blob/1e6cf0d47cdda75383ef036fc2725d1cf51dbde8/abi/piece.go#L47-L50
type PieceInfo {
  # Size in nodes. For BLS12-381 (capacity 254 bits), must be >= 16. (16 * 8 = 128)
  size Int
  link Link
}

@vasco-santos
Copy link
Contributor Author

vasco-santos commented Jun 22, 2023

@Gozala adopted the easies changes here and going to merge and iterate on the remaining discussions in issues / new PR

@vasco-santos vasco-santos merged commit 63846e5 into main Jun 22, 2023
@vasco-santos vasco-santos deleted the fix/align-aggregation-namings-from-commp-and-commd branch June 22, 2023 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 Critical: Tackled by core team ASAP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants