fix: align aggregation namings from commp learnings #62

vasco-santos · 2023-06-07T14:10:31Z

w3-aggregation.md

alanshaw · 2023-06-07T14:33:12Z

w3-aggregation.md

@@ -127,20 +127,23 @@ type AggregateOffer struct {

 type AggregateOfferDetail struct {
  offer OfferCBOR
-  commitmentProof Proof
+  segment Proof


What is the difference between segment and piece?

segment is commD (aggregate); in other words, it is computed from a tree of commP (pieces)

Gozala

Provided a bunch of feedback. I do, however, feel a gap in my understanding of frc-0058, which in turn might be leading to invalid suggestions here.

That said, I think it is worth making these changes now and iterating again once we gain a better understanding.

Gozala · 2023-06-07T17:25:46Z

w3-aggregation.md

@@ -127,20 +127,23 @@ type AggregateOffer struct {

 type AggregateOfferDetail struct {
  offer OfferCBOR
-  commitmentProof Proof
+  segment Link


Unless I'm misunderstanding something, we also need a size, which can be derived from the pieces in the offer, but it seems like a good idea to capture it in a separate field.

I think it is a good idea to define a type alias for the Segment or SegmentLink, with some comments on what it is a link for, etc. I'm thinking something along the lines of:

# Segment a set of Piece CIDs
# @see https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0058.md
type Segment [&Piece]
# ....
type SegmentLink = &Segment

Gozala · 2023-06-07T17:28:08Z

w3-aggregation.md

@@ -127,20 +127,23 @@ type AggregateOffer struct {

 type AggregateOfferDetail struct {
  offer OfferCBOR


Suggested change

offer OfferCBOR

offer &Offer

I think it's actually a link to an Offer block, isn't it?

Gozala · 2023-06-07T17:46:46Z

w3-aggregation.md

-    link Link
-    src [URL]
-    commitmentProof Proof
+type Piece {


I wonder if we should model this after PieceInfo, where size is in number of leaves as opposed to bytes. It would be more aligned and use less space.

We derive byte size from leaf size anyway: https://github.com/web3-storage/data-segment/blob/509bb3d82aaf31ddc31a03acc6305f7c42310c47/src/commp.js#L74-L76

That sounds good yes! Good suggestion
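
A minimal sketch of the leaf-to-byte conversion discussed in this thread, assuming 32-byte tree nodes. `NODE_SIZE`, `leavesToBytes` and `bytesToLeaves` are illustrative names, not the API of the data-segment library linked above.

```typescript
// Assumption: each leaf (node) of the piece tree is 32 bytes wide.
const NODE_SIZE = 32;

// Helper that avoids depending on ES2015 Number.isInteger.
function isWholeNumber(n: number): boolean {
  return isFinite(n) && Math.floor(n) === n;
}

// Convert a size expressed in leaves into a size expressed in bytes.
function leavesToBytes(leaves: number): number {
  if (!isWholeNumber(leaves) || leaves < 0) {
    throw new RangeError("leaf count must be a non-negative integer");
  }
  return leaves * NODE_SIZE;
}

// Convert a byte size back into leaves; only exact multiples are accepted.
function bytesToLeaves(bytes: number): number {
  if (!isWholeNumber(bytes) || bytes < 0 || bytes % NODE_SIZE !== 0) {
    throw new RangeError("byte size must be a non-negative multiple of the node size");
  }
  return bytes / NODE_SIZE;
}
```

Storing the leaf count keeps the stored number 32 times smaller than the byte size, and the byte size stays recoverable with a single multiplication.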

Gozala · 2023-06-07T17:55:24Z

w3-aggregation.md

+
+type struct ContentPiece {
+  piece Piece
+  link Link


Not saying we should remove this, but I'm getting the impression that Spade does not care about this. In a sense, piece.link is a link to the content archive, but in another (CAR-derived) format.

I think it would be good to add a code comment here clarifying what this field is used for, because I can no longer recall that detail.

This is the CAR CID. With pieceCid we do NOT need it, nor does Spade as far as I can tell. However, having the CAR CID available in the invocation can likely help us, rather than needing to query every single pieceCid to find which CAR it comes from. Thoughts?

Would the same make sense for the commD part?

Gozala · 2023-06-07T18:11:17Z

w3-aggregation.md

+type struct ContentPiece {
+  piece Piece
+  link Link
+  src? [URL]


Did you mean an optional field? If so, below is how you'd spell it in IPLD schema syntax:

Suggested change

src? [URL]

src optional [URL]

If you did intend to make it optional, I would highly encourage capturing the rationale in a code comment, specifically outlining what omitting it entails.

I personally dislike double optionality, as it creates an opportunity for divergence. That is to say, an array/list already implies optionality, as it can be empty.
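
To illustrate the double-optionality concern, here is a small sketch in which a missing `src` and an empty `src` are collapsed into one case at the edge. The `ContentPiece` interface and the `sources` / `hasSources` helpers are hypothetical and only mirror the field names from the diff above.

```typescript
// Hypothetical shape: CIDs and URLs are plain strings for illustration.
interface ContentPiece {
  link: string;
  src?: string[]; // optional AND possibly empty: two ways to say "no sources"
}

// Normalize both "absent" and "empty" into an empty list so callers
// only ever deal with one representation.
function sources(piece: ContentPiece): string[] {
  return piece.src ?? [];
}

function hasSources(piece: ContentPiece): boolean {
  return sources(piece).length > 0;
}
```

Without such a normalization step, two consumers could treat `{ link }` and `{ link, src: [] }` differently, which is the divergence mentioned above.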

🤔 I have also been wondering if putting a URL to the content is a good idea at all. From my recollection of the conversation we had at IPFS Thing, we may want to retain the flexibility to generate short-lived URLs when providers are trying to get this content, and if so it may be better to leave it out.

I have also been wondering if the mapping between commP <-> carCID should be more of a content claim, not to mention that we are already gearing up to map carCID <-> URL[] as a content location claim.

To be more concrete, I think it would really help to write down the purpose of each field and its intended use; that will help us evaluate it against content claims, etc.

We have it described in the text for that invocation, more specifically

Note that src field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.

We decided to start by going this way to avoid the need for a new interaction for the MVP. Also, considering we will have a redirect service for signed URLs, that will be usable here.

I think keeping the flexibility here of providing short lived URLs rather than relying on content location claims makes more sense. Otherwise, we will need to get SPs hooked up with our content location claims, which will take quite a while...

Gozala · 2023-06-07T18:18:53Z

w3-aggregation.md

@@ -179,7 +184,7 @@ Invoking `aggregate/offer` capability submits an aggregate to a broker service f
 ]
 ```

-Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `commitmentProof` field has the required `proof` bytes by Storage Providers (for example, `commP`). The `size` field MUST be set to the byte size of the CAR file. The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.
+Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `piece` field has the `proof` required by Storage Providers (for example, `commP`). The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.


This is no longer accurate, as the proof field is gone. I think the whole thing should be reframed as a mapping between a content source (in CAR format) and the corresponding Filecoin piece info.

I am less sure about source location (src) at this point, and it starts to feel like a bad idea. Seems to me that putting a routing/redirecting endpoint would buy storefront a lot more flexibility yet not cost too much.

The main motivation to have it there is still what we decided with Mikeal: to decrease the need to do any IO operations other than reading the CAR. Anyway, maybe we can discuss that in a separate issue outside of the commp/commd context to avoid noise here?

Gozala · 2023-06-07T18:21:27Z

w3-aggregation.md

@@ -353,25 +358,27 @@ type AggregateOffer struct {



I think it might be a good idea to reference types from the prior IPLD schema instead of trying to keep the two in sync.

Should we keep the full schema below and reference it in the upper part, or the other way around?

I think it makes more sense to have the full schema at the end, but I added its view above based on previous reviews of the initial PR.

vasco-santos · 2023-06-14T14:32:51Z

@Gozala did a new iteration here based on our sync yesterday. More specifically, moved away from the segment name, as it does not map to the Spade schema (and on the call it also felt like segment naming should be avoided). Spade ends up using piece for a CAR file in an aggregate, but also for the aggregate as a whole, which makes sense given that in the end both are a bag of bytes (even though I would still prefer two names to be clearer on what is what).

Ended up with AggregateOfferDetail having an offer (link to an inline block with the CARs that compose the aggregate) and a piece, which has the whole aggregate CID (commP of commPs) and its size, or as a schema:

type Offer [ContentPiece]

type ContentPiece struct {
  piece PieceInfo
  # CAR Cid for convenience usage to get CAR details as needed (e.g. source URL)
  link Link
  src optional [URL]
}

# https://github.com/filecoin-project/go-state-types/blob/1e6cf0d47cdda75383ef036fc2725d1cf51dbde8/abi/piece.go#L47-L50
type PieceInfo {
  # Size in nodes. For BLS12-381 (capacity 254 bits), must be >= 16. (16 * 8 = 128)
  size Int
  link Link
}

type AggregateOffer struct {
  with StorefrontDID
  nb AggregateOfferDetail
}

type AggregateOfferDetail struct {
  # Contains each individual piece within Aggregate piece
  offer &Offer
  # Piece as Aggregate of CARs with padding
  piece PieceInfo
}
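
As a rough illustration of how a consumer might check the shape described by this schema, here is a sketch that mirrors `PieceInfo` and `AggregateOfferDetail` as TypeScript types. Links are represented as plain strings, and the `isPieceInfo` / `isAggregateOfferDetail` guards are hypothetical helpers; the minimum of 16 is taken from the schema comment rather than from any library.

```typescript
// Illustrative mirror of the schema; links are plain strings here.
type Link = string;

interface PieceInfo {
  size: number; // size in nodes, per the schema comment
  link: Link;
}

interface AggregateOfferDetail {
  offer: Link; // link to the Offer block listing the individual pieces
  piece: PieceInfo; // the aggregate as a whole
}

// Minimum taken from the schema comment ("must be >= 16").
const MIN_PIECE_SIZE = 16;

function isPieceInfo(value: unknown): value is PieceInfo {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const size = record["size"];
  const link = record["link"];
  return (
    typeof link === "string" &&
    typeof size === "number" &&
    isFinite(size) &&
    Math.floor(size) === size &&
    size >= MIN_PIECE_SIZE
  );
}

function isAggregateOfferDetail(value: unknown): value is AggregateOfferDetail {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["offer"] === "string" && isPieceInfo(record["piece"]);
}
```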

@Gozala would love your opinion on just using

Based on our discussion + review, looks like we:

should also consider dropping src. Given the timelines we talked about yesterday, I agree that we can potentially drop it.
should we consider Offer CBOR block schema to just be an array of PieceInfo? By dropping src we currently only have CAR CID. I still see value on it for interactions with other services (like get presigned URL to read CAR), so that we do not need in-between map index all the time (including for debugging). So, in short my opinion would be to keep as in current PR, but happy to reconsider.
what should we do with schema? just have it complete in the end?

vasco-santos · 2023-06-15T13:34:38Z

@Gozala as we talked yesterday changed:

drop src
put schema in the end with references from sections accordingly

Now we just need to flesh out the CAR CID. Either w3s has the responsibility to expose an API that, given an individual commP, provides a URL (and we accept the extra layer of indirection for basically reporting failures of commPs by SPs, debugging, etc).

vasco-santos · 2023-06-20T16:04:25Z

As discussed with @Gozala out of band, we removed the CAR CID from the invocation.

Gozala

I think we should land this (as it better reflects our current thinking) and iterate more in followups.

Gozala · 2023-06-21T16:03:03Z

w3-aggregation.md

-
-type Offer [OfferDetails]
-```
+A Storefront principal can invoke a capabilty to offer an aggregate that is ready to be included in Filecoin deal(s). See [schema](#aggregateoffer-schema).


Given that the Filecoin spec refers to these actors as Aggregators, it may be a better idea to use the same term as opposed to Storefront, which I suggested before I was familiar with the preexisting terminology.

Gozala · 2023-06-21T16:09:54Z

w3-aggregation.md

    "nb": {
-      "offer": { "/": "bafy...many-cars" }, /* dag-cbor CID */
-      "commitmentProof": { "/": "commitment...cars-proof" } /* commitment proof */
+      "offer": { "/": "bafy...many-cars" }, /* dag-cbor CID with offer content */
+      "piece": {
+        "link": { "/": "commitment...aggregate-proof" },
+        "size": 10102020203013342343
+      } /* commitment proof for aggregate */
    }


Given all the fields that we've dropped I wonder if we should just flatten things up and have this more like

Suggested change

"nb": {
  "offer": { "/": "bafy...many-cars" }, /* dag-cbor CID */
  "commitmentProof": { "/": "commitment...cars-proof" } /* commitment proof */
  "offer": { "/": "bafy...many-cars" }, /* dag-cbor CID with offer content */
  "piece": {
    "link": { "/": "commitment...aggregate-proof" },
    "size": 10102020203013342343
  } /* commitment proof for aggregate */
}

"nb": {
  "offer": { "/": "bafy...offer" }, /* embedded with invocation */
}

Where bafy...offer is the CBOR of a structure like:

{
  "link": { "/": "offer...commp" },
  "size": 1010101,
  "pieces": [
    { "link": { "/": "car...commp" }, "size": 1010 },
    // ....
  ]
}

I'm hesitant to voice this opinion, yet I wonder if we should omit the size, which I believe could be deterministically derived from the pieces. Unless I'm mistaken, existing libraries require specifying size (on aggregates specifically) because the remaining space could be filled with 0s. However, it creates room for an error where you may have two different aggregates for the exact same pieces.

The problem here is that this is not directly compatible with UCAN LOG. When we want to create metrics or to run consumers based on invocations/receipts of these invocations we would need to do extra ops to read block from CAR file. We could make the block as you suggest, which would probably be the exact content we send to Spade in the end, but I would still keep link and size in the nb field so that we can easily track things from the log without a layer of indirection
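
The question of deriving the aggregate size from its pieces can be sketched as follows. This is only a lower bound under stated assumptions: every piece size is already a power of two, and the space that the FRC-0058 data segment index occupies is ignored, so a real aggregate may need to be larger. `nextPowerOfTwo` and `minAggregateSize` are illustrative names.

```typescript
// Smallest power of two that is >= n.
function nextPowerOfTwo(n: number): number {
  if (!isFinite(n) || Math.floor(n) !== n || n < 1) {
    throw new RangeError("expected a positive integer");
  }
  let power = 1;
  while (power < n) power *= 2;
  return power;
}

// Lower bound on the aggregate size: the smallest power of two that can
// hold the sum of all piece sizes. Assumption: the index is not counted.
function minAggregateSize(pieceSizes: number[]): number {
  if (pieceSizes.length === 0) {
    throw new RangeError("an aggregate needs at least one piece");
  }
  const total = pieceSizes.reduce((sum, size) => sum + size, 0);
  return nextPowerOfTwo(total);
}
```

Any larger power of two would also hold the same pieces with more zero padding, which is the ambiguity raised above: the same list of pieces can correspond to more than one aggregate unless the size is either fixed by convention or stated explicitly.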

Gozala · 2023-06-21T16:17:55Z

w3-aggregation.md

  },
  {
    /* ... */
  }
 ]
 ```

-Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. The `link` field has the CAR CID, while the `commitmentProof` field has the required `proof` bytes by Storage Providers (for example, `commP`). The `size` field MUST be set to the byte size of the CAR file. The `src` field of each piece MUST be set to a (alphabetically sorted) list of URLs from which it can be fetched. Note that `src` field is optional and can be provided in a different part of the flow such as when deal is signed or through a previously agreed API.
+Each entry of the decoded offers block, has all the necessary information for a Storage Provider to fetch and store a CAR file. It includes an array of Filecoin `piece` info required by Storage Providers. Out of band, Storefront will provide to Storage Providers a `src` HTTP URL to each CAR file in the offer.


Nit: I think "Out of band, ...." is somewhat misleading here. Perhaps the document should say HTTP URLs will be issued during deal signing.

Aside: I have also been thinking that we may want to delegate a UCAN that can be used for fetching aggregate pieces to the actor we sign a deal with. That would prevent other actors from using that URL without sharing a private key or a UCAN delegation. It could also allow us to defer the pre-signed URL creation until the actor decides to fetch.

I'm also wondering if aggregate/offer should be delegating a capability to a "broker" to issue signed URLs for the aggregate pieces. That would allow the broker to do it as needed and perhaps avoid back and forth. I realize we would still need to sign the deal with our key, but if delegated signatures were an option we could probably even get away from that.

I have started a thread on this line of thought here https://filecoinproject.slack.com/archives/C05C7CUPEKX/p1687366781115129

Given the discussion in the linked thread, I think we should reframe "spade-proxy" as an "agency" to which "aggregators" can submit (offer) aggregates along with all the UCAN delegations that would allow it to:

Create HTTP read URLs for the aggregate pieces.

Sign deals with "storage providers" on behalf of the aggregator.

That would, I think, simplify things for both the "aggregator" and the "agency" (spade-proxy), reducing coordination between them, as the "agency" would be able to take the aggregate and get it into Filecoin.

@vasco-santos I think this could also remove the need for mapping CAR CID <-> Piece CID; instead, pieces could come with associated UCANs, allowing the agency to create HTTP URLs for them. The aggregator would still need to map CARs to pieces to surface the corresponding Filecoin state, which I think makes sense; it's like a reference to Filecoin state.

w3-aggregation.md

Gozala · 2023-06-21T19:27:31Z

I think maybe we should embrace the new UCAN representation and also the upcoming invocation spec. With the above, here is my attempt at the schema with all the things I've mentioned:

# Agency namespaces aggregate APIs by DID of the aggregator
type AgencyAPI {AggregatorDID: AggregateAPI}

type AggregateAPI union {
  | AggregateOffer   "aggregate/offer"
  | AggregateGet     "aggregate/get"
  | DealArrange      "deal/arrange"
} representation inline {
  discriminantKey "op"
}

type AggregateOffer struct {
   # in
   rsc        AggregatorDID
   input     Offer
   # out
   out        OfferState
   # kicks off "deal/arrange" and makes aggregate state "queued"
   join        &DealArrange
}

type OfferState union {
  | Unit "ok"
  | Any  "error"
} representation keyed

type Offer struct {
   offer &AggregateInfo
   # delegation allowing to `publish/piece` contained pieces and `deal/sign` offered aggregate
   ucan &UCAN
}

type AggregateInfo struct {
   link        &CommP
   size        Int
   pieces   [PieceInfo]
}

type AggregateGet struct {
    # in
   rsc        AggregatorDID
   input     AggregateRef
   # out
   out        AggregateGetResult
}

type DealArrange struct {
   # in
   rsc        AgencyDID
   input     DealInfo
   # out
   out         DealResult
}

type DealInfo struct {
  aggregate   &CommP
}

type DealResult union {
   | Unit   "ok"
   | Any    "error"
} representation keyed

type AggregateGetResult union {
  | AggregateState  "ok"
} representation keyed

type AggregateState union {
  | &QueuedAggregate     "queued"
  | &AcceptedAggregate   "accepted"
  | &RejectedAggregate    "rejected"
} representation keyed


type AggregatorAPI union {
  | PublishPiece            "piece/publish"
  | DealSign                   "deal/offer"
}

type PiecePublish {
   # in
   rsc      AggregatorDID
   input   ContentPieceInfo      
   # out
   out      ContentLocation
}

type ContentPieceInfo {
  piece         &CommP
  content     &CAR 
}


type ContentLocation {
  url URL
}

type DealOffer {
   rsc      AggregatorDID
   input   DealInfo
   # out
   out      DealResult
}

type Deal {
   aggregate      &CommP
   // .... not sure what else goes in here
}

type DealResult union {
  | DealSignature   "ok"
  | Any                     "error"
} representation keyed

type DealSignature {
   iss         AggregatorDID
   sig         bytes
}

type URL string
type CAR bytes
type CommP bytes


# https://github.com/filecoin-project/go-state-types/blob/1e6cf0d47cdda75383ef036fc2725d1cf51dbde8/abi/piece.go#L47-L50
type PieceInfo {
  # Size in nodes. For BLS12-381 (capacity 254 bits), must be >= 16. (16 * 8 = 128)
  size Int
  link Link
}
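
For readers less familiar with the keyed union representation used in this sketch, the following shows how a decoded `AggregateState` value could be handled on the client side. In a keyed union the value is a map with a single key that names the variant. The types and the `stateName` helper are illustrative only, and links are plain strings here.

```typescript
// Illustrative: links are represented as plain strings.
type Link = string;

// Keyed union: exactly one key is present and it names the variant.
type AggregateState =
  | { queued: Link }
  | { accepted: Link }
  | { rejected: Link };

type AggregateStateName = "queued" | "accepted" | "rejected";

// Discriminate on which key is present.
function stateName(state: AggregateState): AggregateStateName {
  if ("queued" in state) return "queued";
  if ("accepted" in state) return "accepted";
  return "rejected";
}

// Only the "queued" variant is still awaiting a decision.
function isFinal(state: AggregateState): boolean {
  return stateName(state) !== "queued";
}
```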

Co-authored-by: Irakli Gozalishvili <[email protected]>

vasco-santos · 2023-06-22T08:25:48Z

@Gozala adopted the easiest changes here and am going to merge and iterate on the remaining discussions in issues / a new PR

vasco-santos requested review from Gozala and alanshaw June 7, 2023 14:11

vasco-santos mentioned this pull request Jun 7, 2023

feat: w3 aggregate protocol client and api implementation storacha/w3up#787

Merged

1 task

alanshaw requested changes Jun 7, 2023

Gozala reviewed Jun 7, 2023

vasco-santos changed the title ~~fix: align aggregation namings from commp and commd~~ fix: align aggregation namings from commp learnings Jun 14, 2023

vasco-santos added 3 commits June 14, 2023 16:04

fix: align aggregation namings from commp and commd

97df283

fix: swap proof with link

dcef229

chore: address first review comments

99cbfa4

vasco-santos force-pushed the fix/align-aggregation-namings-from-commp-and-commd branch 2 times, most recently from 9b4657f to 8072c05 Compare June 14, 2023 14:25

chore: iterate by moving away from segment naming

060181a

vasco-santos force-pushed the fix/align-aggregation-namings-from-commp-and-commd branch from 8072c05 to 060181a Compare June 14, 2023 14:33

vasco-santos requested review from Gozala and alanshaw June 14, 2023 14:35

vasco-santos added 2 commits June 15, 2023 15:21

chore: drop src and use single schema definition

e8e8c14

fix: lint issue

903894e

vasco-santos force-pushed the fix/align-aggregation-namings-from-commp-and-commd branch from a89c3d6 to 903894e Compare June 15, 2023 13:30

chore: drop car cid from invocation block

df468f1

This was referenced Jun 21, 2023

aggregate capabilities update on spec namings alignment storacha/w3up#823

Closed

w3filecoin MVP storacha/w3filecoin-infra#19

Open

vasco-santos self-assigned this Jun 21, 2023

vasco-santos added the P0 Critical: Tackled by core team ASAP label Jun 21, 2023

Gozala approved these changes Jun 21, 2023

vasco-santos and others added 2 commits June 22, 2023 10:13

chore: apply suggestions from code review

9bc01c0

Co-authored-by: Irakli Gozalishvili <[email protected]>

fix: remove unused type

c54e73d

vasco-santos merged commit 63846e5 into main Jun 22, 2023

vasco-santos deleted the fix/align-aggregation-namings-from-commp-and-commd branch June 22, 2023 08:26

This was referenced Jun 22, 2023

Embrace new UCAN representation and invocation Spec #65

Open

fix!: update aggregate spec in client and api storacha/w3up#824

Merged
