Failure to retrieve when root payload CID is an identity CID #715
Even |
Status of investigation so far:
btw, the |
I'm not super familiar with the dag store and deals DB, but I'm wondering if we can do a fallback lookup by payload CID if the multihash lookup fails as a workaround.
There are valid cases where a CAR may have an identity CID as its root that is not represented as a 'block' within the CAR body and therefore isn't indexed by the dagstore. In this case, we inspect the identity CID content and treat the query as a query for the intersection of all of the links within the block. Ref: filecoin-project/boost#715
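The "intersection of all of the links" idea from the commit message above can be sketched as follows. This is a minimal illustration, not the go-fil-markets implementation: `pieceIndex` and `piecesContainingAll` are hypothetical names, and the index is a plain map standing in for the dagstore lookup.

```go
package main

import (
	"fmt"
	"sort"
)

// pieceIndex is a hypothetical stand-in for the dagstore lookup: it maps a
// block key (for example a multihash string) to the pieces that contain it.
type pieceIndex map[string][]string

// piecesContainingAll returns the pieces that contain every one of the given
// links, i.e. the intersection of the per-link lookups, in sorted order.
func piecesContainingAll(idx pieceIndex, links []string) []string {
	if len(links) == 0 {
		return nil
	}
	counts := make(map[string]int)
	for _, link := range links {
		seen := make(map[string]bool) // count each piece once per link
		for _, piece := range idx[link] {
			if !seen[piece] {
				seen[piece] = true
				counts[piece]++
			}
		}
	}
	var out []string
	for piece, n := range counts {
		if n == len(links) {
			out = append(out, piece)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := pieceIndex{
		"link-a": {"piece-1", "piece-2"},
		"link-b": {"piece-2", "piece-3"},
	}
	// Only pieces holding both links are candidates for the retrieval.
	fmt.Println(piecesContainingAll(idx, []string{"link-a", "link-b"}))
}
```

A piece that holds only some of the links is excluded, which is what makes the result usable as "the piece that contains this payload".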
Another alternative approach to consider, suggested by @dirkmc: when the lookup fails, decode the identity CID and query for all of the links it contains and return the result for that. Implemented in filecoin-project/go-fil-markets#747 and working if this branch links against that branch of go-fil-markets. Things to consider about this approach:
One more piece of learning from my investigations today: online deals handle identity CIDs differently to the default CAR creation process that we've been stuck with elsewhere:
I'm not sure but I believe that prior to the go-car/v2 blockstore interface, we would have been writing these out with identity CIDs in them because that's what go-car always did. I discovered this by extending lotus/itest/deals_anycid_test.go and inserting some identity CIDs into the source data. It transfers OK and I get a CAR out the SP end and it has an identity CID that I put in the root but it doesn't have the intermediate ones as blocks in the CAR so fails the CommP comparison. Depending on how users are generating CARs, this may or may not be a problem in the wild but it could explain some CommP mismatch errors. Lotus itself sets up an inline CID builder for UnixFS creation (which I believe is used for |
This sounds like filecoin-project/lotus#8663
Summary of options
A combination of these may be appropriate. Other ideas?
@rvagg forgive any ignorance here, but IIUC there's a fourth option that looks a little bit like your third one "looking for shards that contain all of the CIDs within the identity CID", but is a little less hard-coded to identity CIDs.
@aschmahmann yeah, this is the bit that slightly confuses me and I hope @hannahhoward can provide additional context - I think that the main source of complexity is that retrieval deals are tied to pieces, which may or may not be sealed. So when setting up a retrieval deal (the "query"), you're giving a payload CID and it's using that to figure out where it's going to get it from and sends you back the deal parameters for it. So these difficulties all hinge on that point, where we're tying a Payload CID to a Piece CID to set up a retrieval deal that's acceptable to both parties. Maybe we can make this go away, or mostly go away with query-ask v2? #671
I was thinking about this, and it's possible we could solve this as part of moving graphsync retrievals out of the main process once we have a load balancer. Then GraphSync could use the same remote blockstore used by bitswap and employ the same checks -- i.e. don't ask for CIDs that are identity CIDs across the process boundary.
FYI I'm working on tidying up filecoin-project/go-fil-markets#747 to propose as a workaround in lieu of a clearer path forward. Possibly we just need to patch this over to make current style deals work and then move on to focusing on less piece-centric retrievals with bitswap and independent graphsync?
17b9b4f to 648fd75
This is passing now with the only "fix" applied being to use go-fil-markets @ filecoin-project/go-fil-markets#747 so you can now do retrievals with a root identity CID and it'll work it out for you on the SP end.
I think we should merge this in to get the test changes included, although it might need to wait till filecoin-project/go-fil-markets#747 is merged
Let's merge this into the release/lotus1.17.2 branch. That's the target release for the breaking libp2p updates, so the plan is to include all our updates dependent on that release there.
👍 filecoin-project/go-fil-markets#747 needs someone with merge access though. It's got a 👍 although there were a few changes based on our conversations since that 👍 but it should be fine to merge and update here. Note that that's going into go-fil-markets master which has those libp2p upgrades in it too so that'll need to come back here.
* feat: handle retrieval queries for unindexed identity payload CIDs

  There are valid cases where a CAR may have an identity CID as its root that is not represented as a 'block' within the CAR body and therefore isn't indexed by the dagstore. In this case, we inspect the identity CID content and treat the query as a query for the intersection of all of the links within the block. Ref: filecoin-project/boost#715

* fix: refactor out multiple calls to dagStore.GetPiecesContainingBlock

  1. to support identity PayloadCID without having to duplicate decode & lookup logic
  2. because it's not cheap, especially for identity PayloadCIDs with lots of links

  The tradeoff is that in some cases we end up calling the PieceStore more than we otherwise would.

* feat: impose limits on identity PayloadCIDs
  * Byte limit (2048)
  * Link limit (32)
* feat: handle retrievals for nested identity CIDs
* chore: expand testing to cover dag-pb identity CIDs
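The byte and link limits named in the commit message above could be enforced along these lines. This is only a sketch: the constant and function names are hypothetical, and treating both limits as inclusive upper bounds is an assumption, since the commit message gives only the numbers.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxIdentityCIDBytes = 2048 // byte limit quoted in the commit message
	maxIdentityCIDLinks = 32   // link limit quoted in the commit message
)

var (
	errIdentityTooLarge     = errors.New("identity CID content exceeds byte limit")
	errIdentityTooManyLinks = errors.New("identity CID content exceeds link limit")
)

// checkIdentityLimits rejects identity CID content that is too large or that
// carries too many links to look up. Assumption: both limits are inclusive.
func checkIdentityLimits(content []byte, linkCount int) error {
	if len(content) > maxIdentityCIDBytes {
		return errIdentityTooLarge
	}
	if linkCount > maxIdentityCIDLinks {
		return errIdentityTooManyLinks
	}
	return nil
}

func main() {
	fmt.Println(checkIdentityLimits(make([]byte, 100), 4))
	fmt.Println(checkIdentityLimits(make([]byte, 4096), 4))
}
```

Bounding the work up front matters here because each link triggers its own index lookup, which the commit message notes is not cheap.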
filecoin-project/go-fil-markets#747 is merged, but it still needs to bubble up through the stack, which might take a while given the lotus dependency issues
i.e. don't copypasta this for your filecoin deal prep
9fcb9be to 0412500
I ran across this checking the retrieval backlog so I went ahead and rebased with minor updates since the dependency chain has been resolved now. Tests are all passing so this should be good for a final review.
// Disable this part of the test for now because there is a bug in lotus
// introduced in this PR that causes the test to fail:
// https://github.com/filecoin-project/lotus/pull/9174/files#top
// *********************************************************************
@nonsense This was previously commented out in a separate update. I added it back in to see if things are passing now which seems to be the case.
What
Any CAR file with an identity CID as the root payload CID is unretrievable if written with CarV2.
When a file is imported via UnixFS, the default builders have a condition where any block less than 126 bytes becomes an identity CID. This is true for the most common CAR generation utils used by Lotus & boostx. So this is likely applicable to real world pieces.
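To make the inlining condition concrete, here is a stdlib-only sketch of how a small block ends up with an identity CID while a larger one gets a sha2-256 CID. It hand-encodes CIDv1 bytes (version, codec, multihash) rather than using the real builders; `buildCID` and `appendUvarint` are hypothetical helpers, the raw codec is chosen only for illustration, and whether the 126-byte boundary is inclusive in the actual builder is an assumption (this sketch follows the "less than" wording above).

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

const (
	codecRaw   = 0x55 // multicodec: raw binary
	mhIdentity = 0x00 // multihash code: identity (digest is the data itself)
	mhSHA2_256 = 0x12 // multihash code: sha2-256
)

// appendUvarint appends v to dst as an unsigned varint.
func appendUvarint(dst []byte, v uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], v)
	return append(dst, buf[:n]...)
}

// buildCID returns CIDv1 bytes for data, inlining the data as an identity
// multihash when it is shorter than inlineLimit and hashing it otherwise.
func buildCID(data []byte, inlineLimit int) []byte {
	out := appendUvarint(nil, 1) // CID version 1
	out = appendUvarint(out, codecRaw)
	if len(data) < inlineLimit {
		out = appendUvarint(out, mhIdentity)
		out = appendUvarint(out, uint64(len(data)))
		return append(out, data...)
	}
	sum := sha256.Sum256(data)
	out = appendUvarint(out, mhSHA2_256)
	out = appendUvarint(out, uint64(len(sum)))
	return append(out, sum[:]...)
}

func main() {
	small := buildCID([]byte("hi"), 126)
	large := buildCID(make([]byte, 200), 126)
	fmt.Printf("small: % x\n", small)
	fmt.Printf("large multihash code: %#x, total length: %d\n", large[2], len(large))
}
```

Because the identity CID carries the block's bytes itself, a writer has no need to store a separate block for it, which is how such a CID can be a CAR root without appearing in the CAR body.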
CAR files written with CarV1 that contain identity CIDs may or may not be retrievable, but because they are written back out with CarV2 writers they likely won't match CommP when doing whole payload / piece retrieval over GraphSync.
How
This PR is actually an issue with a test to demonstrate it. The issue affects the default integration tests, and you can demonstrate it simply by adding a retrieval to the boost deal test. This produces an error of:
(probably will produce a different CID if you run it yourself)
You can then fix the test (or at least get it past the above issue) by removing the InlineBuilder in the test car file generator: https://github.com/filecoin-project/boost/blob/main/testutil/car.go#L142
Looking at the error, I believe the issue is that the identity CID is simply not present in the CAR index generated by the DAGStore, as it's not in the underlying CARv2. So when you ask "what information do you have on this CID?", it responds with "I don't have any".
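Since the index can never answer for such a CID, a lookup path would first need to recognize it. A rough stdlib-only sketch of that check is below: it reads the CIDv1 varint fields and returns the inlined content when the multihash code is identity (0x00). `identityContent` is a hypothetical helper, not a function from go-cid or the DAGStore.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// identityContent parses binary CIDv1 bytes and, if the multihash is an
// identity hash, returns the inlined content. It returns false for CIDv0,
// for any other hash function, and for malformed input.
func identityContent(cid []byte) ([]byte, bool) {
	rest := cid
	next := func() (uint64, bool) {
		v, n := binary.Uvarint(rest)
		if n <= 0 {
			return 0, false
		}
		rest = rest[n:]
		return v, true
	}
	version, ok := next()
	if !ok || version != 1 {
		return nil, false
	}
	if _, ok := next(); !ok { // codec, not needed for this check
		return nil, false
	}
	code, ok := next()
	if !ok || code != 0x00 { // 0x00 is the identity multihash code
		return nil, false
	}
	length, ok := next()
	if !ok || length != uint64(len(rest)) {
		return nil, false
	}
	return rest, true
}

func main() {
	// version 1, raw codec (0x55), identity hash, length 2, content "hi"
	content, ok := identityContent([]byte{0x01, 0x55, 0x00, 0x02, 'h', 'i'})
	fmt.Printf("%q %v\n", content, ok)
}
```

The returned content is the block itself, so it can be decoded for links without touching the CAR or its index.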