-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: blockstore: GetMany blockstore method #492
base: main
Are you sure you want to change the base?
Conversation
…pstream master branch of go-datastore
Thank you for submitting this PR!
Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for future me:
We maybe want automatic forwarding using type assertions in the default constructor function too.
blockstore/blockstore.go
Outdated
bs.rehash.Store(enabled) | ||
} | ||
|
||
func (bs *getManyBlockStore) Get(ctx context.Context, k cid.Cid) (blocks.Block, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised this function needs to be duplicated, I think you could use embeding here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible but would require passing in a ds.Batching
to the constructor function in addition to theds.TxnDatastore
(or some composite interface of the two) as ds.TxnDatastore
can't be used to construct a regular Blockstore
for embedding because it doesn't satisfy the ds.Batching
interface, even though the Put
and Get
methods we would like to fall through to don't use the portions of that interface that it doesn't support...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to break the normal Blockstore
up into the components that need ds.Batching
and the ones that just need ds.Datastore
.ds.TxDatastore
satisfies ds.Datastore
and we only need ds.Datastore
to fulfill the Get
and Put
methods on either blockstore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think the best/cleanest approach is here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it take the union of batching and txn datastores so it has access to all that is needed.
You could also implement a stub, a batch is inferior to a transaction while providing the same features:
Batching datastores support deferred, grouped updates to the database.
Batch
es do NOT have transactional semantics: updates to the underlying datastore are not guaranteed to occur in the same iota of time. Similarly, batched updates will not be flushed to the underlying datastore untilCommit
has been called.Txn
s from aTxnDatastore
have all the capabilities of aBatch
, but the reverse is NOT true.
https://pkg.go.dev/github.com/ipfs/go-datastore#Batching
So if you accept a TxnDatastore you can use a simple stub that implements batches using transactions, this would make thee code work in the only place it's used right now.
blockstore/blockstore.go
Outdated
@@ -64,6 +65,13 @@ type Blockstore interface { | |||
HashOnRead(enabled bool) | |||
} | |||
|
|||
// TxnBlockstore is a blockstore interface that supports GetMany and PutMany methods using ds.TxnDatastore | |||
type TxnBlockstore interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think TxnBlockstore
is wrong because the blockstore does not provide transactions, GetManyBlockstore
was fine mb.
Codecov ReportAttention: Patch coverage is
@@ Coverage Diff @@
## main #492 +/- ##
==========================================
- Coverage 65.78% 65.64% -0.15%
==========================================
Files 207 203 -4
Lines 25156 25385 +229
==========================================
+ Hits 16549 16663 +114
- Misses 7147 7235 +88
- Partials 1460 1487 +27
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx I'll try to look at this before the freeze on friday but I can't promise anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @i-norden 🙏. IIUC this is related to your work landing a usable alternative to filecoin-project/go-hamt-ipld#103 which is great to see.
Left some suggestions.
type GetManyBlockstore interface { | ||
Blockstore | ||
PutMany(ctx context.Context, blocks []blocks.Block) error | ||
GetMany(context.Context, []cid.Cid) ([]blocks.Block, []cid.Cid, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two recommendations:
- It seems like there's no need for
[]blocks.Block
and[]cid.Cid
sinceBlock
contains a.Cid()
method https://github.com/ipfs/go-block-format/blob/v0.2.0/blocks.go#L19-L25 - You may want to consider a streaming interface so that you don't have to buffer all the blocks in memory
- The simplest option is to return either
(<-chan blocks.Block, error)
or(<-chan BlockOption
) depending on how you want errors returned - More complicated, but more featureful, would be if this had a more "session-like" quality to it where you could add/remove requests from the channel so that you don't have to spin up one goroutine per-block get (or per batch of gets). Could be something like this https://github.com/ipfs/go-bitswap/pull/593/files#diff-b751089c21e2437423d20fbee6c9a01a61c2422d5f7630ab8272d7fdeb63b654R17
- The simplest option is to return either
If returning an asynchronous object (e.g. channel or iterator) might be worth taking a look at ipfs/kubo#4592 to make sure you don't run into some common pitfalls. With Go generics now iterators may also make this easier than it used to be.
if len(cs) == 1 { | ||
// performance fast-path | ||
block, err := bs.Get(ctx, cs[0]) | ||
return []blocks.Block{block}, nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't make sense to not return the CID here given it's in the signature, but also it doesn't seem like []cid.Cid
needs to be in the return signature
This PR introduces a
GetMany
blockstore method to compliment the existingPutMany
method. This is for use in aGetMany
go-ipld-cbor datastore implementation which, in turn, is for use in parallelForEach
methods for both go-amt-ipld and go-hamt-ipld. This PR uses theTxnDatastore
keytransform/namespace wrapper introduced in ipfs/go-datastore#210.GetMany
go-ipld-cbor PR that uses this: ipfs/go-ipld-cbor#97TODO:
Replace replace directives if/when the dependency is merged and released