-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Add support for multi get operation for database queries #2396
base: master
Are you sure you want to change the base?
perf: Add support for multi get operation for database queries #2396
Conversation
f9e0289
to
2532b06
Compare
97f786f
to
5405e6c
Compare
2920ae8
to
cff7b47
Compare
cff7b47
to
b23e4be
Compare
Proposals to the #2396
crates/storage/src/kv_store.rs
Outdated
@@ -41,7 +44,7 @@ pub trait StorageColumn: Copy + core::fmt::Debug { | |||
|
|||
/// The definition of the key-value inspection store. | |||
#[impl_tools::autoimpl(for<T: trait> &T, &mut T, Box<T>)] | |||
pub trait KeyValueInspect { | |||
pub trait KeyValueInspect: Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very sad that we need to require Send + Sync
only because BoxedIter
requires Send
. Maybe we can define non Send boxed iterator instead and use it(because it seems we don't need Send
feature, but maybe I'm wrong)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this bound is just to satisfy the BoxedIter
requirement. I can see if I can define a non-send one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hitting some walls with this implementation. Will try again tomorrow with fresh eyes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright my mistake was trying to change some iterators that we turn into streams, and they need to be Send
. Now I have managed to get rid of these trait bounds cbb6efc.
Now we have two boxed iterators: BoxedIter
which doesn't require Send, and BoxedIterSend
for the cases when you need Send.
<Self as StorageBatchInspect<OldTransactions>>::get_batch(self, ids) | ||
.map(|result| result.and_then(|opt| opt.ok_or(not_found!(OldTransactions)))) | ||
.into_boxed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, you can add the same syntax sugars that we did with storage_as_ref
to avoid <Self as StorageInspect<M>>
usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, that would be nice. I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this is not as trivial as I had thought, as we're hitting some lifetime issues.
The problem
get_batch
returns an iterator bound to the lifetime of the self
parameter. I.e. the storage we call it on. This is necessary since we do self.get(...)
within the KeyValueInspect::get_batch
implementation.
So while we can implement StorageBatchInspect
for the StorageRef
type as:
impl<'a, S, Type> StorageBatchInspect<Type> for StorageRef<'a, S, Type>
where
S: StorageBatchInspect<Type>,
Type: Mappable,
{
#[inline(always)]
fn get_batch<'b, Iter>(
&'b self,
_keys: Iter,
) -> impl Iterator<Item = Result<Option<Type::OwnedValue>>> + 'b
where
Iter: 'b + IntoIterator<Item = &'b Type::Key>,
Type::Key: 'b,
{
None.into_iter() // Note that we'd need access to `self.0` here which requires VM changes, or copying the `StorageRef` type into this crate. But that's a separate, and very manageable issue
}
}
when we try to use it as
fn old_transactions<'a>(
&'a self,
ids: BoxedIter<'a, &'a TxId>,
) -> BoxedIter<'a, StorageResult<Transaction>> {
self.storage::<OldTransactions>()
.get_batch(ids)
.map(|result| result.and_then(|opt| opt.ok_or(not_found!(OldTransactions))))
.into_boxed()
}
we're hitting this issue
error[E0716]: temporary value dropped while borrowed
--> crates/fuel-core/src/service/adapters/graphql_api/off_chain.rs:183:9
|
179 | fn old_transactions<'a>(
| -- lifetime `'a` defined here
...
183 | self.storage::<OldTransactions>()
| -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| _________creates a temporary value which is freed while still in use
| |
184 | | .get_batch(ids)
| |___________________________- argument requires that borrow lasts for `'a`
...
187 | }
| - temporary value is freed at the end of this statement
Going forward
For this PR I suggest we don't go down this rabbit hole, but I'd be happy to do it as a follow-up if you think it's possible/interesting to explore it. Seems like a quite low prio though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to get some benchmarks(I guess you need to add new one that will use GraphQL) =)
Since this PR add `into_bytes` for the encoder, we can optimize the batch mutate operations as well #2396 (review)
Sure thing. I'm not sure if it makes sense to use criterion/cargo bench for end-to-end performance testing though as it is more geared towards micro-benchmarking, and we need to do something like the following.
I'll look into our options for this type of workload. Let me know if you have any opinions or thoughts on this. |
32993e0
to
3a3d967
Compare
3a3d967
to
cbb6efc
Compare
I'll aim at defining these workloads as a stand-alone binary, which would allow us to use hyperfine to actually execute the benchmarks and interpret the results. |
@@ -177,7 +177,7 @@ impl LowerHex for TxPointer { | |||
} | |||
} | |||
|
|||
#[derive(cynic::Scalar, Debug, Clone)] | |||
#[derive(cynic::Scalar, Debug, Clone, PartialEq, Eq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these changes needed? I can't seem to find a place where these additional derives are necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it doesn't seem like these are needed anymore. I probably added them while iterating on the code, but I can cut them out. Though since this is a public type, I'd argue that we still should proactively add these common traits as per https://rust-lang.github.io/api-guidelines/interoperability.html#types-eagerly-implement-common-traits-c-common-traits
I can still cut them out if you want me to. Let me know what you think.
@@ -194,7 +194,7 @@ impl Deref for HexString { | |||
} | |||
} | |||
|
|||
#[derive(Debug, Clone)] | |||
#[derive(Debug, Clone, PartialEq, Eq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reply :)
crates/storage/src/codec.rs
Outdated
} | ||
|
||
/// The trait encodes the type to the bytes and passes it to the `Encoder`, | ||
/// which stores it and provides a reference to it. That allows gives more | ||
/// flexibility and more performant encoding, allowing the use of slices and arrays | ||
/// instead of vectors in some cases. Since the [`Encoder`] returns `Cow<[u8]>`, | ||
/// it is always possible to take ownership of the serialized value. | ||
pub trait Encode<T: ?Sized> { | ||
pub trait Encode<T: ?Sized>: 'static { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure why we need the 'static
bound here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right no this is just a leftover from a previous attempt to fix the lifetime issues with the Encode
trait, but in the current solution this is redundant. Thanks for noticing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in c3b7264
Co-authored-by: Rafał Chabowski <[email protected]>
storage: &'a S, | ||
keys: Iter, | ||
column: S::Column, | ||
) -> impl Iterator<Item = crate::Result<Option<M::OwnedValue>>> + 'a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
) -> impl Iterator<Item = crate::Result<Option<M::OwnedValue>>> + 'a | |
) -> impl IntoIterator<Item = crate::Result<Option<M::OwnedValue>>> + 'a |
Iterator
s implement IntoIterator
, so if you change the signature to return a IntoIterator
then StorageBatchInspect
becomes slightly more flexible, as implementations can return either an iterator or a collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general principle, I like interfaces to be very general in what they accept but prefer them to be specific in what they return. (see Robusness principle). That's why I typically accept IntoIterator
in parameters, because as you say then the caller doesn't have to care about doing `.into_iter()´ explicitly if they have a collection or anything.
Therefore, for return values it's better in my opinion to return the Iterator
directly since it is more specific. Otherwise, all callers are basically forced to call .into_iter()
before they can do anything with the returned iterator, which is less ergonomic. Any implementer of the trait can (and should) always call .into_iter()
if they are about to return a collection. Again, not doing so would just force this burden upon the user of the function which is less than ideal without providing any benefit whatsoever.
Let me know if you disagree and I'll be happy to further discuss this.
let transactions: Vec<_> = self | ||
.on_chain | ||
.transactions(tx_ids.iter().into_boxed()) | ||
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you can avoid collecting here since later you iterate on transactions again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unforutnately I need to both iterate over the transactions and zip them with the on_chain_results - so we need to iterate twice over the tersults. We could potentially do something fancy with Itertools::tee, but I think this solution is simpler and easier on the eyes.
36386af
to
c3b7264
Compare
Got some initial benchmark results in #2433, and it doesn't look good for the multi-get implementation. So far it's either slower or the same for all workloads I've tried. I've done the following:
I'm all ears if anyone has any suggestions for workloads to try, but until we can find evidence that this change actually is an improvement in any way we should put this PR on hold. |
Linked Issues
Closes #2344
Description
This PR adds support for getting batches of values from the storage through a new
StorageBatchInspect
trait infuel-core-storage
.This trait is implemented for
StructuredStorage
andGenericDatabase
through a newget_batch
method added to theKeyValueInspect
andBluePrintInspect
traits. Theget_batch
method is implemented with themulti-get
operation for the RocksDB based storage implementations, and uses a default blanket implementation for in-memory implementations.Checklist
Before requesting review