-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
websocket: send/receive reducer & table ids instead of names #1883
base: master
Are you sure you want to change the base?
Conversation
c659ca4
to
7e91c80
Compare
13053e0
to
9ed5071
Compare
18f4e87
to
b6689e9
Compare
a9e63d5
to
fd40008
Compare
b6689e9
to
33e815c
Compare
fd40008
to
73488dc
Compare
3c60243
to
10c465f
Compare
eafc64b
to
9c85842
Compare
10c465f
to
e6c7062
Compare
9c85842
to
01d3cd9
Compare
b224fd6
to
6cc4313
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please update the PR description to match the template.
- I would like to see a Rust SDK test added which verifies that it's possible to construct a connection and then immediately invoke a reducer, without first waiting for the on-connect callback. I would expect this to queue the reducer call until the handshake message is received. The code appears to do this correctly, but I want a test to make sure we don't regress it in the future.
fn generate_table(&self, module: &ModuleDef, namespace: &str, tbl: &TableDef) -> String; | ||
fn generate_table(&self, idx: u32, module: &ModuleDef, namespace: &str, tbl: &TableDef) -> String; | ||
fn generate_type(&self, module: &ModuleDef, namespace: &str, typ: &TypeDef) -> String; | ||
fn generate_reducer(&self, module: &ModuleDef, namespace: &str, reducer: &ReducerDef) -> String; | ||
fn generate_reducer(&self, idx: u32, module: &ModuleDef, namespace: &str, reducer: &ReducerDef) -> String; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change to add indices to the client codegen? I believe, but would like it confirmed, that these are not table IDs which could be sent over a WebSocket; rather, they appear to be an unrelated optimization to the SDK internals.
If that is the case, I would strongly prefer this be broken out into a separate PR, and frankly I would see very little reason to approve such a PR without measurements showing that HashMap
lookups of table and reducer name strings are a significant performance overhead in the SDKs. I would also question why we'+ prefer to introduce a new ID to this interface, rather than a change purely internal to the codegen, possibly involving either generating these indices within the codegen, or possibly involving a perfect hash function.
If that is not the case, and this change is necessary or useful towards the actual goal of this PR, namely transmitting integer IDs rather than name strings over the WebSocket API, then I would like to see documentation on this trait's methods describing what these indices mean, where they come from and how they're used. I would also like the PR description amended to describe this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change to add indices to the client codegen? I believe, but would like it confirmed, that these are not table IDs which could be sent over a WebSocket; rather, they appear to be an unrelated optimization to the SDK internals.
Indeed, these are indices into the 2 arrays, one for reducer names and one for reducer ids and the same for table names <-> table ids. This seemed like the most efficient way to represent things, avoiding hash maps in many cases in favor of just Vec<_>
s.
If that is the case, I would strongly prefer this be broken out into a separate PR, [...]
Sure, I can do that, but then it will likely miss the train...
[...] and frankly I would see very little reason to approve such a PR without measurements showing that
HashMap
lookups of table and reducer name strings are a significant performance overhead in the SDKs. I would also question why we'+ prefer to introduce a new ID to this interface, rather than a change purely internal to the codegen, possibly involving either generating these indices within the codegen, or possibly involving a perfect hash function.
Why it would need to be a significant perf overhead. It seems clear to me that it is a perf win and more predictable to identity hash an u32
and using it as an index into a vector, rather than hashing strings. I don't understand what you mean regarding a change internal to the codegen. Whatever we do, we have to keep mappings K -> ID
and sometimes ID -> K
in the incoming-message-loop and DbContextImpl
, as the values of ID
are determined at handshake. K
could then be &'static str
or an index into the list of reducers/tables. This PR opted for the latter for efficiency reasons.
If that is not the case, and this change is necessary or useful towards the actual goal of this PR, namely transmitting integer IDs rather than name strings over the WebSocket API, then I would like to see documentation on this trait's methods describing what these indices mean, where they come from and how they're used. I would also like the PR description amended to describe this change.
Sounds good, I will definitely add that documentation/amend the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not dispute that this is obviously better for runtime performance, but there are non-technical costs to making changes like this. For example:
- I have a stack of outstanding PRs, ending in an actual P1 ticket with actual performance and usability implications for the SDK, which will significantly conflict with this change.
- Reviewing this PR was made artificially and unnecessarily more difficult because it included this change, and I had to sort through which parts of the diff related to the stated goal of the PR versus which parts related to this change.
- Introducing a third identifier for database objects, in addition to the string names and the runtime IDs, adds significant potential for confusion. It's already not great that tables have both names and IDs, with different semantics, but at least the names are of an obviously distinct type from the IDs, so there's less risk of confusing the two. I believe there to be a very real cost in code complexity and maintenance to adding another integer identifier for database objects which is not interchangeable with IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the additional cost that making a larger change may introduce more bugs. As the person who has to deal with all the bugs and their downstream ramifications, I can tell you that the cost is very very high. We just dealt with it most recently with the change to u256
for Identity
. The nature of these things is that it always feels like it will probably be fine, but it rarely is and then we pay the price downstream.
I agree with @gefjon, please separate the PR. I don't think this one can go in, but we'll get it in eventually. Either that or we can just have this be a different version of the API and maintain both and eventually deprecate the one Lightfox is using currently.
let table_name = &*table.get_schema().table_name; | ||
|
||
if !deletes.is_empty() { | ||
let table_name = &table.get_schema().table_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to the PR, or just a drive-by change? Is it actually an optimization, i.e. is get_schema
expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the caveat that I haven't spent time on this PR in a few days, this does look like a drive-by change, but also trivial code motion... This is a micro optimization, shaving off a hash map lookup, that LLVM is unlikely to elide.. Mostly though, I think I mostly just wanted to move it closer to the usage, for better readability.
let table_name = &*commit_table.get_schema().table_name; | ||
|
||
if !inserts.is_empty() { | ||
let table_name = &commit_table.get_schema().table_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see #1883 (comment))
@@ -899,7 +899,6 @@ impl<F: FnMut(u64)> spacetimedb_commitlog::payload::txdata::Visitor for ReplayVi | |||
reader: &mut R, | |||
) -> std::result::Result<Self::Row, Self::Error> { | |||
let schema = self.committed_state.schema_for_table(table_id)?; | |||
// TODO: avoid clone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this comment removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an Arc
now, so cloning is cheap and so we don't need to care anymore.
@@ -82,7 +82,7 @@ Requested namespace: {namespace}", | |||
output.into_inner() | |||
} | |||
|
|||
fn generate_table(&self, module: &ModuleDef, namespace: &str, table: &TableDef) -> String { | |||
fn generate_table(&self, _idx: u32, module: &ModuleDef, namespace: &str, table: &TableDef) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does TypeScript not need or use these indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit of the PR was not fully done; TS should also be using the indices. I'll get that done today.
UnknownField { field: String, tables: Vec<Box<str>> }, | ||
UnknownField { field: String, tables: Vec<Arc<str>> }, | ||
#[error("Unknown field name: `{field}` not found in the table(s): `{tables:?}`")] | ||
UnknownFieldName { field: FieldName, tables: Vec<Box<str>> }, | ||
UnknownFieldName { field: FieldName, tables: Vec<Arc<str>> }, | ||
#[error("Field(s): `{fields:?}` not found in the table(s): `{tables:?}`")] | ||
UnknownFields { fields: Vec<String>, tables: Vec<Box<str>> }, | ||
UnknownFields { fields: Vec<String>, tables: Vec<Arc<str>> }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these (and similar changes from Box
to Arc
in this PR) related to the goal of the PR? Could they be split into a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I decided to do this at the point I remembered that we wanted to keep the names in the JSON format but not in the BSATN format. This meant that we would have had to, in most cases, clone e.g., table_name: Box<str>
only to throw them away at the end. That seemed wasteful, so I decided to make these into Arc
s to avoid the actual unnecessary heap allocations. This could be split into a separate PR, but that would make it harder to merge this in time though.
/// Maps table name to a set of callbacks. | ||
table_callbacks: HashMap<&'static str, TableCallbacks<M>>, | ||
table_callbacks: IntMap<u32, TableCallbacks<M>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is no longer correct. It's also not clear what the actual key is, now - is it the table ID received during handshake, or the table index computed at codegen time?
0ff7391
to
7915b10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on the CLI changes that I'm a codeowner for - I did not review crates/cli/src/subcommands/generate/mod.rs
because that diff is more about "code generation" than about "CLI".
👍
This is exercised by the test |
f235f41
to
42b315b
Compare
Description of Changes
Receive and send reducer ids and table ids as opposed to names in the SDK.
TODO:
Fixes https://github.com/clockworklabs/SpacetimeDBPrivate/issues/1091.
API and ABI breaking changes
Yes.