Adding cassandra key value storage #961
base: main
fb27567 to a230e1b
})
}

pub async fn create_docker_schema(&self) -> Result<(), String> {
is this docker specific?
Yes, it is.
I assume users will already have Cassandra installed; this is only for tests.
It's OK to only run this in tests, but that does not make it Docker-specific: this code creates the schema on any Cassandra connection.
It is also key-value-store-specific, so it should live in the keyvalue/cassandra module.
Well, for now only the KV implementation is done, which is why this is KV-specific, but we'll add indexed and blob too.
I think that, for now, we need to move cassandra.rs from the top level of golem-executor-base into storage, as we did for sqlite_types.rs, which contains the SQL statements for all 3 types of storage.
self.session
    .query_unpaged(
        Query::new(format!(
            r#"
Can these either be indented to match the surrounding source (I know they are not auto-formatted, but we can keep them in place manually), or start at zero indent?
}

pub fn with(&self, svc_name: &'static str, api_name: &'static str) -> CassandraLabelledApi {
    CassandraLabelledApi {
What is the intended relation between CassandraSession, CassandraLabelledApi, CassandraKeyValueStorage?
I'm a bit confused: CassandraSession and CassandraLabelledApi suggest generic functionality, but AFAIU the whole key-value storage implementation lives in CassandraLabelledApi, which wraps a CassandraSession that contains the key-value-specific schema, while CassandraKeyValueStorage calls CassandraSession and wraps it in a CassandraLabelledApi before every call.
Maybe I'm missing something, but
- I see no reason for a separate CassandraSession, the schema is tied to CassandraKeyValueStorage
- I think the work done by CassandraLabelledApi could be a generic wrapper method in CassandraKeyValueStorage, instead of duplicating all the API
- and I think the queries should live in CassandraKeyValueStorage
So I see now that we have RedisLabelled in golem-common, but that is not at the "IndexedKeyValue" level; rather, it is a generic Redis wrapper (of course, because Redis is itself a key-value store, the two layers match somewhat). I think we can have CassandraLabelled, but it should be in common and at the level of Cassandra operations (e.g. execute a query with the common labels and a custom "query-name" label).
@vigoo wdyt?
The problem with moving CassandraLabelled into common is that it would add new dependencies to the CLI, which I recall we want to avoid. That's why I kept everything in the executor-worker modules until the refactor that lets us move all shared code into common without impacting the CLI or others.
well, we also have service base and / or we can use features.
which refactor is that?
The redis code in common adds metrics and logging to the redis client and it just directly wraps the redis operations, independent of any actual use case.
I see that this top-level cassandra module contains key-value-store-specific queries, which is definitely not something we want. Everything key-value-store-specific should be in the storage/keyvalue/cassandra module.
If you want common metrics and logging for Cassandra, like we have for Redis, you can wrap the Cassandra library, but the wrapper should only instrument (wrap) the library's functions and not add any logic on top of it.
Where this "common" cassandra wrapper is is another question, top-level worker-executor-base is definitely not a good place for it. Until we need it in any other service, we can keep it in worker-executor-base, but let's move it to the storage module at least.
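As an illustration of the "instrument only" wrapper described in this comment, here is a minimal sketch. Everything in it is an assumption made for the example: `Metrics`, `Outcome`, `Labelled`, and `instrument` are hypothetical names, the metrics store is a plain in-memory map rather than a real metrics library, and the wrapper is synchronous, whereas the PR's Cassandra calls are async. The point is only that the wrapper runs the caller's operation unchanged and records labels, outcome, and duration around it.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Aggregated result of all calls that share one label set.
#[derive(Debug, Default, Clone, Copy)]
pub struct Outcome {
    pub succeeded: u64,
    pub failed: u64,
    pub total_time: Duration,
}

/// Hypothetical in-memory stand-in for a metrics registry,
/// keyed by (svc_name, api_name, query_name).
#[derive(Default)]
pub struct Metrics {
    outcomes: Mutex<HashMap<(String, String, String), Outcome>>,
}

impl Metrics {
    fn record(&self, labels: (&str, &str, &str), ok: bool, elapsed: Duration) {
        let key = (labels.0.to_string(), labels.1.to_string(), labels.2.to_string());
        let mut outcomes = self.outcomes.lock().unwrap();
        let entry = outcomes.entry(key).or_default();
        if ok {
            entry.succeeded += 1;
        } else {
            entry.failed += 1;
        }
        entry.total_time += elapsed;
    }

    pub fn outcome(&self, svc_name: &str, api_name: &str, query_name: &str) -> Outcome {
        let key = (svc_name.to_string(), api_name.to_string(), query_name.to_string());
        let outcomes = self.outcomes.lock().unwrap();
        outcomes.get(&key).copied().unwrap_or_default()
    }
}

/// Holds the common labels once, so callers do not pass them to every method.
pub struct Labelled<'a> {
    svc_name: &'static str,
    api_name: &'static str,
    metrics: &'a Metrics,
}

impl<'a> Labelled<'a> {
    pub fn new(svc_name: &'static str, api_name: &'static str, metrics: &'a Metrics) -> Self {
        Labelled { svc_name, api_name, metrics }
    }

    /// Runs `op` as-is and records its outcome under a custom query name.
    /// No storage-specific logic lives here; the result is returned untouched.
    pub fn instrument<T, E>(
        &self,
        query_name: &str,
        op: impl FnOnce() -> Result<T, E>,
    ) -> Result<T, E> {
        let start = Instant::now();
        let result = op();
        self.metrics.record(
            (self.svc_name, self.api_name, query_name),
            result.is_ok(),
            start.elapsed(),
        );
        result
    }
}
```

A key-value store would then call something like `labelled.instrument("select_value", || run_query(...))`, where `run_query` is whatever the driver provides, and keep its own queries in its own module.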
887abdb to 1a13dd2
    set_tracing: bool,
}

impl CassandraSession {
This struct and impl are still specific to our key-value store schema, not a generic Cassandra session. Anything that uses the kv_store table and related tables should be named and placed as part of the KVStore implementation; only things that use generic Cassandra primitives should carry the generic cassandra name.
Yes, as I tried to explain yesterday, this is a common struct, like CassandraLabelledApi, that will contain all the KV and indexed storage queries, tables, and functions.
This PR is KV-related, and I already have the indexed-storage changes for this file and other files.
Once this PR is approved I'll create the indexed storage PR; you will probably see what I mean then.
What we are trying to ask is to follow the structure of how it was done for Redis:
- One layer wraps the 3rd party library and adds metrics and logging - without adding any higher level operations (no actual queries, just the same operations provided by the 3rd party library, wrapped)
- KV store implementation built on this, only containing KV store related queries
- Indexed store implementations built on this, only containing indexed store related queries
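The layering in the list above could look roughly like the following sketch. It is not the PR's code: the `Session` trait, `RecordingSession`, `KeyValueStore`, `IndexedStore`, the `index_store` table, the column names, and the CQL strings are all hypothetical, and the trait is synchronous for brevity. Only the `kv_store` table name comes from the discussion. The shape to notice is that the bottom layer exposes nothing but a generic "execute" operation, and each store owns its own queries.

```rust
use std::cell::RefCell;

/// Layer 1 (hypothetical): the wrapped driver. It only executes what it is
/// given and knows nothing about any table or storage type.
pub trait Session {
    fn execute(&self, query_name: &str, cql: &str, values: &[&str]) -> Result<(), String>;
}

/// Test double that records (query_name, cql) instead of talking to Cassandra.
#[derive(Default)]
pub struct RecordingSession {
    pub calls: RefCell<Vec<(String, String)>>,
}

impl Session for RecordingSession {
    fn execute(&self, query_name: &str, cql: &str, _values: &[&str]) -> Result<(), String> {
        self.calls
            .borrow_mut()
            .push((query_name.to_string(), cql.to_string()));
        Ok(())
    }
}

/// Layer 2a: key-value store. Contains only KV-related queries.
pub struct KeyValueStore<'a, S: Session> {
    session: &'a S,
}

impl<'a, S: Session> KeyValueStore<'a, S> {
    pub fn new(session: &'a S) -> Self {
        KeyValueStore { session }
    }

    pub fn set(&self, namespace: &str, key: &str, value: &str) -> Result<(), String> {
        self.session.execute(
            "kv_set",
            "INSERT INTO kv_store (namespace, key, value) VALUES (?, ?, ?)",
            &[namespace, key, value],
        )
    }
}

/// Layer 2b: indexed store. Contains only indexed-store-related queries.
pub struct IndexedStore<'a, S: Session> {
    session: &'a S,
}

impl<'a, S: Session> IndexedStore<'a, S> {
    pub fn new(session: &'a S) -> Self {
        IndexedStore { session }
    }

    pub fn append(&self, namespace: &str, key: &str, id: u64, value: &str) -> Result<(), String> {
        let id = id.to_string();
        self.session.execute(
            "index_append",
            "INSERT INTO index_store (namespace, key, id, value) VALUES (?, ?, ?, ?)",
            &[namespace, key, id.as_str(), value],
        )
    }
}
```

With this split, adding a blob store or swapping the driver touches one layer only, and the session type can be moved to a shared crate later without dragging any schema along.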
I see. We should probably create another ticket to fix the sqlite implementation, since it is done the same way as Cassandra.
There is a struct like this:
SqliteLabelledApi { svc_name: &'static str, api_name: &'static str, pool: SqlitePoolx, }
It avoids passing common parameters like svc_name and api_name to the metrics and logging functionality in every method, so this Labelled struct holds the queries for all 3 kinds of storage for simplicity. Since we are going to change that in Cassandra, I think we need to change it in sqlite as well.
WDYT?
yes, the sqlite one should also use similar naming and structuring
bc07cf9 to 58d07f1
added some notes about the cassandra laziness and about creating follow-up issues
@@ -54,6 +56,7 @@ pub trait TestDependencies {
self.rdb().kill();
self.redis_monitor().kill();
self.redis().kill();
self.cassandra().kill();
now that cassandra is lazy, will this boot up cassandra just to kill it?
if so, i think we should use RwMutex Option instead of lazy cell
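The concern here is that a lazily initialized value gets created on first access, even when that first access is `kill()`. One way to read the suggestion is an optional slot behind a lock, sketched below with `std::sync::Mutex<Option<T>>`. This is an assumption about what was meant by "RwMutex Option", and `LazyDependency`, `with`, and `kill` are hypothetical names, not the test framework's API.

```rust
use std::sync::Mutex;

/// A dependency that is started on first use and can be killed without
/// being started first.
pub struct LazyDependency<T> {
    slot: Mutex<Option<T>>,
}

impl<T> LazyDependency<T> {
    pub fn new() -> Self {
        LazyDependency {
            slot: Mutex::new(None),
        }
    }

    /// Starts the dependency with `start` if it is not running yet,
    /// then runs `f` against it.
    pub fn with<R>(&self, start: impl FnOnce() -> T, f: impl FnOnce(&mut T) -> R) -> R {
        let mut slot = self.slot.lock().unwrap();
        let value = slot.get_or_insert_with(start);
        f(value)
    }

    /// Stops the dependency only if it was started. Never boots it.
    /// Returns true when something was actually stopped.
    pub fn kill(&self, stop: impl FnOnce(T)) -> bool {
        // Take the value out and release the lock before calling `stop`.
        let taken = self.slot.lock().unwrap().take();
        match taken {
            Some(value) => {
                stop(value);
                true
            }
            None => false,
        }
    }
}
```

Calling `kill` on an instance that was never used returns `false` and does no work, which is the behavior the comment asks for; after a `kill`, the next `with` call would start a fresh instance.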
})
}

pub async fn create_schema(&self) -> Result<(), String> {
i would still move the (test) schema creations out of the common cassandra (and sqlite) code, and even separate them by usage (for sqlite), but if we create an issue for it, then i'm okay with it for now
I think there is only one common schema for the storage. Where do you want to move the schema creation, and how do you want to separate them?
based on the storage type: kv, blob, indexed; and move to that package.
with that ideally we can select separately what to use, and only install what is needed (even if it is only for tests)
I don't think that makes sense: I don't think we will ever use KV storage with sqlite together with indexed storage in Cassandra, or any similar combination, and I don't see any advantage in performance, space usage, or anything else. But I did it anyway because I think we need to move on with this PR.
The three storage "types" (kv, indexed, blob) should be completely separate and separately configurable. Even though sqlite + cassandra for example is not a likely combination, it is possible that we will have more backend implementations in the future that make sense to combine them in ways we don't see right now. In general, using something different for kv-store (basically caching data) and indexed-store (our primary storage layer for durable execution) completely makes sense to me.
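To make "completely separate and separately configurable" concrete, here is a small sketch of what such a configuration could look like. The `Backend` and `StorageConfig` types, their variants, and `required_backends` are hypothetical and do not reflect the project's actual configuration format. The idea is that each storage type picks its backend independently, and setup (schema creation, test containers) is derived from what is actually selected.

```rust
/// Hypothetical list of backends; the real set may differ per storage type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    InMemory,
    Redis,
    Sqlite,
    Cassandra,
}

/// Each storage type selects its backend on its own.
#[derive(Debug, Clone, Copy)]
pub struct StorageConfig {
    pub key_value: Backend,
    pub indexed: Backend,
    pub blob: Backend,
}

impl StorageConfig {
    /// Backends that need to be set up for this configuration,
    /// in first-use order and without duplicates.
    pub fn required_backends(&self) -> Vec<Backend> {
        let mut required = Vec::new();
        for backend in [self.key_value, self.indexed, self.blob] {
            if !required.contains(&backend) {
                required.push(backend);
            }
        }
        required
    }
}
```

For example, a configuration with Redis for the key-value store and Cassandra for the indexed and blob stores would require setting up only those two backends, and only the schemas of the storage types that actually use Cassandra.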
9e24fc4 to a616ae9