RedPanda consumer example #85
If we are handling large-volume inserts, we can keep pre-defined schemas cached on each consumer and make every added column nullable (really, everything is nullable). Once nodes cache the new schema locally (with some TTL), they can accept rows carrying the new columns. We might want a second example for that, though.
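The per-consumer cache with a TTL described above could be sketched like this. This is a hypothetical illustration only: the class name, the `(namespace, table)` key shape, and the TTL value are assumptions, not anything defined in this issue.

```python
import time


class SchemaCache:
    """Per-consumer schema cache with a TTL (hypothetical sketch).

    Schemas map column name -> type; every column is treated as
    nullable, so rows may omit any of them.
    """

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._entries = {}  # (namespace, table) -> (schema, fetched_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        schema, fetched_at = entry
        if now - fetched_at > self.ttl:
            # Entry expired: drop it so the consumer re-fetches the schema.
            del self._entries[key]
            return None
        return schema

    def put(self, key, schema, now=None):
        now = time.monotonic() if now is None else now
        self._entries[key] = (schema, now)


cache = SchemaCache(ttl_seconds=30.0)
cache.put(("analytics", "events"), {"id": "int64", "payload": "string"}, now=0.0)
assert cache.get(("analytics", "events"), now=10.0) is not None  # still fresh
assert cache.get(("analytics", "events"), now=45.0) is None      # TTL elapsed
```

A real consumer would fall through to the remote schema store on a cache miss and `put` the result back.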
Original idea from Alex @ Redpanda, which I then modified: since we only need to know whether an existing column changed type (columns can come and go, except for partition columns, unless there are defaults), we can just hash the schema JSON. If the hash differs from the one we have in memory (or we don't have one yet), we start a serializable transaction against something like CRDB/PG/FDB, read and compare whether the change is valid, and if it is, update the remote schema (or the local one, if the remote already knows about it), then insert. This way, whenever we detect a change we go to the golden record, compare/update remote and local, and verify the schema. We can hash the schema of the whole batch, but that slows down the entire partition, so it's important to use a really high partition count on the gate. Requirements are:
Then quarantine or drop offending rows.
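The hash-and-compare step above could look roughly like the following. This is a sketch under assumptions: the function names and the "only type changes on existing columns are invalid" rule are taken from the comment, but the exact validity semantics (partition columns, defaults) are simplified away.

```python
import hashlib
import json


def schema_hash(schema: dict) -> str:
    # Canonicalize (sorted keys, fixed separators) so equal schemas
    # always produce identical hashes regardless of key order.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def change_is_valid(old: dict, new: dict) -> bool:
    # Columns may come and go; only a *type change* on a column that
    # exists in both schemas invalidates the change. (The partition-column
    # and defaults caveats from the comment are omitted here.)
    return all(new[col] == typ for col, typ in old.items() if col in new)


local = {"id": "int64", "ts": "timestamp"}
incoming = {"id": "int64", "ts": "timestamp", "region": "string"}  # new column

assert schema_hash(local) != schema_hash(incoming)    # mismatch -> take the TX path
assert change_is_valid(local, incoming)               # added column: allowed
assert not change_is_valid(local, {"id": "string"})   # type change: rejected
```

On a hash mismatch, the real flow would open the serializable transaction, re-read the golden-record schema, run the validity check against that, and only then update and insert.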
See #90 ; we can check whether the schema is different. We still need a serializable transaction in case concurrent inserters arrive with different columns. In real life we should probably still pre-define an initial schema, but columns can be added dynamically.
Example where ingestion consumes directly from a RedPanda cluster and batch-inserts by a
namespace,table
key that is used as the partition key. Include schema validation and caching in the example so that we can catch schema issues before insert, and have (dynamically created) quarantine tables that store the bad (raw) rows when we discover one.
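The batching-plus-quarantine shape described above could be sketched as below. This is illustrative only: the record layout, the validation rule, and the `quarantine_<namespace>_<table>` naming are assumptions, not anything the issue specifies, and a real implementation would consume from RedPanda via a Kafka client rather than from an in-memory list.

```python
from collections import defaultdict


def route_batch(records, schemas):
    """Group records by their (namespace, table) partition key and divert
    rows that fail schema validation into a per-table quarantine bucket
    (hypothetical sketch)."""
    batches = defaultdict(list)      # (namespace, table) -> valid rows
    quarantine = defaultdict(list)   # quarantine table name -> raw rows
    for rec in records:
        key = (rec["namespace"], rec["table"])
        schema = schemas.get(key, {})
        row = rec["row"]
        # Every column is nullable, so we only check that the columns
        # actually present are known to the cached schema.
        if all(col in schema for col in row):
            batches[key].append(row)
        else:
            quarantine[f"quarantine_{rec['namespace']}_{rec['table']}"].append(row)
    return batches, quarantine


schemas = {("analytics", "events"): {"id": "int64", "msg": "string"}}
records = [
    {"namespace": "analytics", "table": "events", "row": {"id": 1, "msg": "ok"}},
    {"namespace": "analytics", "table": "events", "row": {"bogus": True}},
]
batches, quarantine = route_batch(records, schemas)
assert len(batches[("analytics", "events")]) == 1
assert len(quarantine["quarantine_analytics_events"]) == 1
```

Each `batches` entry would then be flushed as one batch insert, and each `quarantine` entry written to a dynamically created quarantine table holding the raw rows.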
Will ping the RP team to ask about high-velocity inserts to a given table, as that might be bad for RP on a single partition.