
Implement schema checksums for quick incompatibility detection #19

Open
zah opened this issue Feb 22, 2020 · 5 comments

@zah
Contributor

zah commented Feb 22, 2020

Starting from a designated list of root types, you can use enumAllSerializedFields and Nim's signatureHash to automatically generate a "checksum" that uniquely identifies the version of the type schemas used in a particular build of a project (please note that this will also include all transitively reached types that may appear as record fields).

This checksum can be used in file formats and network protocols to quickly detect situations where you are dealing with an older incompatible version of your software (for formats such as SSZ which don't provide backwards compatibility).
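The idea above could be sketched roughly as follows. This is a hypothetical implementation, not code from this repo: the `schemaHash` name, the `!&`/`!$` mixing from `std/hashes`, and the symbols injected by `enumAllSerializedFields` (`fieldName`, `FieldType`) are all assumptions to be checked against the actual helper's definition.

```nim
import std/hashes
import serialization/object_serialization  # assumed location of enumAllSerializedFields

# Hypothetical sketch: fold the name and type of every serialized field
# into a single hash, recursing into nested object types so that all
# transitively reached schemas contribute to the checksum.
proc schemaHash*(T: type): Hash {.compileTime.} =
  result = result !& hash($T)
  T.enumAllSerializedFields:
    result = result !& hash(fieldName)       # injected symbol (assumed name)
    when FieldType is object:
      result = result !& schemaHash(FieldType)  # recurse into nested records
    else:
      result = result !& hash($FieldType)
  result = !$result
```

Because the proc runs at compile time, the checksum can be baked into a `const` and written into a file header or a protocol handshake, e.g. `const mySchemaVersion = schemaHash(MyRootType)` (with `MyRootType` standing in for one of the designated root types).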

@zah zah added the bounty label Feb 22, 2020
@disruptek

That's an awesome idea. How and where does it need to be implemented?

@jangko
Contributor

jangko commented Mar 1, 2020

where

Here in this repo, along with an adequate amount of tests to prove it generates a unique ID for each distinct type. Since it will be used to identify things, make sure it can be executed at compile time (e.g. assigned to a `const`).

how

You'll need to create a new public API, and you'll need to inspect Nim types via macros and compile-time procs. From there you turn those types into hashable/checksumable values, then produce a final "checksum" value.

Update the readme.md and you're done. The hardest part may be that you'll need to fight with the Nim compiler itself, identify its weaknesses/limitations regarding this voodoo-black-magic feature, and then report them along with any gotchas you've found.
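The kind of tests asked for above could look like the following sketch, assuming a compile-time `schemaHash(T)` API as described in this issue (the name is hypothetical; `V1`/`V2` are made-up example types): distinct schemas must yield distinct checksums, the result must be deterministic, and it must be usable in a `const`.

```nim
type
  V1 = object
    slot: uint64
  V2 = object
    slot: uint64
    extra: string    # schema change: the checksum must change too

# Computable at compile time, as required.
const
  h1 = schemaHash(V1)   # hypothetical API from this issue
  h2 = schemaHash(V2)

static:
  doAssert h1 != h2              # different schemas, different IDs
  doAssert h1 == schemaHash(V1)  # deterministic across evaluations
```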

@disruptek

The hope for incremental compilation is that, ultimately, it will be always-on. Whether the types are serialized to sqlite or some other format will be immaterial. We can use this today without any change to the compiler inputs, which I think is particularly attractive. Alternatively, we can reproduce the same functionality that the compiler already has.

I have a feeling you will want to go the latter route. That would, of course, allow you to loosen type validation to, say, validate a type of varchar(20) when stored to a varchar(30) field. I don't know if you want this; it's just the first enhancement that came to mind.

@jangko
Contributor

jangko commented Mar 5, 2020

The medium or destination of the bytes produced by our serializer will not have any impact on our deserializer. The serializer and deserializer each analyze their output/input type respectively.

This feature is also format-independent: whether it is a JSON serializer, protobuf serializer, msgpack serializer, or SSZ, the usage of this feature will be the same.

@zah
Contributor Author

zah commented Mar 5, 2020

The hope for incremental compilation is that, ultimately, it will be always-on. Whether the types are serialized to sqlite or some other format will be immaterial. We can use this today without any change to the compiler inputs, which I think is particularly attractive. Alternatively, we can reproduce the same functionality that the compiler already has.

I'm not sure I understand the reference to incremental compilation here, but if you imply that this form of signature checksum already exists in the form of signatureHash, please note that the main difference here is that we care only for the serialized fields (these may be a subset of all the fields and they can appear in a different order).

You need to check out the definition and the usages of enumAllSerializedFields. This is the helper mechanism that you can use to process the list of all serialized fields. The schemaHash API will just recursively use this helper to compute the final hash value.
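The helper's usage might look roughly like this. Illustrative only: the injected symbol names (`fieldName`, `FieldType`) and the `{.dontSerialize.}` field pragma are assumptions based on this library's typical conventions; check the actual definition of `enumAllSerializedFields` in the repo.

```nim
import serialization/object_serialization  # assumed module path

type Example = object
  internal {.dontSerialize.}: int  # excluded from serialization
  slot: uint64
  data: string

# The template unrolls its body once per *serialized* field at compile
# time, so `internal` should not appear here:
Example.enumAllSerializedFields:
  echo fieldName, ": ", $FieldType
```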
