Block-Version-based Protocol Evolution #26
Conversation
> 2. If they submit a transaction spending the TxOut, the merkle proof of membership
> may be wrong if the hash doesn't incorporate the new field.
This is a good criticism of the dual-rule blocks approach that I sadly did not foresee.
@garbageslam What if membership proofs just hashed the core TxOut components, i.e. the onetime address and the amount commitment? Then you could change memo fields arbitrarily without old clients needing to know about it when constructing a membership proof for the old enclave in a 'dual-rule' block. At the very least, you'd only need to do single-rule transitions on the rare occasion that core TxOut components change (if ever).
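A minimal sketch of what that could look like, with illustrative types and hashing (this is not the actual mc-crypto-digestible API): the membership-proof leaf hash covers only the onetime address and the amount commitment, so extra fields cannot perturb it.

```rust
// Hypothetical sketch: hash only the core TxOut components into the
// membership-proof leaf, so new memo-like fields never change the proof.
// `Sha256` stands in for whatever domain-separated hash the tree uses.
use sha2::{Digest, Sha256};

struct TxOut {
    onetime_address: [u8; 32],   // compressed Ristretto point
    amount_commitment: [u8; 32], // Pedersen commitment
    memo: Vec<u8>,               // extensible; deliberately NOT hashed
}

fn membership_leaf_hash(tx_out: &TxOut) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(b"txout-core-leaf"); // domain separator (illustrative)
    hasher.update(tx_out.onetime_address);
    hasher.update(tx_out.amount_commitment);
    hasher.finalize().into()
}
```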
Yeah, that would definitely work for the fog clients, and it would make the oblivious merkle proof server easier to build because there's less data per TxOut. So I would like us to do that.
I think it still may break clients that are syncing the whole ledger: if they are on an old block version and TxOuts appear that have new fields, they may compute the wrong block id and then reject the block. That would instantly force those older clients to update or be broken. So I don't think it's a complete solution, unfortunately; otherwise it would have been my preference to do it that way.
> This is a good criticism of the dual-rule blocks approach that I sadly did not foresee.
Yeah, I didn't really foresee this either; I saw pieces of it but not the whole thing. I think James saw the problem a while ago, and it wasn't until now that he came up with a clear plan to fix it. This proposal is mostly my understanding of James' idea.
> At the very least, you'd only need to do single-rule transitions on the rare occasion that core TxOut components change (if ever).
I think the simplest way to make this work, without all the clients being aware of multiple rules, would have been if:
- Instead of mc-crypto-digestible, we hashed raw protobuf payloads, and clients left all the fields in there for this hash even if they didn't recognize some of them.
- Because protobuf serializations are not canonical, we would have had to either make a "canonical protobuf" serializer, or a routine that takes protobufs and makes them canonical.

It's a little late in the game for that. Making a protobuf serializer is really complicated, and we didn't have the resources to try to do that when we had only a few engineers, especially one that would support multiple programming languages. The mc-crypto-digestible thing was a lot easier to put together. We could also have tried to make a routine that takes serialized protobuf payloads and reorders the fields to make them canonical, but that would also be pretty complicated.
I guess we could also have tried to use libra canonical serialization or something like that, but I don't think that existed when we were starting, and I'm not sure if it had good support for rust.
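For concreteness, here is a rough sketch of what the "routine that makes protobufs canonical" could look like: re-ordering the top-level fields of a serialized message by tag number. This is my own illustration, not code from the repo, and it glosses over real complications (recursing into embedded messages, repeated and map field semantics, overlong varints).

```rust
// Hypothetical top-level protobuf canonicalizer: split a serialized
// message into raw field records and re-emit them sorted by tag number.
// A real implementation would also need to recurse into nested messages
// and reject non-minimal varint encodings.

fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
        if shift >= 64 {
            return None;
        }
    }
}

/// Split a serialized message into (field_number, raw_record_bytes) pairs.
fn split_fields(buf: &[u8]) -> Option<Vec<(u64, Vec<u8>)>> {
    let mut pos = 0;
    let mut fields = Vec::new();
    while pos < buf.len() {
        let start = pos;
        let key = read_varint(buf, &mut pos)?;
        let (field_number, wire_type) = (key >> 3, key & 0x7);
        match wire_type {
            0 => { read_varint(buf, &mut pos)?; }        // varint
            1 => { pos += 8; }                           // fixed64
            2 => {                                       // length-delimited
                let len = read_varint(buf, &mut pos)? as usize;
                pos += len;
            }
            5 => { pos += 4; }                           // fixed32
            _ => return None,                            // groups unsupported
        }
        if pos > buf.len() {
            return None;
        }
        fields.push((field_number, buf[start..pos].to_vec()));
    }
    Some(fields)
}

/// Re-emit the fields sorted by field number (stable order preserves
/// the relative order of repeated fields).
fn canonicalize(buf: &[u8]) -> Option<Vec<u8>> {
    let mut fields = split_fields(buf)?;
    fields.sort_by_key(|(num, _)| *num);
    Some(fields.into_iter().flat_map(|(_, bytes)| bytes).collect())
}
```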
> I think it still may break clients that are syncing the whole ledger: if they are on an old block version and TxOuts appear that have new fields, they may compute the wrong block id and then reject the block. That would instantly force those older clients to update or be broken. So I don't think it's a complete solution, unfortunately; otherwise it would have been my preference to do it that way.
Yeah, anyone syncing the ledger needs a full and updated codebase. I think it would be wasted effort to try to make universal backward compatibility of old clients across forks.
The goals of dual-rule blocks are: 1) prevent txs from failing if they are submitted right before a transition point, and 2) let 'simple' codebases like wallets (i.e. wallets that aren't syncing the entire chain) only worry about one enclave at a time, with loose upgrade requirements around forks. Even simple wallets will need some cross-fork code, since all output formats always need to be handled (in order to recover funds properly), so it actually might be fine to require pre-fork support for new output types (even if pre-fork support of new tx builders/enclaves isn't ready).
> Yeah, anyone syncing the ledger needs a full and updated codebase. I think it would be wasted effort to try to make universal backward compatibility of old clients across forks.
I think the model we want to have is like:
- Users are expected to update every X days. If you have a phone app, it will prompt you to approve updates periodically. If you are e.g. running a point-of-sale terminal, you may have to do scheduled maintenance. This is all normal.
- Users expect that as long as they do the upgrades on schedule, the system always works.

Suppose that e.g. a grocery store decided to do a trial run of MOB point-of-sale terminals, which are probably syncing the ledger, using full-service or mobilecoind, let's say.
If the terminal is old and the customer has new versions of our software, and the terminal breaks (because it can't spend a TxOut with a new field), that's not a good experience for the merchant. Perhaps they can fix it by doing an update, but it's not the experience they have with VISA, for instance. However, if they update aggressively (and before their customers), then their customers are the ones with the unspendable TxOuts if they have to process a refund or something. So it's a bit of a catch-22: someone is going to have a broken transaction and have to suddenly do an update faster than expected.
I think we'd really like to be able to ship clients that are guaranteed to work for their lifetime and not break them prematurely, no matter if they are desktop wallets or mobile wallets. I don't personally think this is wasted effort; as long as we figure out a system that works and follow it, everyone will have a good user experience.
> I think we'd really like to be able to ship clients that are guaranteed to work for their lifetime and not break them prematurely, no matter if they are desktop wallets or mobile wallets.
This can only be achieved for scanning the ledger for owned outputs (bare-minimum balance-checking, where the only things you look at are onetime addresses, amount commitments, encoded amounts, output pubkeys, and key images), assuming you don't transition to new crypto for those pieces (e.g. Seraphis). Spending to a new enclave always requires an updated client, since you need to sign all tx data, which means knowing all the current formats, etc. (at minimum a new MRENCLAVE, if no tx-related things change). Reading miscellaneous new memos also always requires an updated client (obviously).
Yes, but if we have a policy like: we distribute client updates every X days that trust both the current MRENCLAVE and the next MRENCLAVE, and only after X days do we deploy the new enclaves, then we can navigate that without ever breaking clients that are less than X days old.
Part of what this block-version thing does is let us do one enclave update / release to update all readers, and then trigger the new behavior at runtime, so we don't need a second round of updates to accomplish that. We can ship things faster this way than if we only had clients that were capable of one behavior.
LGTM.
I think that for coordinating block_version bumps, there are a few options that I can think of, none of them terribly complicated to implement:
- A command-line configuration flag to the nodes that gets hashed into the responder id, similar to how we coordinate overriding the minimum fee
- Adding a new transaction type to SCP - we have a POC for that (although in terms of code, this is the most complicated approach)
- Deciding in advance on a date when the change happens, and having the nodes switch automatically at the given date/time. This is a bit janky, since there is no way to ensure all nodes' clocks are perfectly synced, so it would likely need to involve a responder id change or something similar as well (to prevent nodes using the old block version from talking to nodes that have switched to the new block version).

Either way, I am confident that this part is a solvable problem.
Thanks for writing this MCIP!
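As an illustration of the first option above, something like the following could fold the configured block version into the responder id so that nodes configured differently refuse to peer. The names (and the hostname) are hypothetical, not the actual consensus-service configuration API.

```rust
// Hypothetical sketch: mix the configured block version into the node's
// responder id, the same trick used for the minimum-fee override.

#[derive(Debug, Clone, PartialEq, Eq)]
struct ResponderId(String);

/// A node restarted with e.g. `--block-version 2` advertises (and expects
/// from its peers) a responder id carrying that version, so nodes still
/// on v1 simply fail to authenticate against it.
fn responder_id_with_version(base_host_port: &str, block_version: u32) -> ResponderId {
    ResponderId(format!("{}?block-version={}", base_host_port, block_version))
}

fn main() {
    let v1 = responder_id_with_version("node1.test.example.com:443", 1);
    let v2 = responder_id_with_version("node1.test.example.com:443", 2);
    assert_ne!(v1, v2); // mismatched versions can no longer peer
}
```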
> This proposal relates to several earlier MCIP proposals:
>
> * 0007-Protocol-version-statement-type
> * 0008-SCP-based-hard-forks
> * 0009-Dual-rule-blocks
I was kind of leaning towards this, as it seems like "do a coordinated restart" without enclave changes is the easiest way to make this go. That being said, I don't think there's a danger of "some nodes externalize v1, some nodes externalize v2," as they wouldn't be able to reach consensus on it until there's K/N agreement on what version things are writing. One caveat is that there will be a tombstone-block age-out for v2 transactions submitted to v2 nodes before there's a quorum of v2 nodes (and similarly for v1 transactions to v1 nodes before the restart is complete). One interesting aside is that we'll need to plumb the block version into something like ...; that said, it isn't entirely outside the scope, given that we're already looking to implement something like this for other purposes.
So, one benefit of being able to programmatically trigger this would be that we could modify CD so that it e.g. deploys a network at block version 1, runs the test client, then, if it passes, triggers the upgrade to block version 2 and runs the test client again to see if it passes. That would be kind of nice for testing that the upgrades don't break things pretty early in the development pipeline. Maybe it's still possible if we, like, nuke the network and rebuild images and redeploy?
to more properly describe the alternative rollout for encrypted memos
LGTM. Very well written and clear to me!
This is very well thought out!
LGTM
> * We propose that the `masked_token_id` field from the `0023-confidential-token-ids` MCIP should be associated to a bump to `block_version = 3`.
>   That is, consensus will reject `Tx` that contain any `TxOut` that have this field until `block_version = 3`.
> * We propose that `block_version = 3` may be triggered by the acceptance of the first minting transaction to the blockchain.
> * We do not at this point have a concrete proposal for how to trigger `block_version = 2`.
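A minimal sketch of the gating rule quoted above; the names are hypothetical, and the real check would live in the consensus transaction validation code.

```rust
// Illustrative sketch: reject any TxOut carrying `masked_token_id`
// until the network's agreed block version reaches 3.

const BLOCK_VERSION_MASKED_TOKEN_ID: u32 = 3;

struct TxOut {
    masked_token_id: Option<Vec<u8>>,
    // ... other fields elided
}

fn validate_outputs(outputs: &[TxOut], block_version: u32) -> Result<(), &'static str> {
    for out in outputs {
        if out.masked_token_id.is_some() && block_version < BLOCK_VERSION_MASKED_TOKEN_ID {
            return Err("masked_token_id is not allowed before block_version = 3");
        }
    }
    Ok(())
}
```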
I wonder if v2 could be a switch that is manually triggered from one of our internal UIs, then we use SCP to converge on that version.
I think this is basically the same as the 0007-protocol-version-statement proposal, right? If SCP is converging on this, most likely that corresponds to an SCP statement.
I think so? I'm imagining an internal UI with a button that kicks off that SCP statement.
> # Drawbacks
> [drawbacks]: #drawbacks
>
> This adds complexity to the transaction validation code, which now must support multiple versions of the rules.
This seems inevitable with any approach to backwards-compatibility. Maybe note that?
idk, I guess my thinking is:
- In the "initial" plans for how to do versioning, we didn't plan on doing this; there would just be one version of the transaction core lib.
- We also planned to roll out the memos that way.
- In this MCIP I'm arguing that we have to accept this complexity to avoid sometimes breaking the clients, but I'm trying to treat that as a position of the MCIP rather than state it as a fact, and let the reader draw their own conclusions.
> 3. Even if they are a fog client, and (new) fog is building the proof of membership for them,
> if the client strips out the fields it doesn't understand that fog gives them, then consensus
> will likely reject the transaction. (There are some protobuf implementations that support "preserving"
> unknown fields, but PROST, which is the only rust implementation which is enclave-compatible, does not do this.)
I wonder if it makes sense for us to invest in making Prost support preserving unknown fields.
I guess at this point we've kind of designed everything around the idea that that doesn't happen. But maybe there's some use-case for that in the future, idk
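To make the PROST behavior discussed above concrete, here is a small self-contained demonstration (the message types are made up for the example): decoding through a schema that lacks a field, then re-encoding, silently drops the unknown field.

```rust
// Demonstration of prost dropping unknown fields on decode, so a
// strip-and-resubmit round trip loses data. Requires the `prost`
// crate (0.9+ for `encode_to_vec`).
use prost::Message;

#[derive(Clone, PartialEq, prost::Message)]
struct TxOutNew {
    #[prost(bytes = "vec", tag = "1")]
    target_key: Vec<u8>,
    #[prost(bytes = "vec", tag = "2")]
    memo: Vec<u8>, // a new field old clients don't know about
}

#[derive(Clone, PartialEq, prost::Message)]
struct TxOutOld {
    #[prost(bytes = "vec", tag = "1")]
    target_key: Vec<u8>,
    // no tag-2 field: prost silently discards it during decode
}

fn main() {
    let fresh = TxOutNew { target_key: vec![1u8; 32], memo: vec![7u8; 44] };
    let wire = fresh.encode_to_vec();

    // An old client decodes, loses the memo, and re-encodes fewer bytes;
    // consensus hashing the full TxOut would reject the result.
    let through_old = TxOutOld::decode(wire.as_slice()).expect("decode");
    assert!(through_old.encode_to_vec().len() < wire.len());
}
```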
Co-authored-by: Remoun Metyas <[email protected]>
A proposal to use `block_version` to control protocol evolution.
Rendered MCIP