
Block-Version-based Protocol Evolution #26

Conversation

Contributor

@cbeck88 cbeck88 commented Feb 6, 2022

A proposal to use block_version to control protocol evolution

Rendered MCIP

@cbeck88 cbeck88 force-pushed the block-version-based-protocol-evolution branch from e6b70d5 to 8224744 on February 6, 2022 03:32
@cbeck88 cbeck88 force-pushed the block-version-based-protocol-evolution branch from 8224744 to 496f629 on February 6, 2022 03:42
@cbeck88 cbeck88 mentioned this pull request Feb 6, 2022
Comment on lines +45 to +46
2. If they submit a transaction spending the TxOut, the merkle proof of membership
may be wrong if the hash doesn't incorporate the new field.

This is a good criticism of the dual-rule blocks approach, that I sadly did not foresee.


@garbageslam What if membership proofs just hashed the core TxOut components? I.e. the onetime address and amount commitment. Then you can change memo fields arbitrarily without old clients needing to know about it when constructing a membership proof for the old enclave in a 'dual-rule' block. At the very least, you'd only need to do single-rule transitions on the rare occasion that core TxOut components change (if ever).
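A minimal sketch of that idea (hypothetical types, and `DefaultHasher` as a stand-in for the real hash; not the actual MobileCoin code): the membership-proof leaf hash covers only the core TxOut fields, so adding memo-like fields doesn't change what old clients compute.

```rust
// Hypothetical sketch: hash only the "core" TxOut components into the
// Merkle leaf, so new memo fields don't affect membership proofs.
use std::collections::hash_map::DefaultHasher; // stand-in for the real hash
use std::hash::{Hash, Hasher};

pub struct TxOut {
    pub onetime_address: [u8; 32],
    pub amount_commitment: [u8; 32],
    pub memo: Option<Vec<u8>>, // newer field, excluded from the leaf hash
}

pub fn leaf_hash(out: &TxOut) -> u64 {
    let mut h = DefaultHasher::new();
    out.onetime_address.hash(&mut h);
    out.amount_commitment.hash(&mut h);
    // memo is deliberately NOT hashed: clients that don't understand it
    // still derive the same membership-proof leaf.
    h.finish()
}

fn main() {
    let a = TxOut { onetime_address: [1; 32], amount_commitment: [2; 32], memo: None };
    let b = TxOut { onetime_address: [1; 32], amount_commitment: [2; 32], memo: Some(vec![9]) };
    assert_eq!(leaf_hash(&a), leaf_hash(&b));
    println!("memo does not affect the leaf hash");
}
```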

Contributor Author

@cbeck88 cbeck88 Feb 7, 2022


Yeah, that would definitely work for the fog clients, and it would make the oblivious merkle proof server easier to build, because there's less data per TxOut. So I would like us to do that.

I think it still may break clients that are syncing the whole ledger: if they are on an old block version and TxOuts appear that have new fields, they may compute the wrong block id and then reject the block. That would instantly force those older clients to update or be broken. So I don't think that's a complete solution, unfortunately; otherwise it would have been my preference to do it that way.

Contributor Author

@cbeck88 cbeck88 Feb 7, 2022


This is a good criticism of the dual-rule blocks approach, that I sadly did not foresee.

Yeah I didn't really foresee this either. I saw pieces of it but not the whole thing. I think James saw the problem a while ago and it wasn't until now that he came up with a clear plan to fix it. This proposal is mostly my understanding of James' idea

Contributor Author

@cbeck88 cbeck88 Feb 7, 2022


At the very least, you'd only need to do single-rule transitions on the rare occasion that core TxOut components change (if ever).

I think the simplest way to make this work without all the clients being aware of multiple rules would have been if:

  • Instead of mc-crypto-digestible, we hash raw protobuf payloads, and clients leave all the fields in there for this hash even if they don't recognize some of them.
  • Because protobuf serializations are not canonical, we would have to either make a "canonical protobuf" serializer, or, a routine that takes protobufs and makes them canonical.

It's a little late in the game for that. Making a canonical protobuf serializer is really complicated, especially one that supports multiple programming languages, and we didn't have the resources to attempt that when we had only a few engineers. The mc-crypto-digestible approach was a lot easier to put together. We could also have tried to make a routine that takes serialized protobuf payloads and reorders the fields to make them canonical, but that would be pretty complicated too.

I guess we could also have tried to use Libra Canonical Serialization or something like that, but I don't think it existed when we were starting, and I'm not sure it had good Rust support.
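The "canonicalizing routine" idea above could be sketched roughly like this. This is a hypothetical top-level-only canonicalizer: it splits a serialized message into (field number, raw bytes) records and stable-sorts them by field number. Real protobuf canonicalization would also need to recurse into nested messages, handle repeated fields, and reject the deprecated group wire types.

```rust
// Hypothetical sketch: reorder top-level protobuf fields by tag number.
// Not a full canonicalizer (no recursion into nested messages).

fn read_varint(buf: &[u8], pos: &mut usize) -> u64 {
    let mut val = 0u64;
    let mut shift = 0;
    loop {
        let b = buf[*pos];
        *pos += 1;
        val |= ((b & 0x7f) as u64) << shift;
        if b & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    val
}

/// Split a serialized message into (field_number, raw_bytes) records.
fn split_fields(buf: &[u8]) -> Vec<(u64, Vec<u8>)> {
    let mut fields = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let start = pos;
        let key = read_varint(buf, &mut pos);
        let (field, wire) = (key >> 3, key & 7);
        match wire {
            0 => { read_varint(buf, &mut pos); }                              // varint
            1 => { pos += 8; }                                                // fixed64
            2 => { let len = read_varint(buf, &mut pos) as usize; pos += len; } // length-delimited
            5 => { pos += 4; }                                                // fixed32
            _ => panic!("unsupported wire type"),
        }
        fields.push((field, buf[start..pos].to_vec()));
    }
    fields
}

/// Canonicalize: stable-sort top-level fields by field number and re-emit.
fn canonicalize(buf: &[u8]) -> Vec<u8> {
    let mut fields = split_fields(buf);
    fields.sort_by_key(|f| f.0);
    fields.into_iter().flat_map(|(_, bytes)| bytes).collect()
}

fn main() {
    // field 2 (varint 5) followed by field 1 (varint 7) sorts to field 1 first:
    let msg = [0x10, 0x05, 0x08, 0x07];
    assert_eq!(canonicalize(&msg), vec![0x08, 0x07, 0x10, 0x05]);
    println!("canonical form: field 1 before field 2");
}
```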


@UkoeHB UkoeHB Feb 7, 2022


I think it still may break clients that are syncing the whole ledger: if they are on an old block version and TxOuts appear that have new fields, they may compute the wrong block id and then reject the block. That would instantly force those older clients to update or be broken. So I don't think that's a complete solution, unfortunately; otherwise it would have been my preference to do it that way.

Yeah anyone syncing the ledger needs a full and updated codebase. I think it would be wasted effort to try to make universal backward compatibility of old clients across forks.

The goals of dual-rule blocks are: 1) prevent txs from failing if they are submitted right before a transition point, 2) let 'simple' codebases like wallets only worry about one enclave at a time with loose upgrade requirements around forks (i.e. wallets that aren't syncing the entire chain). Even simple wallets will need some cross-fork code since all output formats always need to be handled (in order to recover funds properly), so it actually might be fine to require pre-fork support for new output types (even if pre-fork support of new tx builders/enclaves isn't ready).

Contributor Author

@cbeck88 cbeck88 Feb 7, 2022


Yeah anyone syncing the ledger needs a full and updated codebase. I think it would be wasted effort to try to make universal backward compatibility of old clients across forks.

I think the model we want to have is like,

  • Users are expected to update every X days. If you have a phone app, it will prompt you to approve updates periodically. If you are e.g. running a point-of-sale terminal, you may have to do scheduled maintenance. This is all normal.
  • Users expect that as long as they do the upgrades on schedule, the system always works.

Suppose that, e.g., a grocery store decided to do a trial run of MOB point-of-sale terminals, which are probably syncing the ledger using full-service or mobilecoind, say.

If the terminal is old and the customer has new versions of our software, and the terminal breaks (because it can't spend a TxOut with a new field), that's not a good experience for the merchant. Perhaps they can fix it by doing an update, but it's not the experience they have with VISA, for instance. However, if they update aggressively (and before their customers), then their customers are the ones with the unspendable TxOuts if the merchant has to process a refund or something. So it's a bit of a catch-22: someone is going to have a broken transaction and have to suddenly do an update faster than expected.

I think we'd really like to be able to ship clients that are guaranteed to work for their lifetime and not break prematurely, whether they are desktop wallets or mobile wallets. I don't personally think this is wasted effort; as long as we figure out a system that works and follow it, everyone will have a good user experience.


@UkoeHB UkoeHB Feb 7, 2022


I think we'd really like to be able to ship clients that are guaranteed to work for their life-time and not break them prematurely, no matter if they are desktop wallets or mobile wallets.

This can only be achieved for scanning the ledger for owned outputs (bare-minimum balance-checking where the only things you look at are onetime address, amount commitments, encoded amounts, output pubkeys, and key images), assuming you don't transition to new crypto for those pieces (e.g. Seraphis). Spending to a new enclave always requires an updated client, since you need to sign all tx data which means knowing all the current formats/etc. (at minimum a new MRENCLAVE, if no tx-related things change). Reading miscellaneous new memos also always requires an updated client (obviously).

Contributor Author

@cbeck88 cbeck88 Feb 7, 2022


Yes, but if we have a policy where we distribute client updates every X days that trust both the current MRENCLAVE and the next MRENCLAVE, and only after X days do we deploy the new enclaves, then we can navigate that without ever breaking clients that are less than X days old.

Part of what this block-version mechanism does is let us do one enclave update/release to update all readers, and then trigger the new behavior at runtime, so we don't need a second round of updates to accomplish that. We can ship things faster this way than if clients were only capable of one behavior.

Contributor

@eranrund eranrund left a comment


LGTM.
I think that for coordinating block_version bumps, there are a few options that I can think of, all not terribly complicated to implement:

  1. Command line configuration flag to the nodes that gets hashed into the responder id, similar to how we coordinate overriding the minimum fee
  2. Adding a new transaction type to SCP - we have a POC for that (although in terms of code, this is the most complicated approach)
  3. Deciding in advance on a date when the change happens, and having the nodes switch automatically at the given date/time. This is a bit janky since there is no way to ensure all nodes' clocks are perfectly synced, so it will likely also need to involve a responder id change or something similar (to prevent nodes using the old block version from talking to nodes that have switched to the new block version).

Either way, I am confident that this part is a solvable problem.
Thanks for writing this MCIP!
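Option 1 might look roughly like this. All names here are illustrative (this is not the actual consensus-service configuration), and `DefaultHasher` is a stand-in; a real deployment would use a hash that is stable across builds.

```rust
// Hypothetical sketch of option 1: fold the configured block version into
// the node's responder id, so nodes configured with different versions
// fail to identify each other as peers after the coordinated restart.
use std::collections::hash_map::DefaultHasher; // stand-in for a stable hash
use std::hash::{Hash, Hasher};

pub fn effective_responder_id(base: &str, block_version: u32) -> String {
    if block_version <= 1 {
        return base.to_string(); // unchanged for the initial version
    }
    let mut h = DefaultHasher::new();
    block_version.hash(&mut h);
    format!("{}-v{:x}", base, h.finish())
}

fn main() {
    let old = effective_responder_id("node1.test.mobilecoin.com:443", 1);
    let new = effective_responder_id("node1.test.mobilecoin.com:443", 2);
    // Nodes still on version 1 cannot peer with restarted version-2 nodes.
    assert_ne!(old, new);
    println!("{} vs {}", old, new);
}
```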


This proposal relates to several earlier MCIP proposals:

* 0007-Protocol-version-statement-type
Contributor Author


#7

This proposal relates to several earlier MCIP proposals:

* 0007-Protocol-version-statement-type
* 0008-SCP-based-hard-forks
Contributor Author


#8


* 0007-Protocol-version-statement-type
* 0008-SCP-based-hard-forks
* 0009-Dual-rule-blocks
Contributor Author


#9

Contributor

jcape commented Feb 8, 2022

LGTM. I think that for coordinating block_version bumps, there are a few options that I can think of, all not terribly complicated to implement:

  1. Command line configuration flag to the nodes that gets hashed into the responder id, similar to how we coordinate overriding the minimum fee

I was kind of leaning towards this, as it seems like "do a coordinated restart" w/o enclave changes is the easiest way to make this go. That being said, I don't think there's a danger of "some nodes externalize v1, some nodes externalize v2", as they wouldn't be able to reach consensus on it until there's K/N agreement on what version things are writing.

One caveat is there will be a tombstone block age-out for v2 transactions submitted to v2 nodes before there's a quorum of v2 nodes (and similarly for v1 transactions to v1 nodes before the restart is complete).

One interesting aside to this is that we'll need to plumb the block version into something like GetLastBlockInfo for non-fog clients anyway, so while adding it to fog would be nice, it may be unnecessary.

That being said,

  1. Adding a new transaction type to SCP - we have a POC for that (although in terms of code, this is the most complicated approach)

isn't entirely outside the scope, given that we're already looking to implement something like this for other purposes.

Contributor Author

cbeck88 commented Feb 8, 2022

One benefit of being able to programmatically trigger this is that we could modify CD so that, e.g., it deploys a network at block version 1, runs the test client, and if that passes, triggers an upgrade to block version 2 and runs the test client again. That would be nice for testing early in the development pipeline that the upgrades don't break things.

Maybe it's still possible if we like, nuke the network and rebuild images and redeploy?

to more properly describe the alternative roll out for encrypted memos
@samdealy samdealy self-requested a review February 11, 2022 21:59

@samdealy samdealy left a comment


LGTM. Very well written and clear to me!


@mfaulk mfaulk left a comment


This is very well thought out!

Contributor

@remoun remoun left a comment


LGTM

* We propose that the `masked_token_id` field from the `0023-confidential-token-ids` MCIP should be associated to a bump to `block_version = 3`.
That is, consensus will reject `Tx` that contain any `TxOut` that have this field until `block_version = 3`.
* We propose that `block_version = 3` may be triggered by the acceptance of the first minting transaction to the blockchain.
* We do not at this point have a concrete proposal for how to trigger `block_version = 2`.
Contributor


I wonder if v2 could be a switch that is manually triggered from one of our internal UIs, then we use SCP to converge on that version.

Contributor Author


I think this is basically the same as the 0007-protocol-version-statement proposal, right? If SCP is converging on this, most likely that corresponds to an SCP statement.

Contributor


I think so? I'm imagining an internal UI with a button that kicks off that SCP statement.

# Drawbacks
[drawbacks]: #drawbacks

This adds complexity to the transaction validation code, which now must support multiple versions of the rules.
Contributor


This seems inevitable with any approach to backwards-compatibility. Maybe note that?

Contributor Author


idk, I guess my thinking is:

  • In the "initial" plans for how to do versioning, we didn't plan on doing this; there would just be one version of the transaction core lib.
  • We also planned to roll out the memos that way.
  • In this MCIP I'm arguing that we have to accept this complexity to avoid sometimes breaking clients, but I'm trying to treat that as a position of the MCIP rather than state it as fact, and the reader can draw their own conclusions.
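For concreteness, "multiple versions of the rules" can be as simple as validation keyed on `block_version`. A hypothetical sketch (field names and version gates are illustrative, loosely following the masked_token_id rule proposed in this MCIP, not the actual transaction-validation code):

```rust
// Hypothetical sketch of block-version-gated transaction validation.
pub struct TxOut {
    pub masked_token_id: Option<u64>, // illustrative: gated on version 3
    pub memo: Option<Vec<u8>>,        // illustrative: gated on version 2
}

pub fn validate_tx_out(block_version: u32, out: &TxOut) -> Result<(), String> {
    if out.masked_token_id.is_some() && block_version < 3 {
        return Err(format!(
            "masked_token_id not permitted until block_version 3 (current: {})",
            block_version
        ));
    }
    if out.memo.is_some() && block_version < 2 {
        return Err("memo not permitted until block_version 2".to_string());
    }
    Ok(())
}

fn main() {
    let out = TxOut { masked_token_id: Some(1), memo: None };
    // The same TxOut is invalid before the version bump and valid after it.
    assert!(validate_tx_out(2, &out).is_err());
    assert!(validate_tx_out(3, &out).is_ok());
    println!("version-gated rules behave as expected");
}
```

The complexity the drawback describes is that every such rule now carries a version condition, and old branches must be kept so that historical blocks still validate.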

3. Even if they are a fog client, and (new) fog is building the proof of membership for them,
if the client strips out the fields it doesn't understand that fog gives them, then consensus
will likely reject the transaction. (There are some protobuf implementations that support "preserving"
unknown fields, but PROST, which is the only rust implementation which is enclave-compatible, does not do this.)
Contributor


I wonder if it makes sense for us to invest in making Prost support preserving unknown fields.

Contributor Author


I guess at this point we've kind of designed everything around the idea that that doesn't happen. But maybe there's some use-case for that in the future, idk

text/0026-block-version-based-protocol-evolution.md (review thread resolved)
@cbeck88 cbeck88 added final-comment-period This RFC is in the final-comment period for the next 24h T-consensus Consensus Team labels Feb 14, 2022
@cbeck88 cbeck88 merged commit 3484509 into mobilecoinfoundation:main Feb 18, 2022