fma: FMA for MT/64-bit Cannon #123

pauldowman · 2024-10-09T13:44:29Z

This is the failure modes analysis for multi-threaded & 64-bit Cannon.

pauldowman · 2024-10-11T17:10:50Z

@Inphi @mbaxter I did rebase and force push this but I won't do that again so that we can share the branch if you want to make edits in here.

Inphi · 2024-12-05T20:01:44Z

@pauldowman could you fill in the "Initial reviewers" and "Need approval from" fields in the table?

mds1 · 2024-12-10T14:20:20Z

security/fma-multi-threaded-64-bit-cannon.md

+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+
+- [Introduction](#introduction)
+- [Failure Modes and Recovery Paths](#failure-modes-and-recovery-paths)


Are there any hard forks or contract upgrades involved? If so, let's make sure to consider any FMA-specific mitigations for the failure modes listed in fma-generic-hardfork.md and fma-generic-contracts.md.

There isn't a contract upgrade. The onchain operation for this change requires changing the DisputeGameFactory implementation of cannon and permissioned_cannon. I have a section at the bottom that covers this operation. The other failure modes in the linked documents don't apply here.

mds1 · 2024-12-10T14:22:06Z

security/fma-multi-threaded-64-bit-cannon.md

+
+- **Description:** An incorrectly implemented FPVM could result in an invalid fault proof. This can be caused by bugs in the thread scheduler, incorrect emulation of MIPS64 instructions, and so on.
+- **Risk Assessment:** High severity, Low likelihood.
+- **Mitigations:** Comprehensive testing. This includes full test coverage of every supported MIPS instruction, threading semantics, and verifying op-program execution on live chain data.


For mitigations and detections, we should link to evidence that the mitigation is complete (e.g. a link to the test suite, or a coverage report) and that any monitoring required for detection is stood up. This helps if we ever have to do a retro and revisit this FMA. If a mitigation is not yet complete, it should be an action item.

Added more details to mitigations where appropriate. Note that some of this references internal links that aren't publicly viewable.

security/fma-multi-threaded-64-bit-cannon.md

BlocksOnAChain · 2024-12-16T13:22:56Z

security/fma-multi-threaded-64-bit-cannon.md

+
+An audit of the multithreaded VM is not required per the [OP Labs Audit Framework](https://gov.optimism.io/t/op-labs-audit-framework-when-to-get-external-security-review-and-how-to-prepare-for-it/6864).
+A failure in the new Cannon VM and thus dispute games is mitigated by an airgap in finalized withdrawals. Furthermore, there's a window whereby the Security Council can override the results of invalid games.
+Nonetheless, we will be auditing the new VM.


@Inphi - we should add audit report links here, once we have the final report from our external audits

@BlocksOnAChain Out of curiosity, when will you have the results of the VM audit? Will this PR be merged before the results are in?

@Ethnical - We will have the results in January, likely mid-January, since we will be on a collective pause and the auditors need time to generate the reports.
We are still doing reviews with the auditors, so all of this brings us to January as our target date.
I have started adding the audit findings to this label; feel free to review what we have for now.

BlocksOnAChain · 2024-12-16T13:24:14Z

security/fma-multi-threaded-64-bit-cannon.md

+- **Recovery Path(s)**: Reschedule upgrade, possibly releasing new binary though without immediate urgency.
+
+
+## Action Items


@mds1 - any immediate FMA action items that we should add to the list after your initial pass on the MT cannon FMA?

Ethnical · 2024-12-17T12:42:43Z

security/fma-multi-threaded-64-bit-cannon.md

+- **Risk Assessment:** High severity, low likelihood.
+- **Mitigations:** We periodically use Cannon to execute the op-program using inputs from op-mainnet and op-sepolia. This periodic cannon runner (vm-runner) runs on oplabs infrastructure.
+Furthermore, we [sanitize](https://github.com/ethereum-optimism/optimism/blob/eabf70498f68f321f5de003f1d443d3e3c8100b8/cannon/Makefile#L51) the op-program [in CI](https://github.com/ethereum-optimism/optimism/blob/eabf70498f68f321f5de003f1d443d3e3c8100b8/.circleci/config.yml#L928C1-L929C111) for unsupported opcodes.
+- **Detection:** Alerting is setup to notify the proofs team whenever the vm-runner fails to complete a cannon run. And the CI check provides an early warning against unsupported opcodes.


Here, it seems that the only early detection comes from CI.
This makes me wonder about a potential hole:

1. We maintain the CI check for a long period of time.
2. As no one introduces new opcodes, this CI check never fails.
3. After a while, we decide to prune the CI tests because they take time to run (and this check rarely triggers).
4. Then an invalid opcode is introduced.

To me, relying only on CI here seems a bit light.
Happy to discuss it.

Discussed offline. Added clarification on the vm-runner being used to detect issues outside of CI.
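
For readers unfamiliar with the CI opcode check discussed in this thread, here is a minimal Python sketch of the general idea: scan a disassembly listing and report every instruction mnemonic that appears on a deny-list. The deny-list contents, the objdump-style line format, and the function name are illustrative assumptions, not the actual logic in the cannon Makefile.

```python
import re

# Hypothetical deny-list: mnemonics assumed (for illustration only) to be
# unsupported by the VM. The real list lives in the project's tooling.
UNSUPPORTED = {"add.s", "add.d", "mul.d"}

# Assumed objdump-style line: "<address>: <8 hex digits> <mnemonic> <operands>"
LINE_RE = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]{8}\s+(\S+)")


def find_unsupported(disassembly, unsupported=UNSUPPORTED):
    """Return (line_number, mnemonic) for every deny-listed instruction."""
    hits = []
    for lineno, line in enumerate(disassembly.splitlines(), start=1):
        match = LINE_RE.match(line)
        if match and match.group(1) in unsupported:
            hits.append((lineno, match.group(1)))
    return hits


SAMPLE = """\
  400000:\t3c1c0001 \tlui\tgp,0x1
  400004:\t46001000 \tadd.s\t$f0,$f2,$f0
  400008:\t03e00008 \tjr\tra
"""

if __name__ == "__main__":
    for lineno, mnemonic in find_unsupported(SAMPLE):
        print(f"line {lineno}: unsupported instruction {mnemonic}")
```

A static check like this only covers what is in the binary at build time, which is why the thread also relies on the vm-runner executing real inputs outside of CI.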

Ethnical · 2024-12-17T13:10:09Z

@Inphi For the failures that rely on monitoring such as op-dispute-mon for detection:
I would like to know whether a scenario that exhausts or DoSes op-program can also DoS op-dispute-mon.
I am looking for a case where we can crash the challenger and the monitoring at the same time, so that a game resolves incorrectly without being detected, allowing a malicious actor to exploit the bridge.
In other words, is any part of the op-program code shared with op-dispute-mon?
I am thinking of cases like "Insufficient memory in the program" or "Unimplemented syscalls or opcodes needed by op-program".

Ethnical · 2024-12-17T13:18:32Z

security/fma-multi-threaded-64-bit-cannon.md

+- **Description:** This could theoretically occur when the op-program runs out of memory in a way that lets the attacker reuse code to subvert execution.
+- **Risk Assessment:** High severity, low likelihood.
+  - Low likelihood: This requires an attacker to craft inputs that not only induce high memory usage, but also corrupt or spray the heap in a way that either produces invalid fault proofs or prevents valid fault proofs from being generated.
+- **Mitigations:** As with [Insufficient memory in the program](#insufficient-memory-in-the-program), the 64-bit address space effectively prevents this from occurring. Furthermore, the Go runtime checks memory allocations against heap corruption. However, such memory protections may not hold due to bugs in the Go runtime.


NIT: Maybe add that this behavior can be caused not only by a vulnerability in Go itself, but also by use of `unsafe` in Go, which can lead to unexpected behavior.
PS: We should also ensure that the `unsafe` package is not imported and used incorrectly in the current codebase.
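
One way to approach the `unsafe` audit suggested above is a quick text scan of Go sources for the import. The following is a rough Python sketch under stated assumptions: it is a line-based heuristic (a proper check would parse the Go AST or inspect the dependency graph), and the function name is hypothetical.

```python
import re

# Matches an import spec for "unsafe", optionally aliased (e.g. `u "unsafe"`).
SPEC_RE = re.compile(r'^(?:[\w.]+\s+)?"unsafe"$')
IMPORT_RE = re.compile(r"^import\b")


def imports_unsafe(go_source):
    """Heuristically report whether Go source text imports "unsafe"."""
    in_block = False
    for raw in go_source.splitlines():
        # Drop trailing line comments, then surrounding whitespace.
        line = raw.split("//", 1)[0].strip()
        if in_block:
            if line.startswith(")"):
                in_block = False
            elif SPEC_RE.match(line):
                return True
        elif IMPORT_RE.match(line):
            rest = line[len("import"):].strip()
            if rest.startswith("("):
                # Start of a parenthesized import block.
                in_block = True
                rest = rest[1:].strip()
            if SPEC_RE.match(rest):
                return True
    return False


if __name__ == "__main__":
    example = 'package main\n\nimport (\n\t"fmt"\n\tu "unsafe" // pointer tricks\n)\n'
    print(imports_unsafe(example))
```

This only finds direct imports in the scanned files. Transitive dependencies can also use `unsafe`, so a scan like this is a starting point rather than proof of absence.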

Ethnical · 2024-12-17T13:28:46Z

security/fma-multi-threaded-64-bit-cannon.md

+
+- **Description:** The off-chain Cannon [attempts to run the correct VM version based on the prestate input](https://github.com/ethereum-optimism/design-docs/blob/0034943e42b8ab5f9dd9ded2ef2b6b55359c922c/cannon-state-versioning.md). If it doesn't work correctly the on-chain steps would not match.
+- **Risk Assessment:** Medium severity, low likelihood.
+- **Mitigations:** Multicannon mitigates this issue by embedding a variety of cannon STFs into a single binary. This shifts the concern of ensuring the correct VM selection to multicannon. We also run multicannon on oplabs infra via the vm-runner, to assert the multicannon binary was built correctly.


Can we add more detail on what "STFs" means here?
And why does multicannon ensure this problem will not occur?

Also, consider the case where an invalid prestate is deployed on L1 by mistake (for example, by updating the prestate with the ASR during incident response with an incorrect game that is blacklisted, etc.). Does that case belong here, or should we add a new failure mode for it?

I've added a new failure mode for invalid prestates.
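
To illustrate the version-based selection discussed in this thread ("STF" meaning state transition function), here is a minimal Python sketch of dispatching on a version tag read from the prestate. The one-byte version prefix, the version numbers, and the STF names are assumptions for illustration only; the linked state-versioning design doc defines the real encoding.

```python
def stf_singlethreaded(state):
    """Placeholder for the single-threaded VM's state transition function."""
    return "singlethreaded"


def stf_multithreaded64(state):
    """Placeholder for the multi-threaded 64-bit VM's state transition function."""
    return "multithreaded64"


# Hypothetical mapping from state version to the STF embedded in one binary.
STF_BY_VERSION = {
    0: stf_singlethreaded,
    1: stf_multithreaded64,
}


def select_stf(prestate):
    """Pick the STF matching the (assumed) leading version byte of the prestate."""
    if not prestate:
        raise ValueError("empty prestate")
    version = prestate[0]
    stf = STF_BY_VERSION.get(version)
    if stf is None:
        # Failing loudly is preferable to silently running the wrong VM,
        # since off-chain and on-chain steps would then diverge.
        raise ValueError(f"unsupported state version {version}")
    return stf


if __name__ == "__main__":
    print(select_stf(b"\x01rest-of-state")(b""))
```

The point of the sketch is that selection is driven by the prestate itself rather than by operator configuration, so the remaining risk is concentrated in the dispatch table being built correctly.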

Ethnical · 2024-12-17T13:53:10Z

security/fma-multi-threaded-64-bit-cannon.md

+
+### Invalid `DisputeGameFactory.setImplementation` execution
+
+- Description: This occurs when either the call to the DisputeGameFactory could not be made due to grossly unfavorable base fees on L1, an invalidly approved safe nonce, or a successful execution to a misconfigured dispute game implementation.


This will invalidate the withdrawals that are currently in the current window, right?
If so, we should tell the reader here, imo.
To be clear, this would also invalidate some users' withdrawals, and they would be required to resubmit them, which can be expensive during a gas spike.

Or does this not call setRespectedGame, and only add a new implementation to the mapping?

Added context clarifying this.

Ethnical · 2024-12-17T14:00:50Z

security/fma-multi-threaded-64-bit-cannon.md

+  - Low Likelihood: The low likelihood is a result of tenderly simulation testing of safe transactions, code review of the upgrade playbook, and manual review of the dispute game implementations (which are deployed on mainnet and specified in the governance proposal so they may be reviewed).
+  - Low severity: Fault Proofs continues to use the existing single-threaded FPVM. This carries a reputational risk, but it doesn't diminish the security of the system. Withdrawals will continue to work against outputs secured by the single-threaded FPVM.
+- **Mitigations:** No immediate action is needed other than to retry the safe transaction. This may require another signing ceremony. Note that the op-challenger does not need to be rolled back, as multicannon is backwards compatible with older FPVM state transition functions.
+- **Detection:** An un-executed safe transaction is easily detectable.


> An un-executed safe transaction is easily detectable.

Agreed in the case of a reverted transaction, as the execution UI from superchain-ops will show the issue.

> or a successful execution to a misconfigured dispute game implementation.

However, in the case of a successful execution to a misconfigured dispute game implementation, how do we detect that a misconfigured dispute game was deployed?
I think we should elaborate on this here.

In the "low likelihood" bullet point, I note the various ways this would be detected. Since the game implementations are pre-deployed prior to the upgrade, a reviewer can check and detect any invalid configuration. Does that address your comment?
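
The pre-upgrade review described above could be partly automated. As a hedged sketch, the Python below compares parameters read from a pre-deployed game implementation against the values stated in the governance proposal and lists every mismatch. The field names and values are hypothetical placeholders, and fetching the deployed values from L1 is out of scope here.

```python
MISSING = "<missing>"


def config_mismatches(expected, deployed):
    """Return a human-readable line for every field that differs."""
    problems = []
    for key in sorted(expected.keys() | deployed.keys()):
        want = expected.get(key, MISSING)
        got = deployed.get(key, MISSING)
        if want != got:
            problems.append(f"{key}: expected {want}, got {got}")
    return problems


if __name__ == "__main__":
    # Hypothetical values, as they might appear in a proposal and on chain.
    expected = {"absolutePrestate": "0xaaa", "vm": "0xbbb", "maxGameDepth": 73}
    deployed = {"absolutePrestate": "0xaaa", "vm": "0xccc", "maxGameDepth": 73}
    for problem in config_mismatches(expected, deployed):
        print(problem)
```

An empty result means the deployed implementation matches the proposal on every compared field. It says nothing about fields that were left out of the comparison.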

fma: Draft FMA for MT/64-bit Cannon

eded6d0

pauldowman force-pushed the pauldowman/cannon-fma branch from 117c0bd to eded6d0 Compare October 11, 2024 17:08

pauldowman assigned pauldowman, mbaxter and Inphi Oct 11, 2024

pauldowman mentioned this pull request Oct 11, 2024

FMA security document for MT cannon ethereum-optimism/optimism#12311

Open

add some flesh

db0efd3

Inphi force-pushed the pauldowman/cannon-fma branch from 417ef09 to db0efd3 Compare November 26, 2024 21:35

Inphi added 3 commits December 5, 2024 14:39

add a couple more items; fix grammar

e0917f4

fix doctoc

1b94afc

add meta; doc upgrade failure analysis

cae5517

Inphi marked this pull request as ready for review December 5, 2024 19:59

fix nits

b0ed6f7

mds1 reviewed Dec 10, 2024


Inphi added 5 commits December 10, 2024 11:16

add section on cfi

e5fdf88

update title

f5808bf

add more details on mitigations

6cb2f7f

add note on dispute-mon monitoring

80c622e

update fma status

8cd4380

Inphi changed the title ~~fma: Draft FMA for MT/64-bit Cannon~~ fma: FMA for MT/64-bit Cannon Dec 10, 2024

BlocksOnAChain reviewed Dec 16, 2024

Ethnical reviewed Dec 17, 2024

Inphi added 2 commits December 17, 2024 09:40

clarify vm-runner input sampling

3f39641

review comments

1021899
