Conditional CU metering #1

Open
wants to merge 16 commits into
base: main
from
78 changes: 78 additions & 0 deletions proposals/simd-XXXX-vm-consume-budget-for-percise-failure.md
@@ -0,0 +1,78 @@
---
simd: 'XXXX'
title: Conditional CU metering
authors:
- Tao Zhu (Anza)
category: Standard
type: Core
status: Draft
created: 2024-MM-DD
feature:
supersedes:
superseded-by:
extends:
---

## Summary

This proposal adjusts how CU consumption is measured based on the outcome of transaction execution: a transaction that completes successfully consumes the CUs it actually used, while certain irregular failures cause the transaction to consume all of its requested CUs.

## Motivation

### Background:

In the Solana protocol, tracking transaction Compute Unit (CU) consumption is a critical aspect of maintaining consensus. Block costs are part of this consensus, meaning that all clients must agree on the execution cost of each transaction, including those that error out during execution. Ensuring consistency in CU tracking across clients is essential for maintaining protocol integrity.

### Proposed Change:

To improve performance, Solana programs are often compiled with a JIT that works at the level of Basic Blocks — linear sequences of sBPF instructions with a single entry and exit point, and no loops or branches. Basic Blocks allow for efficient execution by reducing the overhead associated with tracking CU consumption for each individual BPF instruction.

Other than in the rare, exceptional situations discussed below, the total CU consumption of a Basic Block is deterministic, so CU accounting can be done once per Basic Block instead of at each instruction. A transaction that completes successfully, or that fails with most kinds of error, exited each Basic Block at its single exit point, and thus the total CU consumption of the execution equals the sum of the CU costs of the Basic Blocks executed.
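
The equivalence between per-instruction and per-block accounting can be illustrated with a minimal Rust sketch. Everything in it is an assumption made for illustration: the `BasicBlock` type, the two metering functions, and the flat cost of 1 CU per instruction are hypothetical and are not taken from any client. It only shows that both strategies arrive at the same total when every block runs to its exit, while the per-block version updates the meter far less often.

```rust
/// Assumed flat cost per instruction, for illustration only.
const CU_PER_INSTRUCTION: u64 = 1;

/// Hypothetical basic block, described only by its instruction count.
pub struct BasicBlock {
    pub instruction_count: u64,
}

impl BasicBlock {
    /// The block's cost is known before it runs.
    pub fn cost(&self) -> u64 {
        self.instruction_count * CU_PER_INSTRUCTION
    }
}

/// Per-instruction metering: one meter update per executed instruction.
/// Returns (CUs consumed, number of meter updates).
pub fn meter_per_instruction(trace: &[BasicBlock]) -> (u64, u64) {
    let mut consumed = 0;
    let mut updates = 0;
    for block in trace {
        for _ in 0..block.instruction_count {
            consumed += CU_PER_INSTRUCTION;
            updates += 1;
        }
    }
    (consumed, updates)
}

/// Per-block metering: one meter update per executed block.
/// Returns (CUs consumed, number of meter updates).
pub fn meter_per_block(trace: &[BasicBlock]) -> (u64, u64) {
    let mut consumed = 0;
    let mut updates = 0;
    for block in trace {
        consumed += block.cost();
        updates += 1;
    }
    (consumed, updates)
}
```

For a trace of three blocks with 3, 5, and 2 instructions, both functions report 10 CUs consumed, but the per-block version performs 3 meter updates instead of 10.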

However, when an exception is thrown during the execution of a Basic Block (e.g., a null memory dereference or other faults), determining the exact number of CUs consumed up to the point of failure requires additional effort. For instance, the Agave client implements a mechanism that tracks the Instruction Pointer (IP) or Program Counter (PC) to backtrack and estimate the CUs consumed when an exception occurs. More details on this mechanism can be found [here](https://github.com/solana-labs/rbpf/blob/57139e9e1fca4f01155f7d99bc55cdcc25b0bc04/src/jit.rs#L267).

While this approach is effective, it introduces additional work and complexity. These mechanisms are often implementation-specific, and requiring all clients to track the exact number of executed BPF instructions for consensus is costly and unnecessary. Such precision is not essential for protocol-level consensus, especially since these cases are rare.

### Clarified Protocol Behavior:

Instead of mandating implementation-specific work to handle exceptions, we propose the following clarification in the protocol:

- For successful execution of a Basic Block (i.e., the block exits at the last BPF instruction), the deterministic CU cost of the block will be charged to the transaction’s CU meter. This ensures that CU consumption for successful transactions is accurately accounted for.

- In the event of an exception during Basic Block execution, where the block does not exit normally, the requested CUs for the transaction will be charged to the CU meter. This allows for a simple and efficient fallback mechanism that avoids the need for tracking the exact number of executed instructions up to the point of failure.

By adopting this approach, the protocol avoids the overhead of requiring precise instruction-level CU tracking for transactions that fail. Instead, the requested CU limit of the transaction will be used, simplifying the handling of failed transactions while still maintaining consensus.
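
The two rules above can be sketched as a small CU meter. This is a hedged illustration rather than any client's implementation: `CuMeter`, `BlockOutcome`, and `record_block` are hypothetical names, and clamping at the requested limit is a simplification (a real VM would raise a budget-exceeded error instead).

```rust
/// Hypothetical outcome of running one basic block.
#[derive(Debug, PartialEq)]
pub enum BlockOutcome {
    /// The block ran through to its last instruction.
    Exited,
    /// An exception interrupted the block part-way through.
    Faulted,
}

/// Hypothetical CU meter for a single transaction.
pub struct CuMeter {
    requested: u64,
    consumed: u64,
}

impl CuMeter {
    pub fn new(requested: u64) -> Self {
        Self { requested, consumed: 0 }
    }

    pub fn consumed(&self) -> u64 {
        self.consumed
    }

    pub fn remaining(&self) -> u64 {
        self.requested - self.consumed
    }

    /// Applies the proposed rule for one basic block.
    pub fn record_block(&mut self, block_cost: u64, outcome: BlockOutcome) {
        self.consumed = match outcome {
            // Normal exit: charge the block's deterministic cost.
            // Clamping to `requested` is a simplification of this sketch.
            BlockOutcome::Exited => self
                .consumed
                .saturating_add(block_cost)
                .min(self.requested),
            // Exception: charge everything the transaction requested,
            // so no instruction-level backtracking is needed.
            BlockOutcome::Faulted => self.requested,
        };
    }
}
```

With a request of 1,000 CUs, a 30-CU block that exits normally leaves the meter at 30 consumed. If the next block faults, the meter jumps to 1,000 consumed and 0 remaining, regardless of how far into the block execution got.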

### Conclusion:

This proposal enhances performance and simplifies CU tracking by formalizing the use of Basic Blocks for efficient execution. It eliminates the need for costly, implementation-specific work to track CU consumption during execution failures, providing a clear and consistent approach to handling exceptions. This change allows clients to maintain consensus without sacrificing performance, ensuring that the protocol remains both efficient and robust.

## Alternatives Considered

None

## New Terminology

- [Basic Block](https://en.wikipedia.org/wiki/Basic_block): In the context of JIT execution and BPF processing, a Basic Block is a sequence of BPF instructions that forms a single, linear flow of control, with no loops or conditional branches except at the entry and exit points. Execution starts at the first instruction and proceeds sequentially to the last instruction without deviation. Because its execution path is predictable, its Compute Unit (CU) cost can be determined before execution and verified at the end of the block, which allows for efficient budget checks and optimizations.

## Detailed Design

Where the transaction's executed_units is checked, in the banking stage ([committer.rs](https://github.com/anza-xyz/agave/blob/master/core/src/banking_stage/committer.rs#L99)) and in the replay stage ([blockstore_processor.rs](https://github.com/anza-xyz/agave/blob/master/ledger/src/blockstore_processor.rs#L239)), implement the following logic:

```
// Pseudocode: charge the CUs actually executed on success or on a custom
// program error; charge the full requested CUs on any other failure.
let execution_cu = match transaction.execution_results {
    Ok(_) | Err(TransactionError::CustomError(_)) => committed_tx.executed_cu,
    _ => transaction.requested_cu,
};
```

Review comments on this snippet:

Owner Author: I am not too familiar with InstructionError. Is it correct to use requested_cu for any error other than TransactionError::CustomError(_)?

Reviewer: This is the most important part to get right. In Agave's VM, I think it's

- EbpfError::DivideByZero
- EbpfError::DivideOverflow
- EbpfError::CallOutsideTextSegment
- EbpfError::InvalidInstruction
- EbpfError::InvalidVirtualAddress

I'm not sure exactly what these translate to as TransactionErrors though. In Firedancer, it's maybe something like

    #define FD_VM_ERR_SIGSPLIT    ( -9) /* split multiword instruction (e.g. jump into the middle of a multiword instruction) */
    #define FD_VM_ERR_SIGILL      (-12) /* illegal instruction (e.g. opcode is not valid) */
    #define FD_VM_ERR_SIGSEGV     (-13) /* illegal memory address (e.g. read/write to an address not backed by any memory) */
    #define FD_VM_ERR_SIGBUS      (-14) /* misaligned memory address (e.g. read/write to an address with inappropriate alignment) */
    #define FD_VM_ERR_SIGRDONLY   (-15) /* illegal write (e.g. write to a read only address) */
    #define FD_VM_ERR_SIGFPE      (-18) /* divide by zero */

Owner Author: Indeed. What about AccessViolation in Agave's VM, should it also be considered an "irregular failure"? I could not find where EbpfError is converted to InstructionError. There is some casting in bpf_loader; is that the right place to look, @Lichtso?

Reviewer: ... ...

Owner Author: Alternatively, instead of charging different CUs based on TransactionError at the call sites, we could make the VM consume CUs differently: deplete the CU meter in case of an "irregular failure", so that the call sites (banking stage or replay) don't need to change. I thought this is what @Lichtso implied somewhere.
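
The selection rule in the snippet above can be made compilable with stand-in types. The sketch below is an assumption-laden illustration: `ExecutionError` is a hypothetical enum, not Agave's `TransactionError` or `InstructionError`, and its fault variants merely borrow the `EbpfError` names raised in review. Which errors should count as irregular failures is still an open question in this discussion, so the catch-all arm should be read as a placeholder.

```rust
/// Hypothetical error type standing in for the client's real error enums.
#[derive(Debug, Clone, PartialEq)]
pub enum ExecutionError {
    /// A program-defined error; execution left each basic block normally.
    Custom(u32),
    // Fault names borrowed from the review discussion (illustrative only).
    DivideByZero,
    DivideOverflow,
    CallOutsideTextSegment,
    InvalidInstruction,
    InvalidVirtualAddress,
}

/// Returns the CUs to charge for a transaction under the proposed rule.
pub fn cu_to_charge(
    result: &Result<(), ExecutionError>,
    executed_cu: u64,
    requested_cu: u64,
) -> u64 {
    match result {
        // Success or a custom program error: charge what was executed.
        Ok(()) | Err(ExecutionError::Custom(_)) => executed_cu,
        // Any other failure is treated as irregular in this sketch:
        // charge the full requested amount.
        Err(_) => requested_cu,
    }
}
```

For a transaction that requested 200,000 CUs and executed 5,000, this charges 5,000 on success or on `Custom(_)`, and 200,000 on `DivideByZero`.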

## Impact

None

## Security Considerations

One potential issue with charging requested CUs for failed transactions is that transactions with excessively large CU requests could consume a disproportionate share of the block's CU limit. This could amount to a denial of service by preventing legitimate transactions from being included in the block. To mitigate this risk, this proposal should be implemented only after SIMD-172, which removes the possibility of accidentally requesting an excessively large number of CUs, has been deployed.

By ensuring that CU requests are reasonable and controlled, the risk of failed transactions taking up disproportionate block space will be minimized, allowing the proposed solution to work effectively without compromising block utilization.
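
The size of this effect can be estimated with simple arithmetic. The helper functions below are hypothetical, and the numbers used in the usage note (a 1,400,000 CU request against a 48,000,000 CU block limit) are illustrative values chosen for the example, not figures stated in this proposal.

```rust
/// CUs charged beyond what was actually executed when a transaction
/// fails irregularly and is billed its full request.
pub fn overcharge(requested_cu: u64, executed_cu: u64) -> u64 {
    requested_cu.saturating_sub(executed_cu)
}

/// Number of irregularly failing transactions, each billed `requested_cu`,
/// that fit within a block limit. Returns `None` for a zero-CU request.
pub fn max_failing_txs_per_block(requested_cu: u64, block_cu_limit: u64) -> Option<u64> {
    block_cu_limit.checked_div(requested_cu)
}
```

Under these illustrative numbers, a transaction that faults after executing 5,000 CUs is charged 1,395,000 CUs more than it used, and 34 such transactions are enough to occupy nearly the whole block.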