From f9fa54f85549fa8bc6faac24d7de32c5239ef874 Mon Sep 17 00:00:00 2001 From: Lucas Steuernagel Date: Mon, 19 Aug 2024 13:42:23 -0300 Subject: [PATCH 1/7] Write dynamic stack frames proposal --- proposals/0xxx-dynamic-stack-frames.md | 78 ++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 proposals/0xxx-dynamic-stack-frames.md diff --git a/proposals/0xxx-dynamic-stack-frames.md b/proposals/0xxx-dynamic-stack-frames.md new file mode 100644 index 00000000..e2765637 --- /dev/null +++ b/proposals/0xxx-dynamic-stack-frames.md @@ -0,0 +1,78 @@ +--- +simd: '0XXX' +title: Dynamic stack frames in SBF +authors: + - Alexander Meißner + - Alessandro Decina + - Lucas Steuernagel +category: Standard +type: Core +status: Draft +created: 2024-08-19T00:00:00.000Z +feature: null +supersedes: null +superseded-by: null +extends: null +--- + +## Summary + +The SVM currently allocates a fixed amount of stack space to each function frame. We propose allowing programs to dynamically manage their stack space through the introduction of an explicit stack pointer register. + +## Motivation + +The SVM allocates a fixed amount of memory to hold a program’s stack. Within the stack region, the virtual machine reserves 4096 bytes of stack space for each function frame. This is simultaneously limiting for functions that require more space, and wasteful for functions that require less space. + +For well optimized programs that don’t allocate large amounts of stack, the virtual machine currently still reserves 4096 bytes of stack for each function call, leading to suboptimal memory usage, which may cause unnecessary page faults. + +On the other hand, some programs are known to create large function frames - this seems common with programs that serialize a lot of data - and they have to jump through hoops to avoid overflowing the stack. 
The virtual machine detects when a stack overflow occurs, and it does so by implementing a stack frame gaps system whereby it inserts a virtual sentinel frame following a valid function frame. If the sentinel frame is accessed, the executing program is aborted. This system is fragile and is incompatible with direct mapping - a feature we expect to enable soon. + +The changes proposed in this document would allow us to optimize stack memory usage and remove the fragile stack frame gaps system. Note that we do not propose to remove the existing maximum stack space limit: stack space stays unchanged, what changes is how it is partitioned internally. + +## Detailed design + +Bringing dynamic stack frames to the Solana Bytecode Format and its corresponding virtual machine entails changes in several aspects of the execution environment. + +### SBF architecture modifications + + +We will introduce a new register R11 in the virtual machine, which is going to hold the stack pointer. The program can only write to such a register and modify it through the `add64 reg, imm` instruction. The verifier will enforce these constraints on deployed programs. For further information about the changes in the ISA, refer to the [SPF spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md). + +The R11 register will work in tandem with the R10 (frame pointer) register. The former is write-only to the program, and the latter is read-only to the program, forming a common design pattern in hardware engineering. More details of this usage are in the following section. + +### Changes in the execution environment + +The R10 register will continue to hold the frame pointer, but we will manage it differently. With fixed frames, when there is a function call we add 4096 to R10 and subtract it when the function returns. In the new scheme, we will assign the value of R11 to R10 at function calls, and save R10’s former value so that we can restore it when the function returns. 
+ +The introduction of dynamic stack frames will change the direction of stack growth. Presently, we stack frames on top of each other, but the memory usage in them grows downward. In the new frame setting, both the placement of new frames and the memory usage inside frames will be downward. + +The stack frame gaps feature, which creates a memory layout where frames are interleaved with equally sized gaps, are not compatible with dynamic stack frames and will be deactivated. + +### Changes in code generation + +In the compiler side, dynamic stack frames allow for some optimizations. First, when a function does not need any stack allocated variable, code generation will not create any instruction to modify R11. In addition, we can stop using R5 as a stack spill register when a function call receives more than five arguments. With dynamic stack frames, the compiler will use registers R1 to R5 for the first five arguments and place remainder arguments in the callee frame, instead of placing them in the caller’s frame. This new call convention obviates the need to use R5 for retrieving the caller’s frame pointer address to access those parameters. + +### Identification of programs + +As per the description in SIMD-0161, programs compiled with dynamic stack frames will contain the XX flag on their ELF header `e_flags` field. + +## Impact + +We foresee a positive impact in smart contract development. Developers won’t need to worry about exceeding the maximum frame space allowed for a function and won’t face any case of stack access violation if their code follows conventional Rust safety rules. Likewise, when we update the Rust version of our platform tools, developers will not have the burden of modifying their contract just because the newer version is using more stack than the previous one, often reaching the 4096 bytes limit. Refer to issues [#1186](https://github.com/anza-xyz/agave/issues/1186) and [#1158](https://github.com/anza-xyz/agave/issues/1158). 
+ +We also expect some improvements in program execution. For functions with no stack usage, we will not emit the additional instruction that modifies R11, saving some execution time. Furthermore, for function calls that handle more than five arguments, there will be one less store and one less load operation due to the new call convention. + + +## Security considerations + +Stack gaps will be disabled for dynamic stack frames to work. Stack gaps could detect invalid accesses between two function frames, if the accessed address would fall between them. With dynamic stack frames, all stack access will be valid, provided that their address is within the allowed range. We already allow functions to read and modify the memory inside the frame of other functions, so removing the stack gaps should not bring any security implications. + +Although one can change R11 to any value that fits in a 64-bit integer, every memory access is verified, so there is no risk of invalid accesses from a corrupt register. + +## Drawbacks + +Programs will consume more compute units, as most functions will include two extra instructions: one to increment the stack pointer and another one to decrement it. + +## Alternatives considered + +To cope with the SBF limitation of 4096 bytes for the frame size, we could have increased such a number. Even though this would solve the original problem, it would supply an unnecessary amount of memory to functions even when they do not need them. In addition, such a solution would increase pressure on the total memory available for the call stack. Either we would need to increase the total allocation for the virtual machine or decrease the maximum call depth. 
From c02b8b6a106d7818a76ff73754ccca8da05a25d6 Mon Sep 17 00:00:00 2001 From: Lucas Steuernagel Date: Mon, 19 Aug 2024 13:54:58 -0300 Subject: [PATCH 2/7] Reformat and rename file --- proposals/0166-dynamic-stack-frames.md | 154 +++++++++++++++++++++++++ proposals/0xxx-dynamic-stack-frames.md | 78 ------------- 2 files changed, 154 insertions(+), 78 deletions(-) create mode 100644 proposals/0166-dynamic-stack-frames.md delete mode 100644 proposals/0xxx-dynamic-stack-frames.md diff --git a/proposals/0166-dynamic-stack-frames.md b/proposals/0166-dynamic-stack-frames.md new file mode 100644 index 00000000..d1f19875 --- /dev/null +++ b/proposals/0166-dynamic-stack-frames.md @@ -0,0 +1,154 @@ +--- +simd: '0166' +title: Dynamic stack frames in SBF +authors: + - Alexander Meißner + - Alessandro Decina + - Lucas Steuernagel +category: Standard +type: Core +status: Draft +created: 2024-08-19T00:00:00.000Z +feature: null +supersedes: null +superseded-by: null +extends: null +--- + +## Summary + +The SVM currently allocates a fixed amount of stack space to each function +frame. We propose allowing programs to dynamically manage their stack space +through the introduction of an explicit stack pointer register. + +## Motivation + +The SVM allocates a fixed amount of memory to hold a program’s stack. Within +the stack region, the virtual machine reserves 4096 bytes of stack space for +each function frame. This is simultaneously limiting for functions that +require more space, and wasteful for functions that require less space. + +For well optimized programs that don’t allocate large amounts of stack, the +virtual machine currently still reserves 4096 bytes of stack for each +function call, leading to suboptimal memory usage, which may cause +unnecessary page faults. + +On the other hand, some programs are known to create large function frames - +this seems common with programs that serialize a lot of data - and they have +to jump through hoops to avoid overflowing the stack. 
The virtual machine +detects when a stack overflow occurs, and it does so by implementing a stack +frame gaps system whereby it inserts a virtual sentinel frame following a +valid function frame. If the sentinel frame is accessed, the executing program +is aborted. This system is fragile and is incompatible with direct mapping - +a feature we expect to enable soon. + +The changes proposed in this document would allow us to optimize stack memory +usage and remove the fragile stack frame gaps system. Note that we do not +propose to remove the existing maximum stack space limit: stack space stays +unchanged, what changes is how it is partitioned internally. + +## Alternatives Considered + +To cope with the SBF limitation of 4096 bytes for the frame size, we could +have increased such a number. Even though this would solve the original +problem, it would supply an unnecessary amount of memory to functions even +when they do not need them. In addition, such a solution would increase +pressure on the total memory available for the call stack. Either we would +need to increase the total allocation for the virtual machine or decrease the +maximum call depth. + +## New Terminology + +None. + +## Detailed Design + +Bringing dynamic stack frames to the Solana Bytecode Format and its +corresponding virtual machine entails changes in several aspects of the +execution environment. + +### SBF architecture modifications + + +We will introduce a new register R11 in the virtual machine, which is going +to hold the stack pointer. The program can only write to such a register and +modify it through the `add64 reg, imm` instruction. The verifier will enforce +these constraints on deployed programs. For further information about the +changes in the ISA, refer to the [SPF spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md). + +The R11 register will work in tandem with the R10 (frame pointer) register. 
+The former is write-only to the program, and the latter is read-only to the +program, forming a common design pattern in hardware engineering. More +details of this usage are in the following section. + +### Changes in the execution environment + +The R10 register will continue to hold the frame pointer, but we will manage +it differently. With fixed frames, when there is a function call we add 4096 +to R10 and subtract it when the function returns. In the new scheme, we will +assign the value of R11 to R10 at function calls, and save R10’s former value +so that we can restore it when the function returns. + +The introduction of dynamic stack frames will change the direction of stack +growth. Presently, we stack frames on top of each other, but the memory usage +in them grows downward. In the new frame setting, both the placement of new +frames and the memory usage inside frames will be downward. + +The stack frame gaps feature, which creates a memory layout where frames are +interleaved with equally sized gaps, are not compatible with dynamic stack +frames and will be deactivated. + +### Changes in code generation + +In the compiler side, dynamic stack frames allow for some optimizations. +First, when a function does not need any stack allocated variable, code +generation will not create any instruction to modify R11. In addition, we +can stop using R5 as a stack spill register when a function call receives +more than five arguments. With dynamic stack frames, the compiler will use +registers R1 to R5 for the first five arguments and place remainder arguments +in the callee frame, instead of placing them in the caller’s frame. This new +call convention obviates the need to use R5 for retrieving the caller’s frame +pointer address to access those parameters. + +### Identification of programs + +As per the description in SIMD-0161, programs compiled with dynamic stack +frames will contain the XX flag on their ELF header `e_flags` field. 
+ +## Impact + +We foresee a positive impact in smart contract development. Developers won’t +need to worry about exceeding the maximum frame space allowed for a function +and won’t face any case of stack access violation if their code follows +conventional Rust safety rules. Likewise, when we update the Rust version of +our platform tools, developers will not have the burden of modifying their +contract just because the newer version is using more stack than the previous +one, often reaching the 4096 bytes limit. Refer to issues +[#1186](https://github.com/anza-xyz/agave/issues/1186) and +[#1158](https://github.com/anza-xyz/agave/issues/1158). + +We also expect some improvements in program execution. For functions with no +stack usage, we will not emit the additional instruction that modifies R11, +saving some execution time. Furthermore, for function calls that handle more +than five arguments, there will be one less store and one less load operation +due to the new call convention. + +## Security Considerations + +Stack gaps will be disabled for dynamic stack frames to work. Stack gaps could +detect invalid accesses between two function frames, if the accessed address +would fall between them. With dynamic stack frames, all stack access will be +valid, provided that their address is within the allowed range. We already +allow functions to read and modify the memory inside the frame of other +functions, so removing the stack gaps should not bring any security +implications. + +Although one can change R11 to any value that fits in a 64-bit integer, every +memory access is verified, so there is no risk of invalid accesses from a +corrupt register. + +## Drawbacks + +Programs will consume more compute units, as most functions will include two +extra instructions: one to increment the stack pointer and another one to +decrement it. 
diff --git a/proposals/0xxx-dynamic-stack-frames.md b/proposals/0xxx-dynamic-stack-frames.md deleted file mode 100644 index e2765637..00000000 --- a/proposals/0xxx-dynamic-stack-frames.md +++ /dev/null @@ -1,78 +0,0 @@ ---- -simd: '0XXX' -title: Dynamic stack frames in SBF -authors: - - Alexander Meißner - - Alessandro Decina - - Lucas Steuernagel -category: Standard -type: Core -status: Draft -created: 2024-08-19T00:00:00.000Z -feature: null -supersedes: null -superseded-by: null -extends: null ---- - -## Summary - -The SVM currently allocates a fixed amount of stack space to each function frame. We propose allowing programs to dynamically manage their stack space through the introduction of an explicit stack pointer register. - -## Motivation - -The SVM allocates a fixed amount of memory to hold a program’s stack. Within the stack region, the virtual machine reserves 4096 bytes of stack space for each function frame. This is simultaneously limiting for functions that require more space, and wasteful for functions that require less space. - -For well optimized programs that don’t allocate large amounts of stack, the virtual machine currently still reserves 4096 bytes of stack for each function call, leading to suboptimal memory usage, which may cause unnecessary page faults. - -On the other hand, some programs are known to create large function frames - this seems common with programs that serialize a lot of data - and they have to jump through hoops to avoid overflowing the stack. The virtual machine detects when a stack overflow occurs, and it does so by implementing a stack frame gaps system whereby it inserts a virtual sentinel frame following a valid function frame. If the sentinel frame is accessed, the executing program is aborted. This system is fragile and is incompatible with direct mapping - a feature we expect to enable soon. 
- -The changes proposed in this document would allow us to optimize stack memory usage and remove the fragile stack frame gaps system. Note that we do not propose to remove the existing maximum stack space limit: stack space stays unchanged, what changes is how it is partitioned internally. - -## Detailed design - -Bringing dynamic stack frames to the Solana Bytecode Format and its corresponding virtual machine entails changes in several aspects of the execution environment. - -### SBF architecture modifications - - -We will introduce a new register R11 in the virtual machine, which is going to hold the stack pointer. The program can only write to such a register and modify it through the `add64 reg, imm` instruction. The verifier will enforce these constraints on deployed programs. For further information about the changes in the ISA, refer to the [SPF spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md). - -The R11 register will work in tandem with the R10 (frame pointer) register. The former is write-only to the program, and the latter is read-only to the program, forming a common design pattern in hardware engineering. More details of this usage are in the following section. - -### Changes in the execution environment - -The R10 register will continue to hold the frame pointer, but we will manage it differently. With fixed frames, when there is a function call we add 4096 to R10 and subtract it when the function returns. In the new scheme, we will assign the value of R11 to R10 at function calls, and save R10’s former value so that we can restore it when the function returns. - -The introduction of dynamic stack frames will change the direction of stack growth. Presently, we stack frames on top of each other, but the memory usage in them grows downward. In the new frame setting, both the placement of new frames and the memory usage inside frames will be downward. 
- -The stack frame gaps feature, which creates a memory layout where frames are interleaved with equally sized gaps, are not compatible with dynamic stack frames and will be deactivated. - -### Changes in code generation - -In the compiler side, dynamic stack frames allow for some optimizations. First, when a function does not need any stack allocated variable, code generation will not create any instruction to modify R11. In addition, we can stop using R5 as a stack spill register when a function call receives more than five arguments. With dynamic stack frames, the compiler will use registers R1 to R5 for the first five arguments and place remainder arguments in the callee frame, instead of placing them in the caller’s frame. This new call convention obviates the need to use R5 for retrieving the caller’s frame pointer address to access those parameters. - -### Identification of programs - -As per the description in SIMD-0161, programs compiled with dynamic stack frames will contain the XX flag on their ELF header `e_flags` field. - -## Impact - -We foresee a positive impact in smart contract development. Developers won’t need to worry about exceeding the maximum frame space allowed for a function and won’t face any case of stack access violation if their code follows conventional Rust safety rules. Likewise, when we update the Rust version of our platform tools, developers will not have the burden of modifying their contract just because the newer version is using more stack than the previous one, often reaching the 4096 bytes limit. Refer to issues [#1186](https://github.com/anza-xyz/agave/issues/1186) and [#1158](https://github.com/anza-xyz/agave/issues/1158). - -We also expect some improvements in program execution. For functions with no stack usage, we will not emit the additional instruction that modifies R11, saving some execution time. 
Furthermore, for function calls that handle more than five arguments, there will be one less store and one less load operation due to the new call convention. - - -## Security considerations - -Stack gaps will be disabled for dynamic stack frames to work. Stack gaps could detect invalid accesses between two function frames, if the accessed address would fall between them. With dynamic stack frames, all stack access will be valid, provided that their address is within the allowed range. We already allow functions to read and modify the memory inside the frame of other functions, so removing the stack gaps should not bring any security implications. - -Although one can change R11 to any value that fits in a 64-bit integer, every memory access is verified, so there is no risk of invalid accesses from a corrupt register. - -## Drawbacks - -Programs will consume more compute units, as most functions will include two extra instructions: one to increment the stack pointer and another one to decrement it. - -## Alternatives considered - -To cope with the SBF limitation of 4096 bytes for the frame size, we could have increased such a number. Even though this would solve the original problem, it would supply an unnecessary amount of memory to functions even when they do not need them. In addition, such a solution would increase pressure on the total memory available for the call stack. Either we would need to increase the total allocation for the virtual machine or decrease the maximum call depth. 
From 8767c6811845bd62a854935ed296902c6be7a91b Mon Sep 17 00:00:00 2001 From: Lucas Date: Mon, 9 Sep 2024 11:45:24 -0300 Subject: [PATCH 3/7] Use must wording instead of will --- proposals/0166-dynamic-stack-frames.md | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/proposals/0166-dynamic-stack-frames.md b/proposals/0166-dynamic-stack-frames.md index d1f19875..6643e55a 100644 --- a/proposals/0166-dynamic-stack-frames.md +++ b/proposals/0166-dynamic-stack-frames.md @@ -71,40 +71,41 @@ execution environment. We will introduce a new register R11 in the virtual machine, which is going -to hold the stack pointer. The program can only write to such a register and -modify it through the `add64 reg, imm` instruction. The verifier will enforce -these constraints on deployed programs. For further information about the -changes in the ISA, refer to the [SPF spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md). +to hold the stack pointer. The program must only write to such a register and +modify it through the `add64 reg, imm` (op code `0x07`) instruction. The +verifier must enforce these constraints on deployed programs. For further +information about the changes in the ISA, refer to the +[SPF spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md). -The R11 register will work in tandem with the R10 (frame pointer) register. +The R11 register must work in tandem with the R10 (frame pointer) register. The former is write-only to the program, and the latter is read-only to the program, forming a common design pattern in hardware engineering. More details of this usage are in the following section. ### Changes in the execution environment -The R10 register will continue to hold the frame pointer, but we will manage +The R10 register must continue to hold the frame pointer, but we will manage it differently. 
With fixed frames, when there is a function call we add 4096 -to R10 and subtract it when the function returns. In the new scheme, we will +to R10 and subtract it when the function returns. In the new scheme, we must assign the value of R11 to R10 at function calls, and save R10’s former value so that we can restore it when the function returns. The introduction of dynamic stack frames will change the direction of stack growth. Presently, we stack frames on top of each other, but the memory usage in them grows downward. In the new frame setting, both the placement of new -frames and the memory usage inside frames will be downward. +frames and the memory usage inside frames must be downward. The stack frame gaps feature, which creates a memory layout where frames are interleaved with equally sized gaps, are not compatible with dynamic stack -frames and will be deactivated. +frames and must be deactivated. ### Changes in code generation In the compiler side, dynamic stack frames allow for some optimizations. First, when a function does not need any stack allocated variable, code -generation will not create any instruction to modify R11. In addition, we +generation must not create any instruction to modify R11. In addition, we can stop using R5 as a stack spill register when a function call receives -more than five arguments. With dynamic stack frames, the compiler will use +more than five arguments. With dynamic stack frames, the compiler must use registers R1 to R5 for the first five arguments and place remainder arguments in the callee frame, instead of placing them in the caller’s frame. This new call convention obviates the need to use R5 for retrieving the caller’s frame @@ -113,7 +114,7 @@ pointer address to access those parameters. ### Identification of programs As per the description in SIMD-0161, programs compiled with dynamic stack -frames will contain the XX flag on their ELF header `e_flags` field. 
+frames must contain the XX flag on their ELF header `e_flags` field. ## Impact From aa4ee271a07df5c5fb9117719244729a59d300fa Mon Sep 17 00:00:00 2001 From: Lucas Date: Wed, 2 Oct 2024 15:10:14 -0300 Subject: [PATCH 4/7] Change title --- proposals/0166-dynamic-stack-frames.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/0166-dynamic-stack-frames.md b/proposals/0166-dynamic-stack-frames.md index 6643e55a..b3cc1533 100644 --- a/proposals/0166-dynamic-stack-frames.md +++ b/proposals/0166-dynamic-stack-frames.md @@ -1,6 +1,6 @@ --- simd: '0166' -title: Dynamic stack frames in SBF +title: SBPF Dynamic stack frames authors: - Alexander Meißner - Alessandro Decina From c5fba114ae4081ad39080f3c6bb62ee7e4dea9f3 Mon Sep 17 00:00:00 2001 From: Lucas Date: Tue, 5 Nov 2024 12:07:23 -0300 Subject: [PATCH 5/7] Add section about the verifier --- proposals/0166-dynamic-stack-frames.md | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/proposals/0166-dynamic-stack-frames.md b/proposals/0166-dynamic-stack-frames.md index b3cc1533..7d181e3c 100644 --- a/proposals/0166-dynamic-stack-frames.md +++ b/proposals/0166-dynamic-stack-frames.md @@ -7,7 +7,7 @@ authors: - Lucas Steuernagel category: Standard type: Core -status: Draft +status: Review created: 2024-08-19T00:00:00.000Z feature: null supersedes: null @@ -111,6 +111,24 @@ in the callee frame, instead of placing them in the caller’s frame. This new call convention obviates the need to use R5 for retrieving the caller’s frame pointer address to access those parameters. +### Changes in the verifier + +The only additional constraints for the verifier are the new rules for the +R11 register. R11 can only appear in an `add64` +(opcode `0x07`) instruction. 
If that is not the case, the verifier must raise +`VerifierError::InvalidSourceRegister` when it is the source register of an +instruction and `VerifierError::InvalidDestinationRegister` when it is the +destination register of an instruction. Additionally, it must not be the +target register of a `callx` (opcode `0x9D`) instruction, otherwise a +`VerifierError::InvalidRegister` must be raised. + +The verification rules for the R10 register must remain unchanged. It must not +be the target register of a `callx` (opcode `0x9D`) instruction, otherwise a +`VerifierError::InvalidRegister` must be raised. It must also not be the +destination register of non-store instructions, otherwise a +`VerifierError::CannotWriteToR10` must be raised. + + ### Identification of programs As per the description in SIMD-0161, programs compiled with dynamic stack From fe00d5177d9e235c566b83db633640556d3854b4 Mon Sep 17 00:00:00 2001 From: Lucas Date: Mon, 2 Dec 2024 17:36:57 -0300 Subject: [PATCH 6/7] Update SIMD with R10 as stack pointer --- proposals/0166-dynamic-stack-frames.md | 131 ++++++++++++------------- 1 file changed, 62 insertions(+), 69 deletions(-) diff --git a/proposals/0166-dynamic-stack-frames.md b/proposals/0166-dynamic-stack-frames.md index 7d181e3c..44cbedde 100644 --- a/proposals/0166-dynamic-stack-frames.md +++ b/proposals/0166-dynamic-stack-frames.md @@ -19,7 +19,7 @@ extends: null The SVM currently allocates a fixed amount of stack space to each function frame. We propose allowing programs to dynamically manage their stack space -through the introduction of an explicit stack pointer register. +through the introduction of an explicit stack pointer. ## Motivation @@ -29,9 +29,9 @@ each function frame. This is simultaneously limiting for functions that require more space, and wasteful for functions that require less space. 

 For well optimized programs that don’t allocate large amounts of stack, the
-virtual machine currently still reserves 4096 bytes of stack for each
-function call, leading to suboptimal memory usage, which may cause
-unnecessary page faults.
+virtual machine currently still reserves 4096 bytes of stack for each function
+call, leading to suboptimal memory usage, which may cause unnecessary page
+faults.

 On the other hand, some programs are known to create large function frames -
 this seems common with programs that serialize a lot of data - and they have
@@ -39,8 +39,8 @@ to jump through hoops to avoid overflowing the stack. The virtual machine
 detects when a stack overflow occurs, and it does so by implementing a stack
 frame gaps system whereby it inserts a virtual sentinel frame following a valid
 function frame. If the sentinel frame is accessed, the executing program
-is aborted. This system is fragile and is incompatible with direct mapping -
-a feature we expect to enable soon.
+is aborted. This system is fragile and is incompatible with direct mapping - a
+feature we expect to enable soon.

 The changes proposed in this document would allow us to optimize stack memory
 usage and remove the fragile stack frame gaps system. Note that we do not
@@ -51,11 +51,10 @@ unchanged, what changes is how it is partitioned internally.

 To cope with the SBF limitation of 4096 bytes for the frame size, we could
 have increased such a number. Even though this would solve the original
-problem, it would supply an unnecessary amount of memory to functions even
-when they do not need them. In addition, such a solution would increase
-pressure on the total memory available for the call stack. Either we would
-need to increase the total allocation for the virtual machine or decrease the
-maximum call depth.
+problem, it would supply functions with an unnecessary amount of memory. In
+addition, such a solution would increase pressure on the total memory
+available for the call stack. Either we would need to increase the total
+allocation for the virtual machine or decrease the maximum call depth.

 ## New Terminology

@@ -67,72 +66,67 @@ Bringing dynamic stack frames to the Solana Bytecode Format and its
 corresponding virtual machine entails changes in several aspects of the
 execution environment.

-### SBF architecture modifications
-
-
-We will introduce a new register R11 in the virtual machine, which is going
-to hold the stack pointer. The program must only write to such a register and
-modify it through the `add64 reg, imm` (op code `0x07`) instruction. The
-verifier must enforce these constraints on deployed programs. For further
-information about the changes in the ISA, refer to the
-[SPF spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md).
-
-The R11 register must work in tandem with the R10 (frame pointer) register.
-The former is write-only to the program, and the latter is read-only to the
-program, forming a common design pattern in hardware engineering. More
-details of this usage are in the following section.
-
 ### Changes in the execution environment

-The R10 register must continue to hold the frame pointer, but we will manage
-it differently. With fixed frames, when there is a function call we add 4096
-to R10 and subtract it when the function returns. In the new scheme, we must
-assign the value of R11 to R10 at function calls, and save R10’s former value
-so that we can restore it when the function returns.
+We will repurpose the existing R10 register from a frame pointer to a stack
+pointer. In other words, it must stop representing the highest address
+accessible in a frame, and must now point to the lowest address in a frame.

-The introduction of dynamic stack frames will change the direction of stack
-growth. Presently, we stack frames on top of each other, but the memory usage
-in them grows downward. In the new frame setting, both the placement of new
-frames and the memory usage inside frames must be downward.
+Such a change entails a change in the direction of stack growth. Presently, we
+stack frames on top of each other, but the memory usage within them grows
+downward. In the new frame setting, both the placement of new frames and the
+memory usage inside frames must be downward.
+
+Functions in SBF must alter the stack pointer using the `add64 reg, imm`
+(opcode `0x07`) instruction only, allowing them to request any desirable
+amount of stack space, provided that it meets the required alignment (refer to
+the following section).

 The stack frame gaps feature, which creates a memory layout where frames are
 interleaved with equally sized gaps, are not compatible with dynamic stack
 frames and must be deactivated.

-### Changes in code generation
+### Stack alignment

-In the compiler side, dynamic stack frames allow for some optimizations.
-First, when a function does not need any stack allocated variable, code
-generation must not create any instruction to modify R11. In addition, we
-can stop using R5 as a stack spill register when a function call receives
-more than five arguments. With dynamic stack frames, the compiler must use
-registers R1 to R5 for the first five arguments and place remainder arguments
-in the callee frame, instead of placing them in the caller’s frame. This new
-call convention obviates the need to use R5 for retrieving the caller’s frame
-pointer address to access those parameters.
+We want to enforce that the stack pointer remains aligned, therefore R10 must
+only be incremented or decremented by a multiple of 64. Large alignments might
+seem wasteful, but enforcing a sufficiently big alignment will spark
+innovation in interpreters and JITs, ultimately leading to much better
+performance and thus lower costs.
+
+Based on the current AVX-512 instructions available on Intel and AMD
+processors, the stack alignment must be 64 bytes. Even if current interpreters
+do not take advantage of these vectorized instructions, we believe that future
+generation interpreters might be able to vectorize SBF programs to speed up
+common operations, such as copying or comparing public keys and signatures.
+An unaligned stack prohibits such innovations.

 ### Changes in the verifier

-The additional constraints for the verifier only encompass new rules for the
-introduction of the R11 register. It can only appear coupled with a `add64`
-(opcode `0x07`) instruction. If that is not the case, the verifier must throw
-a `VerifierError::InvalidSourceRegister` when it is the source register of an
-instruction and `VerifierError::InvalidDestinationRegister` when it is the
-destination register of an instruction. Additionally, it must not be the
-target register for the `callx` (opcode 0x9D) instructions, otherwise a
-`VerifierError::InvalidRegister` must be raised.
+The verifier must now allow R10 to be the destination register of the
+`add64 reg, imm` (opcode `0x07`) instruction.

-The verification rules for the R10 register must remain unchanged. It must not
-be the target register for the `callx` (opcode 0x9D) instructions, otherwise a
-`VerifierError::InvalidRegister` must be raised. It must also not be the
-destination register of non-store instructions, otherwise a
-`VerifierError::CannotWriteToR10` must be thrown.
+The verifier must throw `VerifierError::UnalignedImmediate` when the immediate
+value of `add64 reg, imm` (opcode `0x07`) is not a multiple of 64 and the
+destination register is R10. The error must only be raised when both
+conditions happen simultaneously.
+
+### Changes in code generation

+In the compiler side, dynamic stack frames allow for some optimizations.
+First, when a function does not need any stack allocated variable, code
+generation must not create any instruction to modify R10. In addition, we can
+stop using R5 as a stack spill register when a function call receives more
+than five arguments. With dynamic stack frames, the compiler must use
+registers R1 to R5 for the first five arguments and place remainder arguments
+in the caller frame, easily retrieving them in the callee as an offset from
+the stack pointer. This new call convention obviates the need to use R5 for
+retrieving the caller’s frame pointer address to access those parameters.

 ### Identification of programs

 As per the description in SIMD-0161, programs compiled with dynamic stack
-frames must contain the XX flag on their ELF header `e_flags` field.
+frames must contain the `0x02` flag on their ELF header `e_flags` field.

 ## Impact

@@ -147,10 +141,9 @@ one, often reaching the 4096 bytes limit. Refer to issues
 [#1158](https://github.com/anza-xyz/agave/issues/1158).

 We also expect some improvements in program execution. For functions with no
-stack usage, we will not emit the additional instruction that modifies R11,
-saving some execution time. Furthermore, for function calls that handle more
-than five arguments, there will be one less store and one less load operation
-due to the new call convention.
+stack usage, we will not emit the additional instruction that modifies R10.
+Furthermore, for function calls that handle more than five arguments, there
+will be one less store and one less load operation due to the new call convention.

 ## Security Considerations

@@ -162,12 +155,12 @@ allow functions to read and modify the memory inside the frame of other
 functions, so removing the stack gaps should not bring any security
 implications.

-Although one can change R11 to any value that fits in a 64-bit integer, every
-memory access is verified, so there is no risk of invalid accesses from a
-corrupt register.
+Although one can change R10 to almost any value that fits in a 64-bit integer
+with `add64 reg, imm`, every memory access is verified, so there is no risk of
+invalid accesses from a corrupt register.

 ## Drawbacks

-Programs will consume more compute units, as most functions will include two
-extra instructions: one to increment the stack pointer and another one to
-decrement it.
+Programs will consume negligibly more compute units, as most functions will
+include two extra instructions: one to increment the stack pointer and another
+one to decrement it.

From 06c3e5c09b9b0fafb96349afad64ab67452ff5f0 Mon Sep 17 00:00:00 2001
From: Lucas
Date: Mon, 16 Dec 2024 17:57:17 -0300
Subject: [PATCH 7/7] Update header flag

---
 proposals/0166-dynamic-stack-frames.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/0166-dynamic-stack-frames.md b/proposals/0166-dynamic-stack-frames.md
index 44cbedde..b22fce6b 100644
--- a/proposals/0166-dynamic-stack-frames.md
+++ b/proposals/0166-dynamic-stack-frames.md
@@ -126,7 +126,7 @@ retrieving the caller’s frame pointer address to access those parameters.
 ### Identification of programs

 As per the description in SIMD-0161, programs compiled with dynamic stack
-frames must contain the `0x02` flag on their ELF header `e_flags` field.
+frames must contain the `0x01` flag on their ELF header `e_flags` field.

 ## Impact