Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make state value metadata upgradable at runtime #11333

Merged
merged 2 commits into from
Dec 15, 2023

Conversation

msmouse
Copy link
Contributor

@msmouse msmouse commented Dec 14, 2023

Description

Remove StateValueMetadataKind=Option<StateValueMetadata>, instead, add StateValueMetadata::is_none() as a in memory representation of a None metadata. This way, we are able to upgrade the metadata format at runtime (when we charge and refund the storage fee). This is to prepare for the refundable permanent bytes.

Magic is put in place for StateValue and WriteOp, so they always carry a metadata in the latest format and serialize into the old formats if necessary.

Test Plan

Copy link

trunk-io bot commented Dec 14, 2023

⏱️ 45h 6m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-tests 4h 31m 🟩🟩🟩 (+7 more)
rust-unit-coverage 4h 9m 🟩
windows-build 3h 15m 🟩🟩🟩🟩🟩 (+7 more)
rust-smoke-coverage 3h 12m 🟩
rust-move-unit-coverage 2h 28m 🟩🟩🟩🟥🟩 (+4 more)
forge-framework-upgrade-test / forge 2h 3m 🟥🟥🟥
replay-mainnet / replay-verify (0) 1h 12m 🟥
replay-mainnet / replay-verify (10) 1h 11m
execution-performance / single-node-performance 1h 11m 🟩🟩🟩🟩
replay-mainnet / replay-verify (11) 1h 10m
replay-mainnet / replay-verify (12) 1h 10m
replay-mainnet / replay-verify (15) 1h 10m
replay-mainnet / replay-verify (13) 1h 10m
replay-mainnet / replay-verify (16) 1h 10m
replay-mainnet / replay-verify (14) 1h 7m
rust-smoke-tests 1h 7m 🟩🟩
replay-mainnet / replay-verify (4) 1h 7m
replay-mainnet / replay-verify (7) 1h 7m
replay-mainnet / replay-verify (8) 1h 7m
rust-lints 1h 7m 🟩🟩🟩 (+7 more)
replay-mainnet / replay-verify (9) 1h 7m
replay-mainnet / replay-verify (5) 54m 🟩
replay-mainnet / replay-verify (6) 53m 🟩
replay-mainnet / replay-verify (1) 50m 🟥
run-tests-main-branch 48m 🟩🟩🟩 (+7 more)
replay-mainnet / replay-verify (3) 46m 🟩
check 43m 🟩🟩🟩 (+7 more)
forge-e2e-test / forge 36m 🟩🟩
general-lints 29m 🟩🟩🟩🟩 (+7 more)
replay-mainnet / replay-verify (2) 26m 🟥
forge-compat-test / forge 26m 🟩🟩
check-dynamic-deps 25m 🟩🟩🟩🟩🟩 (+7 more)
rust-images / rust-all 24m 🟩🟩
cli-e2e-tests / run-cli-tests 18m 🟩🟩
semgrep/ci 4m 🟩🟩🟩🟩🟩 (+7 more)
indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose 3m 🟩🟥
file_change_determinator 2m 🟩🟩🟩🟩🟩 (+7 more)
file_change_determinator 2m 🟩🟩🟩🟩🟩 (+7 more)
node-api-compatibility-tests / node-api-compatibility-tests 2m 🟩🟩
file_change_determinator 43s 🟩🟩🟩🟩
execution-performance / file_change_determinator 43s 🟩🟩🟩🟩
permission-check 42s 🟩🟩🟩🟩🟩 (+7 more)
permission-check 33s 🟩🟩🟩🟩🟩 (+7 more)
execution-performance / parallel-execution-performance 32s 🟩🟩🟩🟩
permission-check 30s 🟩🟩🟩🟩🟩 (+7 more)
execution-performance / sequential-execution-performance 29s 🟩🟩🟩🟩
permission-check 29s 🟩🟩🟩🟩🟩 (+7 more)
determine-test-metadata 14s 🟩🟩
upload-to-codecov 11s 🟩
permission-check 10s 🟩🟩🟩🟩
determine-docker-build-metadata 8s 🟩🟩🟩🟩

🚨 3 jobs on the last run were significantly faster/slower than expected

Job Duration vs 7d avg Delta
forge-framework-upgrade-test / forge 51m 29m +75%
windows-build 24m 17m +39%
run-tests-main-branch 6m 4m +25%

settingsfeedbackdocs ⋅ learn more about trunk.io

@msmouse
Copy link
Contributor Author

msmouse commented Dec 14, 2023

This was referenced Dec 14, 2023
@msmouse msmouse changed the title Automatically upgrade StateValueMetadata Make state value metadata upgradable at runtime Dec 14, 2023
@msmouse msmouse marked this pull request as draft December 14, 2023 00:16
Copy link

codecov bot commented Dec 14, 2023

Codecov Report

Attention: 161 lines in your changes are missing coverage. Please review.

Comparison is base (7b77f43) 68.7% compared to head (f4b8fc2) 68.7%.

Files Patch % Lines
types/src/write_set.rs 47.0% 89 Missing ⚠️
types/src/state_store/state_value.rs 47.6% 67 Missing ⚠️
types/src/account_config/resources/coin_info.rs 0.0% 2 Missing ⚠️
aptos-move/aptos-vm-types/src/abstract_write_op.rs 90.0% 1 Missing ⚠️
...ove/aptos-vm/src/move_vm_ext/write_op_converter.rs 94.1% 1 Missing ⚠️
aptos-move/block-executor/src/captured_reads.rs 0.0% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             main   #11333     +/-   ##
=========================================
- Coverage    68.7%    68.7%   -0.1%     
=========================================
  Files         766      766             
  Lines      177267   177406    +139     
=========================================
+ Hits       121893   121917     +24     
- Misses      55374    55489    +115     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@msmouse msmouse force-pushed the 1211-alden-metadata-upgrade branch 5 times, most recently from 49d7665 to 7113866 Compare December 14, 2023 06:25
@msmouse msmouse marked this pull request as ready for review December 14, 2023 06:25
@msmouse msmouse requested review from igor-aptos and a team December 14, 2023 06:25
@msmouse msmouse added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Dec 14, 2023

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

StateValueInner::V0(bytes) => (None, bytes),
StateValueInner::WithMetadata { data, metadata } => (Some(metadata), data),
}
pub fn into_parts(self) -> (StateValueMetadata, Bytes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is typically called unpack?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we call it into_inner() throughout the code :)

Copy link
Contributor Author

@msmouse msmouse Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Just changed to unpack.. :trollface:

We have more unpack and into_inner than into_parts, to be fair, but because this doesn't have an inner, I like unpack more.

This comment has been minimized.

bytes_deposit,
creation_time_usecs,
} = inner;
if bytes_deposit == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fine here, but for the future, if you add V2, you might want to actually have an enum in memory?

or you intentionally want to convert into newest possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intentionally, because I feel better doing Eq when there's not two vairants representing the same thing.

@igor-aptos
Copy link
Contributor

and you'll need to regenerate all golden files after rebase :)

Remove StateValueMetadataKind=Option<StateValueMetadata>, instead, add
StateValueMetadata::is_none() as an indication of a None
metadata. This way, we are able to upgrade the metadata format at
runtime (when we charge and refund the storage fee). This is to prepare
for the refundable permanent bytes.
@msmouse msmouse enabled auto-merge (rebase) December 14, 2023 23:54
@msmouse msmouse enabled auto-merge (squash) December 14, 2023 23:56

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on aptos-node-v1.8.3 ==> f4b8fc2c3747e638e8385fb23a4df7941c500ba1

Compatibility test results for aptos-node-v1.8.3 ==> f4b8fc2c3747e638e8385fb23a4df7941c500ba1 (PR)
1. Check liveness of validators at old version: aptos-node-v1.8.3
compatibility::simple-validator-upgrade::liveness-check : committed: 3609 txn/s, latency: 6744 ms, (p50: 7100 ms, p90: 10200 ms, p99: 11100 ms), latency samples: 176880
2. Upgrading first Validator to new version: f4b8fc2c3747e638e8385fb23a4df7941c500ba1
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1831 txn/s, latency: 15665 ms, (p50: 19500 ms, p90: 22000 ms, p99: 22500 ms), latency samples: 91560
3. Upgrading rest of first batch to new version: f4b8fc2c3747e638e8385fb23a4df7941c500ba1
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1831 txn/s, latency: 15897 ms, (p50: 18700 ms, p90: 22100 ms, p99: 22300 ms), latency samples: 93400
4. upgrading second batch to new version: f4b8fc2c3747e638e8385fb23a4df7941c500ba1
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3382 txn/s, latency: 8674 ms, (p50: 9900 ms, p90: 11700 ms, p99: 14200 ms), latency samples: 145440
5. check swarm health
Compatibility test for aptos-node-v1.8.3 ==> f4b8fc2c3747e638e8385fb23a4df7941c500ba1 passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on f4b8fc2c3747e638e8385fb23a4df7941c500ba1

two traffics test: inner traffic : committed: 7119 txn/s, submitted: 7124 txn/s, expired: 4 txn/s, latency: 4791 ms, (p50: 4500 ms, p90: 5700 ms, p99: 10200 ms), latency samples: 3467045
two traffics test : committed: 100 txn/s, latency: 2254 ms, (p50: 2200 ms, p90: 2700 ms, p99: 4400 ms), latency samples: 2000
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.207, avg: 0.196", "QsPosToProposal: max: 0.158, avg: 0.146", "ConsensusProposalToOrdered: max: 0.584, avg: 0.567", "ConsensusOrderedToCommit: max: 0.523, avg: 0.496", "ConsensusProposalToCommit: max: 1.101, avg: 1.062"]
Max round gap was 2 [limit 4] at version 3530591. Max no progress secs was 4.406858 [limit 10] at version 1687615.
Test Ok

@msmouse msmouse merged commit 9a3b178 into main Dec 15, 2023
101 of 132 checks passed
@msmouse msmouse deleted the 1211-alden-metadata-upgrade branch December 15, 2023 00:30
Copy link
Contributor

❌ Forge suite framework_upgrade failure on aptos-node-v1.8.3 ==> f4b8fc2c3747e638e8385fb23a4df7941c500ba1

Compatibility test results for aptos-node-v1.8.3 ==> f4b8fc2c3747e638e8385fb23a4df7941c500ba1 (PR)
Upgrade the nodes to version: f4b8fc2c3747e638e8385fb23a4df7941c500ba1
Test Failed: API error: Unknown error error sending request for url (http://aptos-node-3-validator.forge-framework-upgrade-pr-11333.svc:8080/v1/estimate_gas_price): error trying to connect: dns error: failed to lookup address information: Name or service not known

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: __libc_start_main
  13: <unknown>
Trailing Log Lines:
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: __libc_start_main
  13: <unknown>


Swarm logs can be found here: See fgi output for more information.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ApiError: namespaces "forge-framework-upgrade-pr-11333" not found: NotFound (ErrorResponse { status: "Failure", message: "namespaces \"forge-framework-upgrade-pr-11333\" not found", reason: "NotFound", code: 404 })

Caused by:
    namespaces "forge-framework-upgrade-pr-11333" not found: NotFound

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: __libc_start_main
  16: <unknown>', testsuite/forge/src/backend/k8s/swarm.rs:676:18
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Debugging output:

@@ -193,6 +197,7 @@ pub struct WriteWithDelayedFieldsOp {
pub struct InPlaceDelayedFieldChangeOp {
pub layout: Arc<MoveTypeLayout>,
pub materialized_size: u64,
pub metadata: StateValueMetadata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this change - why do InPlaceDelayedFieldOps now carry metadata? @msmouse @igor-aptos

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the size changes, we need to change the deposit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not yet implemented, right? the reason I don't follow having metadata here is because there might be multiple Ops inside the same resource, so which one carries the right metadata? Is the metadata going to be overwritten in the final write-set?
if it is just for validation, then I thought the plan was to just validate the size that was used to compute the deposit and/or any charges?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean multiple ops that get squashed?

each session starts from the output of the previous, so:

we start with "storage_size", and that is deposit in the metadata

then first session modifies size to "size_1", we charge for "size_1 - storage_size", and metadata is updated to size_1

then second session modifies to "size_2", we charge for "size_2 - size_1", and metadata is updated to size_2

squashing just needs to take the most recent metadata?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I was confused about the struct, mis-remembered how it's used.
as per our discussion though, we should probably make the handling uniform now with at least ResourceGroupInPlaceDelayedFieldChangeOp and possibly even GroupWrites (and maybe avoid change to modification thingy we are doing there)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants