Add prune wasm codes proposal #1455

pinosu · 2023-06-16T12:28:46Z

Resolves #999

codecov · 2023-06-16T14:17:00Z

Codecov Report

Merging #1455 (37b872e) into main (1ce86ca) will decrease coverage by 0.02%.
The diff coverage is 54.62%.

❗ Current head 37b872e differs from pull request most recent head e7d4b32. Consider uploading reports for the commit e7d4b32 to get more accurate results

@@            Coverage Diff             @@
##             main    #1455      +/-   ##
==========================================
- Coverage   57.81%   57.79%   -0.02%     
==========================================
  Files          62       62              
  Lines        8308     8416     +108     
==========================================
+ Hits         4803     4864      +61     
- Misses       3099     3143      +44     
- Partials      406      409       +3

Impacted Files	Coverage Δ
x/wasm/client/cli/gov_tx.go	`13.67% <0.00%> (-0.75%)`	⬇️
x/wasm/types/wasmer_engine.go	`0.00% <ø> (ø)`
x/wasm/types/tx.go	`46.47% <35.29%> (-0.59%)`	⬇️
x/wasm/keeper/msg_server.go	`61.11% <63.63%> (+0.13%)`	⬆️
x/wasm/keeper/keeper.go	`86.61% <88.00%> (+0.41%)`	⬆️
x/wasm/types/codec.go	`100.00% <100.00%> (ø)`

alpe

Very nice work. 🏄
Unfortunately the task has some more complexity with duplicate code uploads and instances. This was not well documented in the issue. I have added some notes in the review.

I have some feeling that the list of codeIDs is very hard to generate and validate by voters off chain. How about replacing them with a single max codeID that is ensured to be < current lastCodeId sequence. We can then iterate through all codeIDs up to this number and build the list of unused codeIs on proposal execution. This is going to be an expensive operation on Gas but it is executed in an end-blocker.

It would be good to collect some numbers how much gas this would cost and how long this takes for some realistic data like Juno mainnet. But this can be a seconds task

alpe · 2023-06-16T15:55:32Z

proto/cosmwasm/wasm/v1/tx.proto

+  string authority = 1 [ (cosmos_proto.scalar) = "cosmos.AddressString" ];
+  // CodeIDs references the WASM codes
+  repeated uint64 code_ids = 2 [


this can be a large number. The proposal submitter needs to provide this list upfront, which can be a lot of work.
It would be nice if there would be a way that this list is generated automatically on execution.

alpe · 2023-06-16T15:57:15Z

x/wasm/keeper/msg_server_integration_test.go

+			expErr:    true,
+		},
+	}
+	for name, spec := range specs {


good test cases.
We need to ensure also that no codes are pruned where an instance exists.

alpe · 2023-06-16T16:01:08Z

x/wasm/keeper/keeper.go

+		k.wasmVM.RemoveCode(info.CodeHash)
+		k.deleteCodeInfo(ctx, c)
+	}
+	return nil


you would also need to ensure that no instance exists for a codeID.

alpe · 2023-06-16T16:02:18Z

x/wasm/keeper/keeper.go

+
+		info := k.GetCodeInfo(ctx, c)
+		if info == nil {
+			return types.ErrNoSuchCodeFn(c).Wrapf("code id %d", c)


let's not abort the operation but skip the number. There is not much we can do in this case.

alpe · 2023-06-16T16:05:05Z

x/wasm/keeper/msg_server_integration_test.go

+
+			// then
+			if spec.expErr {
+				require.Error(t, err)


just add a return or the code after the else block is executed as well. The else can then be dropped and NoError is on the main path

alpe · 2023-06-16T16:08:13Z

x/wasm/keeper/keeper.go

+		if info == nil {
+			return types.ErrNoSuchCodeFn(c).Wrapf("code id %d", c)
+		}
+		k.wasmVM.RemoveCode(info.CodeHash)


ah, we are removing code by checksum and not codeID. It is very likely that multiple codeIDs for the same hash exist. We need to ensure that we only delete by a checksum when no instance for any of codeID exists.

webmaster128

🙌

webmaster128 · 2023-06-21T08:25:29Z

proto/cosmwasm/wasm/v1/tx.proto

+  string authority = 1 [ (cosmos_proto.scalar) = "cosmos.AddressString" ];
+  // LatestCodeID references the upper limit until unpinned
+  // code ids will be pruned


Could you be explicit here if there upper limit is included or not? "latest code ID" sounds lit it is included but the implementation seems to be <.

I think <= is a bit simpler to reason about.

webmaster128 · 2023-06-21T08:29:40Z

x/wasm/keeper/keeper.go

+	pinnedCodesHash := make(map[string]struct{}, 0)
+	unpinnedCodes := make([]uint64, 0)
+
+	for c := uint64(1); c < codeID; c++ {


Should we also add a minimum here? This can be optional. But then you can clean

First proposal: ID <= 1000
Second proposal 1001 <= ID <= 3000
Third proposal 3001 <= ID <= 5000
Fourth proposal 2001 <= ID <= 4000 (let's say some old codes got unpinned or migrated aways from)

Hmm, looking at the other problem regarding deletion by hash I am not sure if this suggestion makes sense. We need to look through all code IDs anyways to know which checksum is used.

🤔 we would aways iterate through all contract-infos and pined codes but when we store the max-code-id of the last run, we could abort early when reached. No need to pass it with the proposal IMHO.
This would be an optimization, let's do it in a second PR

webmaster128 · 2023-06-21T08:34:51Z

x/wasm/keeper/keeper.go

+		}
+
+		k.deleteCodeInfo(ctx, code)
+		_ = k.wasmVM.RemoveCode(info.CodeHash)


Do we really want to ignore those errors? It typically means a file could not be removed and indicates invalid state.

It is either abort the whole operation or not. IMHO there is value in removing the code-info data from the chain state even in case of an error here. There may be individual reasons on single machines, like file permissions that must not affect consensus. On the next state sync, the file would not be transferred to new nodes.
Logging the error makes sense IMHO

x/wasm/keeper/keeper.go

alpe

Thanks for the updates! There is some non optimal code when looping over the code-id range and using single get-pinned calls. It may not be very intuitive to work with iterators but they are essential for bulk operations. Please see my code example for inspiration:

// iteratePinnedCodes iterates over all pinned code ids in ascending order
func (k Keeper) iteratePinnedCodes(ctx sdk.Context, cb func(codeID uint64) bool) {
	store := prefix.NewStore(ctx.KVStore(k.storeKey), types.PinnedCodeIndexPrefix)
	iter := store.Iterator(nil, nil)
	defer iter.Close()

	l := len(types.PinnedCodeIndexPrefix)
	for ; iter.Valid(); iter.Next() {
		codeID := sdk.BigEndianToUint64(iter.Key()[l:])
		if cb(codeID) {
			return
		}
	}
}

// PruneWasmCodes iterates over all contracts and code-infos to remove unused codes that are smaller than given code id.
func (k Keeper) PruneWasmCodes(ctx sdk.Context, maxCodeID uint64) error {
	usedCodeIDs := make(map[uint64]struct{})

	// used by contracts
	k.IterateContractInfo(ctx, func(_ sdk.AccAddress, info types.ContractInfo) bool {
		if info.CodeID < maxCodeID {
			usedCodeIDs[info.CodeID] = struct{}{}
		}
		return false
	})
	// pinned to cache by gov
	k.iteratePinnedCodes(ctx, func(codeID uint64) bool {
		usedCodeIDs[codeID] = struct{}{} // may add maxCodeID but does not matter as we check later
		return codeID >= maxCodeID       // abort early
	})

	// execute prune
	done := make(map[string]struct{})
	var deleteCounter int
	k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
		if codeID >= maxCodeID {
			return true // abort early
		}
		checksum := string(info.CodeHash)
		if _, ok := usedCodeIDs[codeID]; ok {
			done[checksum] = struct{}{}
			return false
		}
		k.deleteCodeInfo(ctx, codeID) // clean up state

		if _, ok := done[checksum]; ok { // do not delete twice on disk
			return false
		}

		done[checksum] = struct{}{}
		if err := k.wasmVM.RemoveCode(info.CodeHash); err != nil {
			k.Logger(ctx).Error("failed to delete wasm file on disk", "checksum", info.CodeHash)
		} else {
			deleteCounter++
		}
		return false
	})
	k.Logger(ctx).Info("deleted unused wasm files", "total", deleteCounter)
	return nil
}

alpe · 2023-06-22T08:45:29Z

x/wasm/keeper/keeper.go

+	pinnedCodesHash := make(map[string]struct{}, 0)
+	unpinnedCodes := make([]uint64, 0)
+
+	for c := uint64(1); c < codeID; c++ {


🤔 we would aways iterate through all contract-infos and pined codes but when we store the max-code-id of the last run, we could abort early when reached. No need to pass it with the proposal IMHO.
This would be an optimization, let's do it in a second PR

alpe · 2023-06-22T08:49:49Z

x/wasm/keeper/keeper.go

+		}
+
+		k.deleteCodeInfo(ctx, code)
+		_ = k.wasmVM.RemoveCode(info.CodeHash)


It is either abort the whole operation or not. IMHO there is value in removing the code-info data from the chain state even in case of an error here. There may be individual reasons on single machines, like file permissions that must not affect consensus. On the next state sync, the file would not be transferred to new nodes.
Logging the error makes sense IMHO

alpe · 2023-06-23T08:58:44Z

x/wasm/keeper/keeper.go

+// PruneWasmCodes deletes code info for unpinned codes <= than latestCodeID
+func (k Keeper) PruneWasmCodes(ctx sdk.Context, latestCodeID uint64) error {
+	usedCodeIds := make(map[uint64]struct{}, 0)
+	usedChecksums := make(map[string]bool, 0)


Please also remove the size parameter here and the line above. We know already that it will be >0. Let's go with stdlib defaults instead

alpe · 2023-06-23T09:02:00Z

x/wasm/keeper/keeper.go

+
+	// check if instances are used only by unpinned code ids <= latestCodeID
+	k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
+		if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID {


The codeID <= latestCodeID is not correct. Just remove this part that we iterate over the whole set of code infos

For example, there can be code ID 1 and 10 for the same checksum. If only code ID 10 has an instance then a prune with max code id 2 would miss the instance.

I iterate over the whole set of code infos. I need this condition to mark as "false" only the checksums with unpinned code ids <= latestCodeID. This helps because we cannot delete checksums with unpinned code ids > latestCodeID.
These will be marked as "true", so as usedChecksums.. same as if they were pinned.
This also answer to the other comment why we need map[string]bool and it is not enough to have map[string]struct{}

alpe · 2023-06-23T09:03:15Z

x/wasm/keeper/keeper.go

+	// check if instances are used only by unpinned code ids <= latestCodeID
+	k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
+		if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID {
+			if _, found := usedChecksums[string(info.CodeHash)]; !found {


Interesting solution to store state with the checksum. But is it needed? If you don't set anything here, the result would be the same for if !usedChecksums[string(info.CodeHash)]. Or do I miss something?

alpe · 2023-06-23T09:08:18Z

x/wasm/keeper/keeper.go

+	// delete all unpinned code ids <= latestCodeID and all the
+	// instances used only by unpinned code ids <= latestCodeID
+	k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
+		if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID {


The conditions make sense here. I would prefer some better naming for latestCodeID. Maybe maxCodeID? Latest is a moving target for me as there can be new code-infos stored while the gov proposal is in-flight

alpe · 2023-06-23T09:20:40Z

x/wasm/keeper/keeper.go

+			}
+			return false
+		}
+		usedChecksums[string(info.CodeHash)] = true


A struct{} would be a more memory efficient value in usedChecksums

pinosu · 2023-06-23T12:39:04Z

I checked logs and it fails for the issue we already raised: empty validator set.
Selected randomly generated parameters for simulated genesis: { stake_per_account: "{824677238976}", initially_bonded_validators: "0" }
So we can ignore it

alpe · 2023-06-27T11:27:51Z

x/wasm/keeper/wasmtesting/mock_engine.go

@@ -167,7 +167,7 @@ func (m *MockWasmer) Unpin(checksum wasmvm.Checksum) error {
 }

 func (m *MockWasmer) RemoveCode(checksum wasmvm.Checksum) error {
-	if m.UnpinFn == nil {
+	if m.RemoveCodeFn == nil {


alpe · 2023-06-27T11:34:31Z

x/wasm/keeper/bench_test.go

@@ -100,3 +102,26 @@ func BenchmarkCompilation(b *testing.B) {
 		})
 	}
 }
+
+// Calculate the time it takes to prune about 100k wasm codes
+func BenchmarkPruningWasmCodes(b *testing.B) {


Good start to get some numbers.
This runs with a mem db, tiny contracts and no duplicate instances. I can imagine disk IO can have some impact here but let's add another DB to compare them

pinosu · 2023-06-29T08:55:03Z

Closing because performance are very bad (See bench tests).

pinosu added 3 commits June 16, 2023 14:27

Add prune wasm codes proposal

f4b7fc0

Fix test

821f632

Fix docs

6129f8b

Add remove code method to wasmvm engine

2cbb4fb

alpe reviewed Jun 16, 2023

View reviewed changes

pinosu added 2 commits June 19, 2023 17:41

Fix comments

39c5442

Add more tests

b84c0f4

pinosu marked this pull request as ready for review June 20, 2023 14:33

webmaster128 reviewed Jun 21, 2023

View reviewed changes

x/wasm/keeper/keeper.go Show resolved Hide resolved

alpe reviewed Jun 22, 2023

View reviewed changes

pinosu added 4 commits June 22, 2023 15:25

Fix prune wasm codes method

6924b1b

Fix test

6baca9b

Fix integration test

b4dc69b

Fix bug and improve tests

7ba798e

alpe reviewed Jun 23, 2023

View reviewed changes

pinosu and others added 4 commits June 23, 2023 12:35

Rename latestCodeID and add more test cases

9ccd044

More testing

77b129e

Fix prune logic (#1460)

6e2244e

Make tests more readable

ca4d1e6

pinosu added 2 commits June 26, 2023 18:26

Add simple bench test

47b1326

Fix bench test

37b872e

alpe reviewed Jun 27, 2023

View reviewed changes

pinosu added 2 commits June 27, 2023 14:58

Improve bench tests

e7d4b32

Increase timeout

4dfc1f4

pinosu force-pushed the 999-add_prune_wasm_codes_proposal branch from 52de6a2 to 4dfc1f4 Compare June 27, 2023 14:22

pinosu closed this Jun 29, 2023

alpe mentioned this pull request Jun 30, 2023

Export empty byte code when code hash is empty #1473

Closed

pinosu mentioned this pull request Jul 3, 2023

gov: add prune wasm codes proposal #999

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add prune wasm codes proposal #1455

Add prune wasm codes proposal #1455

pinosu commented Jun 16, 2023 •

edited

Loading

codecov bot commented Jun 16, 2023 •

edited

Loading

alpe left a comment

alpe Jun 16, 2023

alpe Jun 16, 2023

alpe Jun 16, 2023

alpe Jun 16, 2023

alpe Jun 16, 2023

alpe Jun 16, 2023

webmaster128 left a comment

webmaster128 Jun 21, 2023

webmaster128 Jun 21, 2023

webmaster128 Jun 21, 2023

alpe Jun 22, 2023

webmaster128 Jun 21, 2023

alpe Jun 22, 2023

alpe left a comment •

edited

Loading

alpe Jun 22, 2023

alpe Jun 22, 2023

alpe Jun 23, 2023

alpe Jun 23, 2023

pinosu Jun 23, 2023

alpe Jun 23, 2023

alpe Jun 23, 2023

alpe Jun 23, 2023

pinosu commented Jun 23, 2023

alpe Jun 27, 2023

alpe Jun 27, 2023

pinosu commented Jun 29, 2023

Add prune wasm codes proposal #1455

Add prune wasm codes proposal #1455

Conversation

pinosu commented Jun 16, 2023 • edited Loading

codecov bot commented Jun 16, 2023 • edited Loading

Codecov Report

alpe left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

webmaster128 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alpe left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pinosu commented Jun 23, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pinosu commented Jun 29, 2023

pinosu commented Jun 16, 2023 •

edited

Loading

codecov bot commented Jun 16, 2023 •

edited

Loading

alpe left a comment •

edited

Loading