[1.0] fix table id mix up during snapshot creation #575
#514 is defective: it assumes that table ids will be restored when a snapshot is loaded. That isn't true, because table ids are just chainbase's primary key which, while it can be inspected, cannot be controlled by user code (setting it is disallowed even upon creation of a chainbase row). As a result, if there are any holes in the table id values (which will occur when tables are removed), the snapshot records table ids that will be mixed up or non-existent when the snapshot is loaded.
For example, consider table ids of 0, 1, 2, 3, 5, 6 (table 4 has been removed at some point). The existing snapshot code will record any rows that go with table id 5 as belonging to table id 5. However, when the snapshot is reloaded, table id 5 becomes table id 4, and any rows in the snapshot that were attributed to table id 5 actually get mapped to the old table id 6 (which is now table id 5).
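The mix-up in this example can be reproduced with a small standalone model. This is only a sketch and does not use chainbase: `table_entry`, `reload_tables`, and `owner_after_reload` are hypothetical names, and the only behavior assumed is the one described above, namely that ids are handed out sequentially on load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a table object: the id is chosen by the
// database, the name only serves to tell the tables apart in this model.
struct table_entry {
   int64_t     id;
   std::string name;
};

// Models loading the table index from a snapshot: fresh sequential ids are
// assigned in index order, so any holes in the original ids disappear.
std::vector<table_entry> reload_tables(const std::vector<table_entry>& original) {
   std::vector<table_entry> reloaded;
   for (std::size_t i = 0; i < original.size(); ++i) {
      // The table index is assumed to be ordered by ascending id.
      assert(i == 0 || original[i - 1].id < original[i].id);
      reloaded.push_back({static_cast<int64_t>(i), original[i].name});
   }
   return reloaded;
}

// Returns the name of the table that a row tagged with `recorded_id` is
// attributed to after the reload, or nullopt if no table has that id.
std::optional<std::string> owner_after_reload(const std::vector<table_entry>& reloaded,
                                              int64_t recorded_id) {
   for (const auto& t : reloaded)
      if (t.id == recorded_id)
         return t.name;
   return std::nullopt;
}
```

With the tables named after their original ids (0, 1, 2, 3, 5, 6), looking up recorded id 5 in the reloaded index yields the table that used to be id 6, and recorded id 6 matches no table at all.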
This wasn't detected earlier because triggering it requires a table to be removed (creating a hole), a snapshot to be created and then loaded, and then some access that fails due to the mixed up table rows. For example, loading a v6 snapshot into v1.0.0-rc1, running it for a while, creating a v7 snapshot and then reloading that v7 snapshot might all succeed just fine if no tables were removed prior to creating the snapshot. All snapshot unit tests are trivial enough that they do not stumble on this problem; in fact, there is no need to regenerate the v7 test snapshots due to the change in this PR: they remain byte identical because the table ids all match up due to no holes.
There is a secondary problem with that approach: table ids are not part of consensus but rather an implementation detail of the specific storage layout chosen by nodeos. Different nodes can have different table ids for a given table depending upon what block the node was initialized from. Thus snapshots cannot be reproducible across nodes.
The change made in this PR is to record, instead of the table id each row belongs to, the position/index of the table in the table index. This was tricky to name but is referred to as the 'flattened table id' in code. That is, if the table ids are 0, 1, 2, 4, 5, 8, 10, then when writing the rows for table id 8, flattened table id 5 will be written for those rows. This works because when the table index is reloaded, those tables will be assigned ids 0, 1, 2, 3, 4, 5, 6. Keep in mind it's possible for a secondary contract index to not have any rows for a given table id, so some sections of the snapshot may not have all table ids but rather just a subset. So we still need to record this flattened id and cannot just assume each group of rows goes with flattened id 0, 1, 2, 3, 4, 5, 6.
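The mapping from table id to flattened table id can be sketched as follows. This is an illustration of the idea only, not the code in this PR: `build_flattened_ids` is a hypothetical helper, and it assumes the table ids are supplied in the same ascending order in which the table index is written to the snapshot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Maps each live table id to its position in the id-ordered table index.
// That position is the id the table will receive when the index is
// reloaded, so it is the value that is safe to record next to each row.
std::map<int64_t, uint64_t> build_flattened_ids(const std::vector<int64_t>& table_ids) {
   std::map<int64_t, uint64_t> flattened;
   for (std::size_t pos = 0; pos < table_ids.size(); ++pos) {
      // Assumption: ids arrive in ascending order, as in the table index.
      assert(pos == 0 || table_ids[pos - 1] < table_ids[pos]);
      flattened.emplace(table_ids[pos], static_cast<uint64_t>(pos));
   }
   return flattened;
}
```

For the ids 0, 1, 2, 4, 5, 8, 10 this maps table id 8 to flattened id 5 and table id 10 to flattened id 6. A section that only has rows for some of the tables would still look up each table's flattened id in the full map rather than counting the groups it happens to contain.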
Resolves #568 (Alternatively we can revert #514 to resolve as well)
Leaving as draft for now while considering a better test to demonstrate that the issue is fixed.