
Improve chainbase mapped and heap behavior #1691

Merged: 23 commits into main, Oct 5, 2023
Conversation

@greg7mdp (Contributor) commented Sep 28, 2023

Resolves #1650.

Includes the not-yet-merged chainbase PR #21.

  • Issue with heap mode: at Leap startup and exit, we perform a memcpy between two mappings of the full database size. This causes significant pressure on the file cache when physical RAM is less than 2x the database size.

    => Solution: use a series of smaller mappings for the copy, reducing RAM contention.

  • Issue with mapped mode: while Leap is running, Linux's default dirty-writeback algorithm will grind the disk into dust, and it's a perf killer as well.

    => Solution: map the file with MAP_PRIVATE (so changes are not written back to the file), and on exit copy just the modified pages using a series of RW mappings (information about which pages are dirty is gathered using the pagemap interface). Should we still set the dirty bit in the db file at startup?

    This new implementation of mapped mode, like the existing heap and locked modes, does not allow sharing a database opened in RW mode with other instances in RO mode (a capability not used in Leap, afaik). To avoid removing the sharing functionality entirely (a test for it exists in chainbase's test.cpp), a new mapped_shared mode is introduced, which behaves the same as the old mapped mode.
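The "series of smaller mappings" copy can be sketched as below. This is a minimal illustration of the technique, not the actual chainbase code; the function name, default chunk size, and read-only direction are assumptions:

```cpp
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <algorithm>
#include <cstring>
#include <stdexcept>
#include <vector>

// Copy a file into a heap buffer using a series of small mmap() windows
// instead of one mapping of the whole file, so only one window's worth of
// pages is resident in the file cache at a time.
// `chunk` must be a multiple of the page size.
std::vector<char> chunked_copy(const char* path, size_t chunk = 1u << 20) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0)
        throw std::runtime_error("open failed");
    off_t size = ::lseek(fd, 0, SEEK_END);
    std::vector<char> out(static_cast<size_t>(size));
    for (off_t off = 0; off < size; off += static_cast<off_t>(chunk)) {
        size_t len = static_cast<size_t>(
            std::min<off_t>(static_cast<off_t>(chunk), size - off));
        void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            ::close(fd);
            throw std::runtime_error("mmap failed");
        }
        std::memcpy(out.data() + static_cast<size_t>(off), p, len);
        ::munmap(p, len); // release this window before mapping the next one
    }
    ::close(fd);
    return out;
}
```

Because each window is unmapped before the next is created, at most `chunk` bytes of the file occupy the page cache on behalf of the copy at any moment, instead of the full database size.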

Also:

  • update chainbase's minimum C++ standard requirement to C++20
  • update chainbase's CI/CD platform from debian:buster to debian:bullseye to get C++20 support (std::span)

Performance test results (Github Actions)

Running:

./tests/PerformanceHarnessScenarioRunner.py findMax testBpOpMode --max-tps-to-test 80000 --test-iteration-min-step 10 --test-iteration-duration-sec 10 --final-iterations-duration-sec 10 --calc-chain-threads lmax overrideBasicTestConfig -v --tps-limit-per-generator 5000
              main branch   gh_1650 branch
resultAvgTps  13,760        14,104

Performance test results (Run locally)

Running:

./tests/PerformanceHarnessScenarioRunner.py findMax testBpOpMode --max-tps-to-test 50000 --test-iteration-min-step 10 --test-iteration-duration-sec 10 --final-iterations-duration-sec 10 --calc-chain-threads lmax overrideBasicTestConfig -v --tps-limit-per-generator 5000 > /tmp/log 2>&1 
              main branch   gh_1650 branch
resultAvgTps  29,271        30,091

@greg7mdp greg7mdp marked this pull request as draft September 28, 2023 15:20
@greg7mdp greg7mdp marked this pull request as ready for review September 29, 2023 16:52
@spoonincode (Member) left a comment:

I haven't done an in-depth review yet, just some initial comments.

We should mix in the new mode as part of db-modes-test; edit: just realized we aren't exposing it as an option... so that might not work

New mapped mode is only available on Linux: it really shouldn't be available as an option on other platforms, and certainly shouldn't be the default on other platforms.

But wait, there's more to the above.. it seems not all CPU platforms support this soft-dirty feature on Linux. Shockingly, as far as I can tell, ARM8 does not because I do not see the CONFIG_HAVE_ARCH_SOFT_DIRTY set. ARM8 is major enough that we shouldn't break it, so if we can't find a run time way to detect a working default, maybe we need to ifdef this just to x86 for now.

This new mapped mode will largely interfere with users running state on tmpfs (from the standpoint of needlessly burning more memory). I am not sure what we should do to prevent this. Prevent starting in mapped if state is discovered on tmpfs? Silently treat mapped as mapped_shared if state is discovered on tmpfs?

// when loading a snapshot, all the state will be modified, so use the `shared` mode instead
// of `copy_on_write` to lower memory requirements
if (snapshot_path && chain_config->db_map_mode == pinnable_mapped_file::mapped)
   chain_config->db_map_mode = pinnable_mapped_file::mapped_shared;
A Member commented:

It would be nice to maintain the new mapped mode for loading snapshots, because I think it would be a huge perf boost (I've seen comments from some users that it takes 30 minutes to load a WAX snapshot, but it takes me less than 5 minutes in heap mode... I'm pretty sure it's the disk grinding for those users).

If we're worried about leaving 100% dirty pages after loading a snapshot, maybe one option is that chainbase could expose a flush() call that (synchronously) performs the write-out so that all the pages are clean, and nodeos calls that after loading the snapshot but before continuing.
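A synchronous flush of a shared mapping, as suggested, could look roughly like this (a sketch assuming a MAP_SHARED mapping; `flush_mapping` is a hypothetical name, not an actual chainbase API):

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <stdexcept>

// For a MAP_SHARED mapping, msync(MS_SYNC) synchronously writes the dirty
// pages in [base, base+len) back to the backing file; once it returns, the
// kernel has no dirty pages left for that range.
void flush_mapping(void* base, std::size_t len) {
    if (::msync(base, len, MS_SYNC) != 0)
        throw std::runtime_error("msync failed");
}
```

Calling something like this once after the snapshot finishes loading would leave the mapping clean before normal block processing resumes.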

@greg7mdp (Contributor, Author) commented Sep 30, 2023:

> It would be nice to maintain the new mapped mode for loading snapshots because I think it would be a huge perf boost (I've seen comments from some users it takes 30 minutes to load a WAX snapshot, but it takes me less than 5 minutes in heap mode... I'm pretty sure it's disk grinding for those users).

This would be useful when you have just enough RAM to hold all the state, but not quite enough for heap mode. I'm a little concerned that we may get more crashes this way. Maybe I can detect the available RAM and, based on the configured db size, decide whether it makes sense to use the new mapped mode (for example, go for it if RAM_size > 1.1 x chain-state-db-size-mb).
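The RAM_size > 1.1 x chain-state-db-size-mb heuristic could be sketched like this (the helper name and the exact factor are illustrative assumptions, not nodeos code):

```cpp
#include <unistd.h>
#include <cstdint>

// Return true if physical RAM exceeds ~1.1x the configured chainbase size,
// i.e. the heuristic under which the copy_on_write mapped mode might be
// considered safe for snapshot loading.
bool enough_ram_for_mapped_mode(std::uint64_t db_size_bytes) {
    std::uint64_t phys_ram =
        static_cast<std::uint64_t>(sysconf(_SC_PHYS_PAGES)) *
        static_cast<std::uint64_t>(sysconf(_SC_PAGESIZE));
    // db_size + db_size/10 == 1.1 * db_size, using integer arithmetic
    return phys_ram > db_size_bytes + db_size_bytes / 10;
}
```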

> If we're worried about leaving 100% dirty pages after loading a snapshot, maybe one option is chainbase could expose a flush() call that (synchronously) performs the write out so that all the pages are clean, and nodeos calls that after loading the snapshot but before continuing.

Yes, that's a great idea. We can still use the soft-dirty tracking as well: if the state-db-size is configured much greater than the actual db size used, it will make the flush faster.

A Member commented:

Should add an ilog or maybe even a wlog that the mode was changed from specified.

@greg7mdp (Contributor, Author) replied:

Upon reflection, I think the best compromise when loading a snapshot is:

  • Load the snapshot in mapped_shared mode. Yes, it is probably slower, but it minimizes the odds of running out of memory. With the new mapped mode, we would need memory for both the data currently read from the snapshot and the full chainbase db.
  • When the snapshot is done loading, use a new API, as suggested by Matt, which flushes the still-dirty pages to disk and restarts with a new copy_on_write mapping.

Do you guys agree?

@greg7mdp (Contributor, Author) replied:

I just checked in the implementation of the above.

@greg7mdp (Contributor, Author) replied:

@heifner I updated the change to mapped_shared to be temporary (just while loading the snapshot), so I don't think an ilog is necessary, but I can add it if you still think it might be useful.

A Member commented:

No, not needed if it honors the configuration after snapshot load.

@greg7mdp (Contributor, Author) commented Sep 30, 2023:

> We should mix in the new mode as part of db-modes-test; edit: just realized we aren't exposing it as an option... so that might not work

Yes, makes sense, will do. Actually the new mapped mode is tested, but I should add mapped_shared, which is the old mapped mode.

[update] I misunderstood. I added the mapped_shared mode in chainbase's grow_shrink.cpp test. I'll look into db_modes_test.sh.

> New mapped mode is only available on Linux: it really shouldn't be available as an option on other platforms, and certainly shouldn't be the default on other platforms. ... maybe we need to ifdef this just to x86 for now.

Yes, my bad, I forgot that. Thanks for catching it.
I'm not sure whether it would be best to use the shared mode (with disk grinding) on platforms that don't support the pagemap/soft-dirty interface, or to still use the copy_on_write mode with the full save at exit (which will be slow, like the heap mode, of course).

> This new mapped mode will largely interfere with users running state on tmpfs (from the standpoint of needlessly burning more memory). I am not sure what we should do to prevent this. Prevent starting in mapped if state is discovered on tmpfs? Silently treat mapped as mapped_shared if state is discovered on tmpfs?

We could do one of your suggestions, preferably refuse to start and suggest not putting the state on tmpfs.
I'm not sure I understand tmpfs fully, but I would think having the state on tmpfs is not very attractive anymore, as it would require the whole specified chain-state-db-size-mb to be in RAM + swap, not just the modified state pages.
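For reference, the pagemap/soft-dirty interface discussed above works by reading 64-bit entries from /proc/self/pagemap, where bit 63 indicates the page is present and bit 55 is the soft-dirty flag (cleared by writing "4" to /proc/self/clear_refs). A minimal Linux-only sketch, with illustrative function names:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <stdexcept>

// Read the /proc/self/pagemap entry for a virtual address. Each entry is
// 64 bits; see Documentation/admin-guide/mm/pagemap.rst and soft-dirty.rst.
uint64_t pagemap_entry(const void* addr) {
    long page_size = sysconf(_SC_PAGESIZE);
    int fd = ::open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        throw std::runtime_error("open pagemap failed");
    // one 8-byte entry per virtual page, indexed by page number
    off_t off = static_cast<off_t>(
        reinterpret_cast<uintptr_t>(addr) / page_size * sizeof(uint64_t));
    uint64_t entry = 0;
    if (::pread(fd, &entry, sizeof(entry), off) != sizeof(entry)) {
        ::close(fd);
        throw std::runtime_error("pread pagemap failed");
    }
    ::close(fd);
    return entry;
}

bool page_present(const void* addr)    { return (pagemap_entry(addr) >> 63) & 1; }
bool page_soft_dirty(const void* addr) { return (pagemap_entry(addr) >> 55) & 1; }
```

Scanning these entries at exit is what lets copy_on_write mapped mode write back only the modified pages.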

- test both `mapped` and `mapped_shared` modes
- don't try to use the `pagemap` feature on platforms where it is not available
@heifner (Member) commented Oct 2, 2023:

> Should we still set the dirty bit in the db file at startup?

I think so. If state is not flushed then it will not match the blocklog, fork-database, code-cache, and any future state we may have.

@greg7mdp (Contributor, Author) commented Oct 2, 2023:

> Should we still set the dirty bit in the db file at startup?

> I think so. If state is not flushed then it will not match the blocklog, fork-database, code-cache, and any future state we may have.

Yes, that's also what I thought, and what the code currently does.

@greg7mdp (Contributor, Author) commented Oct 2, 2023:

> This new mapped mode will largely interfere with users running state on tmpfs (from the standpoint of needlessly burning more memory).

@spoonincode I think this is true for heap and locked mode as well, so I'll also prevent starting in these modes when the file is on a tmpfs file system.
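Detecting that the state file lives on tmpfs can be done with statfs(); a minimal sketch (the helper name is illustrative, and TMPFS_MAGIC is hardcoded from linux/magic.h to keep the example self-contained):

```cpp
#include <sys/vfs.h>
#include <stdexcept>

// Filesystem magic number for tmpfs, as defined in <linux/magic.h>.
constexpr long kTmpfsMagic = 0x01021994;

// Return true if `path` resides on a tmpfs filesystem; a node could use
// such a check to refuse to start in mapped/heap/locked mode.
bool is_on_tmpfs(const char* path) {
    struct statfs sfs;
    if (::statfs(path, &sfs) != 0)
        throw std::runtime_error("statfs failed");
    return sfs.f_type == kTmpfsMagic;
}
```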

const auto guard = my->conf.state_guard_size;
EOS_ASSERT(free >= guard, database_guard_exception, "database free: ${f}, guard size: ${g}", ("f", free)("g",guard));

// give chainbase a chance to write some pages to disk if memory becomes scarce.
if (auto flushed_pages = mutable_db().check_memory_and_flush_if_needed()) {
A Member commented:

I think this call is only safe during the write window (at least, if it's actually clearing dirty bits). Are we sure validate_db_available_size() is only called during the write window? This one I'm not sure about due to ROtrx.

transaction_trace_ptr controller::push_transaction( const transaction_metadata_ptr& trx,
fc::time_point block_deadline, fc::microseconds max_transaction_time,
uint32_t billed_cpu_time_us, bool explicit_billed_cpu_time,
int64_t subjective_cpu_bill_us ) {
validate_db_available_size();
EOS_ASSERT( get_read_mode() != db_read_mode::IRREVERSIBLE, transaction_type_exception, "push transaction not allowed in irreversible mode" );
EOS_ASSERT( trx && !trx->implicit() && !trx->scheduled(), transaction_type_exception, "Implicit/Scheduled transaction not allowed" );
return my->push_transaction(trx, block_deadline, max_transaction_time, billed_cpu_time_us, explicit_billed_cpu_time, subjective_cpu_bill_us );
}

A Member commented:

Good catch. This is called from multiple read-only threads for read-only trx execution.

@greg7mdp (Contributor, Author) replied:

Fixed.

@@ -80,6 +80,8 @@ int snapshot_actions::run_subcommand() {
cfg.state_size = opt->db_size * 1024 * 1024;
cfg.state_guard_size = opt->guard_size * 1024 * 1024;
cfg.eosvmoc_tierup = wasm_interface::vm_oc_enable::oc_none; // wasm not used, no use to fire up oc

cfg.db_map_mode = pinnable_mapped_file::map_mode::mapped;
A Member commented:

I don't think we need this line any more.

@greg7mdp (Contributor, Author) replied:

Right, that's the default. Removed.

@spoonincode (Member) commented:

We should document this new mode in --help:

("database-map-mode", bpo::value<chainbase::pinnable_mapped_file::map_mode>()->default_value(chainbase::pinnable_mapped_file::map_mode::mapped),
"Database map mode (\"mapped\", \"heap\", or \"locked\").\n"
"In \"mapped\" mode database is memory mapped as a file.\n"
#ifndef _WIN32
"In \"heap\" mode database is preloaded in to swappable memory and will use huge pages if available.\n"
"In \"locked\" mode database is preloaded, locked in to memory, and will use huge pages if available.\n"
#endif
)

@greg7mdp (Contributor, Author) commented Oct 4, 2023:

> We should document this new mode in --help.

Sure will do after my current meeting.

@greg7mdp (Contributor, Author) commented Oct 4, 2023:

> We should document this new mode in --help.

> Sure will do after my current meeting.

Done.

if (is_write_window()) {
if (auto flushed_pages = mutable_db().check_memory_and_flush_if_needed()) {
ilog("CHAINBASE: flushed ${p} pages to disk to decrease memory pressure", ("p", flushed_pages));
}
A Member commented:

I'm not really sure why we'd leave this in, since it's `#if 0`ed on the other side. Kind of deceptive.

@greg7mdp (Contributor, Author) replied:

It will do something in the next PR, so the deception will not be long-lasting.

@greg7mdp merged commit 2a0e9c9 into main on Oct 5, 2023; 21 checks passed.
@greg7mdp deleted the gh_1650 branch October 5, 2023 04:24.