Improve mapped and head modes. #21

Merged: 33 commits, Oct 5, 2023
Commits
e5cf68b Incremental read/write for `heap` mode to reduce memory contention (greg7mdp, Sep 18, 2023)
8213c33 Finish implementing the `readonly` mapped mode. (greg7mdp, Sep 25, 2023)
c6735eb In `mapped` mode, save only modified pages at exit. (greg7mdp, Sep 25, 2023)
93cd1d3 Update cicd to use `debian:bullseye` instead of `debian:buster`. (greg7mdp, Sep 25, 2023)
19ee2e0 Avoid multiple calls to `msync` (greg7mdp, Sep 25, 2023)
235d956 Use boost interprocess mmap APIs (greg7mdp, Sep 26, 2023)
66d3326 Reuse `_file_mapping` instead of creating a new `bip::file_mapping` (greg7mdp, Sep 26, 2023)
832805a Fix my previous change for flushing the region to disk. (greg7mdp, Sep 26, 2023)
4cde714 Add `instance tracker` so that we can flush all dbs to disk before cl… (greg7mdp, Sep 27, 2023)
4bc07e4 Cleanup error cases. (greg7mdp, Sep 27, 2023)
619ba1f code cleanup and renaming some members. (greg7mdp, Sep 27, 2023)
4dcbb00 Add missing Boost random dependency (needed in Leap). (greg7mdp, Sep 28, 2023)
219e89b Update boost version (greg7mdp, Sep 28, 2023)
6b4eda4 Remove `benchmark` from default build. (greg7mdp, Sep 28, 2023)
6e1aa5a Reduce overlap of memory mappings existence. (greg7mdp, Sep 28, 2023)
d6c1dcc Add description for `clear_refs_failed` error (greg7mdp, Sep 28, 2023)
4a8070e Merge branch 'main' of github.com:AntelopeIO/chainbase into mapped_an… (greg7mdp, Sep 29, 2023)
c3352cc Remove unused code. (greg7mdp, Sep 29, 2023)
d275422 Add extra test mode `mapped_shared`. (greg7mdp, Sep 30, 2023)
65eefd4 Make sure we don't try to use the `pagemap` feature on platforms wher… (greg7mdp, Sep 30, 2023)
abc648c Remove leftover comment not necessary anymore. (greg7mdp, Oct 2, 2023)
da2910c Address PR comments. (greg7mdp, Oct 2, 2023)
7ae2b7c Add another commment. (greg7mdp, Oct 2, 2023)
e7a9b5a Check for db file on tempfs and refuse to start unless in `mapped_sha… (greg7mdp, Oct 2, 2023)
6cce710 Add API to flush RW db and convert to RO mapping after snapshot. (greg7mdp, Oct 2, 2023)
4ced7af Fix `divide by zero` in `heap` mode. (greg7mdp, Oct 3, 2023)
44c9a20 Remove some unneeded includes. (greg7mdp, Oct 3, 2023)
4b7cf64 `mapped` mode: add code to write some pages to disk when available RA… (greg7mdp, Oct 3, 2023)
4ab8944 Address PR comments. (greg7mdp, Oct 3, 2023)
173287c Make new node the non-default one (`mapped_private`) (greg7mdp, Oct 3, 2023)
7ff3038 Disable `check_memory_and_flush_if_needed()` which was not working co… (greg7mdp, Oct 3, 2023)
d928ec5 Address PR comment (greg7mdp, Oct 4, 2023)
7817736 Remove unneeded `std::cerr` message as per PR comment. (greg7mdp, Oct 4, 2023)
2 changes: 1 addition & 1 deletion .github/workflows/build.yaml
@@ -13,7 +13,7 @@ jobs:
     runs-on: ubuntu-22.04
     strategy:
       matrix:
-        os: ["debian:buster", "ubuntu:jammy"]
+        os: ["debian:bullseye", "ubuntu:jammy"]
     container: ${{matrix.os}}
     steps:
       - name: Install deps
4 changes: 2 additions & 2 deletions CMakeLists.txt
@@ -23,9 +23,9 @@ endif()
 SET(PLATFORM_LIBRARIES)

 if(CMAKE_CXX_STANDARD EQUAL 98)
-   message(FATAL_ERROR "chainbase requires c++17 or newer")
+   message(FATAL_ERROR "chainbase requires c++20 or newer")
 elseif(NOT CMAKE_CXX_STANDARD)
-   set(CMAKE_CXX_STANDARD 17)
+   set(CMAKE_CXX_STANDARD 20)
Member commented:

Not critical for this PR but something that can be done in the future,

This,

cmake_minimum_required( VERSION 3.5 )

should be bumped to 3.12 as that's the first version that knows c++20.

Also this entire if/elseif/endif block is logically nonsensical. My guess was it originally required c++11, and it would make sense in that case. I might suggest changing the way this is done to how the bls lib does it.

greg7mdp (Contributor, Author) replied on Oct 5, 2023:

Will do in the next PR!

    set(CMAKE_CXX_STANDARD_REQUIRED ON)
 endif()

7 changes: 5 additions & 2 deletions README.md
@@ -15,7 +15,7 @@

 ## Dependencies

-- C++17
+- C++20
 - [Boost](http://www.boost.org/)
 - CMake Build Process
 - Supports Linux, Mac OS X (no Windows Support)
@@ -118,7 +118,10 @@ boost::multi_index_container. This means that two or more threads may read the
 same time, but all writes must be protected by a mutex.

 Multiple processes may open the same database if care is taken to use interprocess locking on the
-database.
+database.
+
+When using the `map_mode = mapped_private`, it is not thread-safe to construct a new chainbase instance
+in one thread while other threads are writing to other chainbase databases.
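
One way a caller might honor this restriction is to guard database construction with a process-wide reader-writer lock: writers share the lock, and opening a database takes it exclusively. The sketch below is only an illustration under that assumption, not chainbase code. `construction_gate`, `with_write_access`, and `open_exclusively` are hypothetical helpers, and the callables stand in for real database writes and constructor calls.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical process-wide gate; chainbase does not provide this.
inline std::shared_mutex& construction_gate() {
   static std::shared_mutex gate;
   return gate;
}

// Writers to already-open databases hold the gate in shared mode,
// so they do not block each other.
template <class WriteFn>
void with_write_access(WriteFn&& write_fn) {
   std::shared_lock<std::shared_mutex> lock(construction_gate());
   std::forward<WriteFn>(write_fn)();
}

// Opening a database holds the gate exclusively, so no writer
// guarded by with_write_access() runs while open_fn executes.
template <class OpenFn>
decltype(auto) open_exclusively(OpenFn&& open_fn) {
   std::unique_lock<std::shared_mutex> lock(construction_gate());
   return std::forward<OpenFn>(open_fn)();
}
```

This only helps if every write path in the process goes through `with_write_access`. Writers that also need mutual exclusion on a single database still need their own per-database mutex, as the README already requires.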

## Persistence

8 changes: 8 additions & 0 deletions include/chainbase/chainbase.hpp
@@ -520,6 +520,14 @@ namespace chainbase {
             _read_only_mode = false;
          }

+         void revert_to_private_mode() {
+            _db_file.revert_to_private_mode();
+         }
+
+         size_t check_memory_and_flush_if_needed() {
+            return _db_file.check_memory_and_flush_if_needed();
+         }
+
       private:
          pinnable_mapped_file _db_file;
          bool _read_only = false;
168 changes: 168 additions & 0 deletions include/chainbase/pagemap_accessor.hpp
@@ -0,0 +1,168 @@
+#pragma once
+
+#include <fcntl.h>  // open
+#include <unistd.h> // pread, sysconf
+#include <cstdlib>
+#include <cassert>
+#include <iostream>
+#include <fstream>
+#include <filesystem>
+#include <vector>
+#include <span>
+#include <boost/interprocess/managed_mapped_file.hpp>
+
+namespace chainbase {
+
+namespace bip = boost::interprocess;
+
+class pagemap_accessor {
+public:
+   ~pagemap_accessor() {
+      _close();
+   }
+
+   bool clear_refs() const {
+      if constexpr (!_pagemap_supported)
+         return false;
+
+      int fd = ::open("/proc/self/clear_refs", O_WRONLY);
+      if (fd < 0)
+         return false;
+
+      // Clear soft-dirty bits from the task's PTEs.
+      // This is done by writing "4" into the /proc/PID/clear_refs file of the task in question.
+      //
+      // After this, when the task tries to modify a page at some virtual address, the #PF occurs
+      // and the kernel sets the soft-dirty bit on the respective PTE.
+      // ----------------------------------------------------------------------------------------
+      const char *v = "4";
+      bool res = write(fd, v, 1) == 1;
+      ::close(fd);
+      return res;
+   }
+
+   static constexpr bool pagemap_supported() {
+      return _pagemap_supported;
+   }
+
+   static bool is_marked_dirty(uint64_t entry) {
+      return !!(entry & (1Ull << 55));
+   }
+
+   static size_t page_size() {
+      return pagesz;
+   }
+
+   bool page_dirty(uintptr_t vaddr) const {
+      uint64_t data;
+      if (!read(vaddr, { &data, 1 }))
+         return true;
+      return this->is_marked_dirty(data);
+   }
+
+   // /proc/pid/pagemap. This file lets a userspace process find out which physical frame each virtual page
+   // is mapped to. It contains one 64-bit value for each virtual page, containing the following data
+   // (from fs/proc/task_mmu.c, above pagemap_read):
+   //
+   // Bits 0-54   page frame number (PFN) if present (note: field is zeroed for non-privileged users)
+   // Bits 0-4    swap type if swapped
+   // Bits 5-54   swap offset if swapped
+   // Bit  55     pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
+   // Bit  56     page exclusively mapped (since 4.2)
+   // Bit  57     pte is uffd-wp write-protected (since 5.13) (see Documentation/admin-guide/mm/userfaultfd.rst)
+   // Bits 58-60  zero
+   // Bit  61     page is file-page or shared-anon (since 3.5)
+   // Bit  62     page swapped
+   // Bit  63     page present
+   //
+   // Here we are just checking bit #55 (the soft-dirty bit).
+   // ----------------------------------------------------------------------------------------------------
+   bool read(uintptr_t vaddr, std::span<uint64_t> dest_uint64) const {
+      if constexpr (!_pagemap_supported)
+         return false;
+
+      if (!_open()) // make sure file is open
+         return false;
+      assert(_pagemap_fd >= 0);
+      auto dest = std::as_writable_bytes(dest_uint64);
+      std::byte* cur = dest.data();
+      size_t bytes_remaining = dest.size();
+      uintptr_t offset = (vaddr / pagesz) * sizeof(uint64_t);
+      while (bytes_remaining != 0) {
+         ssize_t ret = pread(_pagemap_fd, cur, bytes_remaining, offset + (cur - dest.data()));
+         if (ret < 0)
+            return false;
+         bytes_remaining -= (size_t)ret;
+         cur += ret;
+      }
+      return true;
+   }
+
+   // copies the modified pages within the virtual address space specified by `rgn` to an
+   // equivalent region starting at `offset` within the file backing `mapping`.
+   // The specified region *must* be a multiple of the system's page size, and the specified
+   // region should exist in the disk file.
+   // --------------------------------------------------------------------------------------
+   bool update_file_from_region(std::span<std::byte> rgn, bip::file_mapping& mapping, size_t offset, bool flush, size_t& written_pages) const {
+      if constexpr (!_pagemap_supported)
+         return false;
+
+      assert(rgn.size() % pagesz == 0);
+      size_t num_pages = rgn.size() / pagesz;
+      std::vector<uint64_t> pm(num_pages);
+
+      // get modified pages
+      if (!read((uintptr_t)rgn.data(), pm))
+         return false;
+      bip::mapped_region map_rgn(mapping, bip::read_write, offset, rgn.size());
+      std::byte* dest = (std::byte*)map_rgn.get_address();
+      if (dest) {
+         for (size_t i=0; i<num_pages; ++i) {
+            if (is_marked_dirty(pm[i])) {
+               size_t j = i + 1;
+               while (j<num_pages && is_marked_dirty(pm[j]))
+                  ++j;
+               memcpy(dest + (i * pagesz), rgn.data() + (i * pagesz), pagesz * (j - i));
+               written_pages += (j - i);
+               i += j - i - 1;
+            }
+         }
+         if (flush && !map_rgn.flush(0, rgn.size(), /* async = */ false))
+            std::cerr << "CHAINBASE: ERROR: flushing buffers failed" << '\n';
+         return true;
+      }
+      return false;
+   }
+
+private:
+   bool _open() const {
+      assert(_pagemap_supported);
+      if (_pagemap_fd < 0) {
+         _pagemap_fd = ::open("/proc/self/pagemap", O_RDONLY);
+         if (_pagemap_fd < 0)
+            return false;
+      }
+      return true;
+   }
+
+   bool _close() const {
+      if (_pagemap_fd >= 0) {
+         assert(_pagemap_supported);
+         ::close(_pagemap_fd);
+         _pagemap_fd = -1;
+      }
+      return true;
+   }
+
+   static inline size_t pagesz = sysconf(_SC_PAGE_SIZE);
+
+#if defined(__linux__) && defined(__x86_64__)
+   static constexpr bool _pagemap_supported = true;
+#else
+   static constexpr bool _pagemap_supported = false;
+#endif
+
+   mutable int _pagemap_fd = -1;
+};
+
+} // namespace chainbase
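
The copy loop in `update_file_from_region` coalesces consecutive soft-dirty pages into a single `memcpy`. That run detection can be looked at on its own, without `/proc/self/pagemap` or Boost. The following is a minimal sketch under that simplification; `is_soft_dirty`, `pagemap_offset`, and `dirty_runs` are illustrative names that do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bit 55 of a pagemap entry is the soft-dirty flag.
constexpr bool is_soft_dirty(std::uint64_t entry) {
   return (entry & (std::uint64_t{1} << 55)) != 0;
}

// Byte offset, inside the pagemap file, of the 64-bit entry describing `vaddr`.
constexpr std::uint64_t pagemap_offset(std::uint64_t vaddr, std::uint64_t page_size) {
   return (vaddr / page_size) * sizeof(std::uint64_t);
}

// Returns half-open [first, last) page-index ranges of consecutive dirty pages.
std::vector<std::pair<std::size_t, std::size_t>>
dirty_runs(const std::vector<std::uint64_t>& entries) {
   std::vector<std::pair<std::size_t, std::size_t>> runs;
   std::size_t i = 0;
   while (i < entries.size()) {
      if (!is_soft_dirty(entries[i])) {
         ++i;
         continue;
      }
      std::size_t j = i + 1;
      while (j < entries.size() && is_soft_dirty(entries[j]))
         ++j;
      runs.emplace_back(i, j); // pages i .. j-1 can be copied in one call
      i = j;
   }
   return runs;
}
```

Each `[first, last)` range corresponds to one copy of `(last - first) * page_size` bytes starting at byte offset `first * page_size` of the region.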
31 changes: 23 additions & 8 deletions include/chainbase/pinnable_mapped_file.hpp
@@ -5,6 +5,8 @@
 #include <boost/interprocess/sync/file_lock.hpp>
 #include <boost/asio/io_service.hpp>
 #include <filesystem>
+#include <vector>
+
 namespace chainbase {

 namespace bip = boost::interprocess;
@@ -20,7 +22,9 @@ enum db_error_code {
    bad_header,
    no_access,
    aborted,
-   no_mlock
+   no_mlock,
+   clear_refs_failed,
+   tempfs_incompatible_mode
 };

 const std::error_category& chainbase_error_category();
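
For readers less familiar with the `std::error_category` pattern that the `chainbase_error_category()` declaration points to, here is a self-contained sketch of how two such codes can be given descriptions. Everything in it is hypothetical: the `demo_` names are stand-ins, and the message strings are invented for illustration because the PR's actual wording is not visible in this diff.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical stand-in for the two codes added in this hunk.
enum class demo_error { ok = 0, clear_refs_failed, tempfs_incompatible_mode };

// Maps each code to a human-readable description.
class demo_category final : public std::error_category {
public:
   const char* name() const noexcept override { return "demo_chainbase"; }

   std::string message(int ev) const override {
      switch (static_cast<demo_error>(ev)) {
         case demo_error::ok:
            return "ok";
         case demo_error::clear_refs_failed:
            return "could not reset soft-dirty tracking"; // assumed wording
         case demo_error::tempfs_incompatible_mode:
            return "database on tmpfs requires a different map mode"; // assumed wording
      }
      return "unknown error";
   }
};

// Single shared category instance, mirroring the accessor-function style.
inline const std::error_category& demo_error_category() {
   static const demo_category instance;
   return instance;
}

inline std::error_code make_error_code(demo_error e) noexcept {
   return {static_cast<int>(e), demo_error_category()};
}
```

A caller would typically throw `std::system_error(make_error_code(demo_error::clear_refs_failed))` and let `what()` carry the description.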
@@ -40,31 +44,39 @@ class pinnable_mapped_file {
    typedef typename bip::managed_mapped_file::segment_manager segment_manager;

    enum map_mode {
-      mapped,
-      heap,
-      locked
+      mapped,         // file is mmaped in MAP_SHARED mode. Only mode where changes can be seen by another chainbase instance
+      mapped_private, // file is mmaped in MAP_PRIVATE mode, and only updated at exit
+      heap,           // file is copied at startup to an anonymous mapping using huge pages (if available)
+      locked          // file is copied at startup to an anonymous mapping using huge pages (if available) and locked in memory
    };

    pinnable_mapped_file(const std::filesystem::path& dir, bool writable, uint64_t shared_file_size, bool allow_dirty, map_mode mode);
-   pinnable_mapped_file(pinnable_mapped_file&& o);
-   pinnable_mapped_file& operator=(pinnable_mapped_file&&);
+   pinnable_mapped_file(pinnable_mapped_file&& o) noexcept ;
+   pinnable_mapped_file& operator=(pinnable_mapped_file&&) noexcept ;
    pinnable_mapped_file(const pinnable_mapped_file&) = delete;
    pinnable_mapped_file& operator=(const pinnable_mapped_file&) = delete;
    ~pinnable_mapped_file();

    segment_manager* get_segment_manager() const { return _segment_manager;}
+   void revert_to_private_mode();
+   size_t check_memory_and_flush_if_needed();
+

 private:
    void set_mapped_file_db_dirty(bool);
    void load_database_file(boost::asio::io_service& sig_ios);
-   void save_database_file();
-   bool all_zeros(char* data, size_t sz);
+   void save_database_file(bool flush = true);
+   static bool all_zeros(const std::byte* data, size_t sz);
    void setup_non_file_mapping();
+   void setup_copy_on_write_mapping();
+   std::pair<std::byte*, size_t> get_region_to_save() const;

    bip::file_lock _mapped_file_lock;
    std::filesystem::path _data_file_path;
    std::string _database_name;
+   size_t _database_size;
    bool _writable;
+   bool _sharable;

    bip::file_mapping _file_mapping;
    bip::mapped_region _file_mapped_region;
@@ -79,7 +91,10 @@ class pinnable_mapped_file {

    segment_manager* _segment_manager = nullptr;

+   static std::vector<pinnable_mapped_file*> _instance_tracker;
+
    constexpr static unsigned _db_size_multiple_requirement = 1024*1024; //1MB
+   constexpr static size_t _db_size_copy_increment = 1024*1024*1024; //1GB
 };

 std::istream& operator>>(std::istream& in, pinnable_mapped_file::map_mode& runtime);
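
The `operator>>` declaration above implies that a `map_mode` can be read from a stream, for example when parsing a configuration option. A sketch of how such an extractor could work follows. It uses a local stand-in enum rather than `pinnable_mapped_file::map_mode`, and it assumes the accepted tokens are spelled exactly like the enumerator names, which the visible part of the diff does not confirm.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Local stand-in mirroring the four modes listed in the diff.
enum class map_mode { mapped, mapped_private, heap, locked };

// Reads one whitespace-delimited token and converts it to a map_mode.
// An unrecognized token leaves `mode` untouched and sets failbit.
std::istream& operator>>(std::istream& in, map_mode& mode) {
   std::string token;
   in >> token;
   if (token == "mapped")
      mode = map_mode::mapped;
   else if (token == "mapped_private")
      mode = map_mode::mapped_private;
   else if (token == "heap")
      mode = map_mode::heap;
   else if (token == "locked")
      mode = map_mode::locked;
   else
      in.setstate(std::ios_base::failbit);
   return in;
}
```

Signalling bad input through `failbit` keeps the extractor usable with option parsers that test the stream state after extraction.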