Implemented AWS S3 storage backend (#44)
Current implementation accepts storage URIs in the following form:
's3://[<access_key_id>:<secret_access_key>@]<bucket_name>[.<region>]/<path>'.
In case '<region>' is not specified explicitly, it will be auto-detected from
the location of the '<bucket_name>' bucket.
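
The URI form above can be illustrated with a small parser. This is only a sketch under stated assumptions: 's3_uri_parts' and 'parse_s3_uri()' are hypothetical names that do not come from the commit, credentials are assumed to contain no '/' characters, and the first '.' after the credentials is treated as the bucket / region separator.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type; the real parser in the commit may look different.
struct s3_uri_parts {
  std::string access_key_id;     // empty when credentials are omitted
  std::string secret_access_key; // empty when credentials are omitted
  std::string bucket;
  std::string region;            // empty means "auto-detect from the bucket"
  std::string path;              // always starts with '/'
};

inline std::optional<s3_uri_parts> parse_s3_uri(std::string_view uri) {
  constexpr std::string_view scheme{"s3://"};
  if (uri.substr(0, scheme.size()) != scheme) {
    return std::nullopt;
  }
  uri.remove_prefix(scheme.size());

  // Everything up to the first '/' is the authority, the rest is the path.
  // Assumption: credentials do not contain '/' (or are encoded beforehand).
  const auto slash_pos = uri.find('/');
  if (slash_pos == std::string_view::npos) {
    return std::nullopt;
  }
  std::string_view authority = uri.substr(0, slash_pos);

  s3_uri_parts result;
  result.path = std::string{uri.substr(slash_pos)};

  // Optional "<access_key_id>:<secret_access_key>@" prefix.
  const auto at_pos = authority.rfind('@');
  if (at_pos != std::string_view::npos) {
    const std::string_view credentials = authority.substr(0, at_pos);
    const auto colon_pos = credentials.find(':');
    if (colon_pos == std::string_view::npos) {
      return std::nullopt;
    }
    result.access_key_id = std::string{credentials.substr(0, colon_pos)};
    result.secret_access_key = std::string{credentials.substr(colon_pos + 1)};
    authority.remove_prefix(at_pos + 1);
  }

  // Optional ".<region>" suffix after the bucket name.
  const auto dot_pos = authority.find('.');
  result.bucket = std::string{authority.substr(0, dot_pos)};
  if (dot_pos != std::string_view::npos) {
    result.region = std::string{authority.substr(dot_pos + 1)};
  }
  if (result.bucket.empty()) {
    return std::nullopt;
  }
  return result;
}
```

An empty 'region' in the result corresponds to the auto-detection case described above.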

As AWS S3 does not have a direct way to append data to an existing object, we
use the following strategy for resuming stream operations.
* Upon opening a stream (via 'do_open_stream()' method) for an object that
  already exists on S3 (in case we run the utility on a storage that has already
  been initialized), we download the content of this object into a temporary
  file.
* All writes (data appends) requested by the 'do_write_data_to_stream()' method
  will be performed to this temporary file.
* When the 'do_close_stream()' method is called, we upload the content of this
  temporary file back to S3 (overwriting the content of the existing object).
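
The three steps above can be sketched without the AWS SDK by letting an in-memory map stand in for S3. Everything here ('fake_object_store', 'resumable_stream_sketch', the temporary file path) is a hypothetical illustration of the download / append / re-upload idea, not the code from this commit.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Stand-in for S3: object name -> object content. Purely illustrative.
using fake_object_store = std::map<std::string, std::string>;

class resumable_stream_sketch {
public:
  resumable_stream_sketch(fake_object_store &store,
                          std::filesystem::path tmp_path)
      : store_{&store}, tmp_path_{std::move(tmp_path)} {}

  // Step 1: start a fresh temporary file and seed it with the existing
  // object content, if there is any ("download").
  void open(const std::string &name) {
    name_ = name;
    tmp_.open(tmp_path_, std::ios_base::out | std::ios_base::binary |
                             std::ios_base::trunc);
    if (const auto it = store_->find(name_); it != store_->end()) {
      tmp_.write(it->second.data(),
                 static_cast<std::streamsize>(it->second.size()));
    }
  }

  // Step 2: all appends go to the temporary file only.
  void write(std::string_view data) {
    tmp_.write(data.data(), static_cast<std::streamsize>(data.size()));
  }

  // Step 3: read the temporary file back and overwrite the object ("upload").
  void close() {
    tmp_.close();
    std::ifstream in{tmp_path_, std::ios_base::in | std::ios_base::binary};
    std::string content((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
    in.close();
    (*store_)[name_] = std::move(content);
    std::filesystem::remove(tmp_path_);
  }

private:
  fake_object_store *store_;
  std::filesystem::path tmp_path_;
  std::string name_;
  std::ofstream tmp_;
};
```

One consequence of this design is that the whole object is transferred again on every open / close cycle, so the cost of resuming grows with the object size.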

Re-designed 'binsrv::s3_storage_backend' class. Introduced internal
'aws_context' class that follows the 'pimpl' idiom, so that the main class
include file 'binsrv/s3_storage_backend.hpp' does not depend on any '<aws/*>'
headers.
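
For readers unfamiliar with the idiom, a minimal single-file sketch of 'pimpl' follows. The class names are hypothetical and the "implementation" holds only a bucket name; in the real code the nested class would presumably be the only place that includes '<aws/*>' headers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// --- what would live in the public header ---
class s3_backend_sketch {
public:
  explicit s3_backend_sketch(std::string bucket);
  ~s3_backend_sketch(); // defined where the nested class is complete
  s3_backend_sketch(const s3_backend_sketch &) = delete;
  s3_backend_sketch &operator=(const s3_backend_sketch &) = delete;

  [[nodiscard]] const std::string &bucket() const noexcept;

private:
  class aws_context_sketch; // forward declaration only, no SDK headers
  std::unique_ptr<aws_context_sketch> impl_;
};

// --- what would live in the .cpp file ---
class s3_backend_sketch::aws_context_sketch {
public:
  explicit aws_context_sketch(std::string bucket)
      : bucket_{std::move(bucket)} {}
  [[nodiscard]] const std::string &bucket() const noexcept { return bucket_; }

private:
  std::string bucket_; // SDK clients and options would be stored here
};

s3_backend_sketch::s3_backend_sketch(std::string bucket)
    : impl_{std::make_unique<aws_context_sketch>(std::move(bucket))} {}

// The destructor must be defined after aws_context_sketch is complete,
// otherwise std::unique_ptr cannot delete the incomplete type.
s3_backend_sketch::~s3_backend_sketch() = default;

const std::string &s3_backend_sketch::bucket() const noexcept {
  return impl_->bucket();
}
```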

'binsrv::filesystem_backend_storage' class now explicitly specifies required
combinations of the 'std::ios_base::[in | out | binary | trunc | app]' flags
in full for all internal 'std::ifstream' / 'std::ofstream' / 'std::fstream'
objects.
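
As an illustration of spelling out the open mode in full, the helpers below create, append to, and read a file with explicit flag combinations. The function names are hypothetical, and the exact combinations used by 'binsrv::filesystem_backend_storage' are not shown in this excerpt.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <string_view>

// Create (or truncate) a file: out | binary | trunc.
inline void create_file(const std::filesystem::path &path,
                        std::string_view content) {
  std::ofstream ofs{path, std::ios_base::out | std::ios_base::binary |
                              std::ios_base::trunc};
  ofs.write(content.data(), static_cast<std::streamsize>(content.size()));
}

// Append to a file: out | binary | app.
inline void append_to_file(const std::filesystem::path &path,
                           std::string_view content) {
  std::ofstream ofs{path, std::ios_base::out | std::ios_base::binary |
                              std::ios_base::app};
  ofs.write(content.data(), static_cast<std::streamsize>(content.size()));
}

// Read a whole file: in | binary.
inline std::string read_file(const std::filesystem::path &path) {
  std::ifstream ifs{path, std::ios_base::in | std::ios_base::binary};
  return std::string((std::istreambuf_iterator<char>(ifs)),
                     std::istreambuf_iterator<char>());
}
```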

'binsrv::storage' class destructor now tries to call 'backend_->close_stream();'
in order to flush stream state to the storage backend in case of normal /
exceptional shutdown.
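
A destructor that "tries" to flush might look roughly like the sketch below. The types are hypothetical stand-ins, and both the 'is_stream_open()' check and the decision to swallow exceptions are assumptions about how such a destructor is usually written, not a copy of the committed code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical backend stand-in that records how often it was closed.
struct backend_sketch {
  bool stream_open{false};
  bool fail_on_close{false};
  int close_calls{0};

  [[nodiscard]] bool is_stream_open() const noexcept { return stream_open; }
  void close_stream() {
    ++close_calls;
    stream_open = false;
    if (fail_on_close) {
      throw std::runtime_error{"flush failed"};
    }
  }
};

class storage_sketch {
public:
  explicit storage_sketch(backend_sketch &backend) : backend_{&backend} {}
  storage_sketch(const storage_sketch &) = delete;
  storage_sketch &operator=(const storage_sketch &) = delete;

  ~storage_sketch() {
    if (!backend_->is_stream_open()) {
      return;
    }
    try {
      backend_->close_stream();
    } catch (...) {
      // Destructors must not let exceptions escape, in particular while
      // another exception is already unwinding the stack.
    }
  }

private:
  backend_sketch *backend_;
};
```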

'binsrv::basic_storage_backend' class extended with additional method
'is_stream_open()' indicating whether the object is in a state between
'open_stream()' and 'close_stream()' calls.

'open_stream()' method in the 'binsrv::basic_storage_backend' class now accepts
one additional parameter indicating an intent to either create or append a
storage stream explicitly.
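
To show how a concrete backend could honor the new mode parameter and the 'is_stream_open()' state, here is a self-contained sketch. 'backend_base_sketch' loosely mirrors the public / private-virtual split visible in the diff below, while 'memory_backend_sketch' and the rule that 'create' starts from empty content are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>

// Mirrors 'storage_backend_open_stream_mode' from the commit.
enum class open_stream_mode { create, append };

class backend_base_sketch {
public:
  virtual ~backend_base_sketch() = default;

  [[nodiscard]] bool is_stream_open() const noexcept { return stream_open_; }

  void open_stream(std::string_view name, open_stream_mode mode) {
    if (stream_open_) {
      throw std::logic_error{"previous stream has not been closed"};
    }
    do_open_stream(name, mode);
    stream_open_ = true;
  }
  void write_data_to_stream(std::string_view data) {
    if (!stream_open_) {
      throw std::logic_error{"stream has not been opened"};
    }
    do_write_data_to_stream(data);
  }
  void close_stream() {
    if (!stream_open_) {
      throw std::logic_error{"stream has not been opened"};
    }
    do_close_stream();
    stream_open_ = false;
  }

private:
  bool stream_open_{false};

  virtual void do_open_stream(std::string_view name,
                              open_stream_mode mode) = 0;
  virtual void do_write_data_to_stream(std::string_view data) = 0;
  virtual void do_close_stream() = 0;
};

// Hypothetical in-memory backend used only to exercise the base class.
class memory_backend_sketch final : public backend_base_sketch {
public:
  std::map<std::string, std::string> objects;

private:
  std::string current_;

  void do_open_stream(std::string_view name, open_stream_mode mode) override {
    current_ = std::string{name};
    // Assumption: 'create' starts from empty content, 'append' keeps it.
    if (mode == open_stream_mode::create) {
      objects[current_].clear();
    } else {
      objects.try_emplace(current_);
    }
  }
  void do_write_data_to_stream(std::string_view data) override {
    objects[current_] += data;
  }
  void do_close_stream() override { current_.clear(); }
};
```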

Similarly to 'easymysql::core_error' added 'binsrv::s3_error' exception class
with its own 's3_category()' error category ('std::error_category').
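
The general shape of an exception class backed by its own 'std::error_category' is sketched below. The names carry a '_sketch' suffix to make clear they are not the committed 'binsrv::s3_error' / 's3_category()', and the numeric codes and messages are invented for the example rather than taken from the AWS SDK.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical category; codes 1 and 2 are made up for illustration.
class s3_category_impl_sketch final : public std::error_category {
public:
  [[nodiscard]] const char *name() const noexcept override { return "s3"; }
  [[nodiscard]] std::string message(int code) const override {
    switch (code) {
    case 1:
      return "access denied";
    case 2:
      return "no such bucket";
    default:
      return "unknown S3 error";
    }
  }
};

// Categories are compared by address, so a single instance is exposed.
inline const std::error_category &s3_category_sketch() noexcept {
  static const s3_category_impl_sketch instance;
  return instance;
}

class s3_error_sketch : public std::system_error {
public:
  s3_error_sketch(int code, const std::string &what)
      : std::system_error{code, s3_category_sketch(), what} {}
};
```

Callers can then catch 'std::system_error' generically and inspect 'code().category()' to tell S3 failures apart from other error sources.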

Similarly to 'easymysql::raise_core_error_from_connection()' added
'raise_s3_error_from_outcome()' helper function which throws an exception with
error info extracted from 'Aws::S3Crt::S3CrtError' (an error-path alternative
of almost any 'S3Crt' call outcome).

'easymysql::raise_core_error_from_connection()' helper function extended with
additional 'user_message' parameter.

Main application now prints 'successfully shut down' to the log at the end of
execution.

'binsrv' MTR test case extended with additional logic that allows using both
'file://' and 's3://' as backend storage providers (the choice depends on
whether 'MTR_BINSRV_AWS_ACCESS_KEY_ID' / 'MTR_BINSRV_AWS_SECRET_ACCESS_KEY' /
'MTR_BINSRV_AWS_S3_BUCKET' environment variables are set or not).

Added extra precautions against accidentally leaking AWS credentials - we now
temporarily disable MySQL general query log to make sure that
'MTR_BINSRV_AWS_ACCESS_KEY_ID' / 'MTR_BINSRV_AWS_SECRET_ACCESS_KEY' will not
appear in the recorded SQL queries.

Added 'diff_with_storage_object.inc' MTR include file that can compare a local
file with an object from backend storage (either 'file' or 's3').

Added more instructions on how to make 'binsrv' MTR test case use AWS S3 as a
storage backend in 'mtr/README'.

Added '.clang-format' file to 'mtr' directory to exclude MTR test cases and
include files from being processed by 'clang-format'.
percona-ysorokin authored May 9, 2024
1 parent 1d19abd commit 07b8637
Showing 26 changed files with 944 additions and 134 deletions.
5 changes: 5 additions & 0 deletions CMakeLists.txt
@@ -185,6 +185,11 @@ set(source_files
 src/binsrv/main_config.hpp
 src/binsrv/main_config.cpp

+src/binsrv/s3_error_helpers_private.hpp
+src/binsrv/s3_error_helpers_private.cpp
+src/binsrv/s3_error.hpp
+src/binsrv/s3_error.cpp
+
 src/binsrv/s3_storage_backend.hpp
 src/binsrv/s3_storage_backend.cpp

2 changes: 2 additions & 0 deletions mtr/.clang-format
@@ -0,0 +1,2 @@
+DisableFormat: true
+SortIncludes: Never
12 changes: 12 additions & 0 deletions mtr/README
@@ -7,3 +7,15 @@ following:
 2. Set 'BINSRV' environment variable pointing to the 'binlog_server' binary
 and run MTR.
 BINSRV=<build_directory_path>/binlog_server ./mysql-test/mtr --suite=binlog_streaming
+3. In order to run the tests using AWS S3 as a storage backend also define the
+following environment variables
+* MTR_BINSRV_AWS_ACCESS_KEY_ID - AWS access key ID
+* MTR_BINSRV_AWS_SECRET_ACCESS_KEY - AWS secret access key
+* MTR_BINSRV_AWS_S3_BUCKET - AWS S3 bucket name
+* MTR_BINSRV_AWS_S3_REGION - AWS S3 region (optional)
+BINSRV=<build_directory_path>/binlog_server \
+MTR_BINSRV_AWS_ACCESS_KEY_ID=... \
+MTR_BINSRV_AWS_SECRET_ACCESS_KEY=... \
+MTR_BINSRV_AWS_S3_BUCKET=my-bucket \
+MTR_BINSRV_AWS_S3_REGION=eu-central-1 \
+./mysql-test/mtr --suite=binlog_streaming
39 changes: 39 additions & 0 deletions mtr/binlog_streaming/include/diff_with_storage_object.inc
@@ -0,0 +1,39 @@
+#
+# Compares a file on a local filesystem with an object from the backend storage.
+#
+# Usage:
+# --let $storage_backend = file
+# --let $local_file = $MYSQL_TMP_DIR/first
+# --let $storage_object = $MYSQL_TMP_DIR/second
+# --source diff_with_storage_object.inc
+#
+# or
+# --let $aws_cli = AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... aws
+# --let $storage_backend = s3
+# --let $local_file = $MYSQL_TMP_DIR/first
+# --let $aws_s3_bucket = my-bucket
+# --let $storage_object = /vault/second
+# --source diff_with_storage_object.inc
+#
+# $storage_backend - determines storage backend type (either 'file' or 's3')
+# $aws_cli - path to AWS command line interface (cli) tools with AWS_ACCESS_KEY_ID /
+# AWS_SECRET_ACCESS_KEY environment variables set appropriately
+# (needed only if $storage_backend is 's3')
+# $local_file - a path to the first file on a local filesystem
+# $aws_s3_bucket - AWS S3 bucket name (needed only if $storage_backend is 's3')
+# $storage_object - if $storage_backend is 'file', a path to the second file on
+# a local filesystem; if $storage_backend is 's3', a path to
+# the second object on AWS S3
+#
+
+if ($storage_backend == file)
+{
+--diff_files $local_file $storage_object
+}
+if ($storage_backend == s3)
+{
+--let $downloaded_file_path = $MYSQL_TMP_DIR/diff_with_storage_object.downloaded
+--exec $aws_cli s3 cp s3://$aws_s3_bucket$storage_object $downloaded_file_path > /dev/null
+--diff_files $local_file $downloaded_file_path
+--remove_file $downloaded_file_path
+}
29 changes: 7 additions & 22 deletions mtr/binlog_streaming/r/binsrv.result
@@ -20,28 +20,6 @@ INSERT INTO t1 VALUES(DEFAULT);

 *** Generating a configuration file in JSON format for the Binlog
 *** Server utility.
-SET @storage_path = '<BINSRV_STORAGE_PATH>';
-SET @storage_uri = CONCAT('file://', @storage_path);
-SET @log_path = '<BINSRV_LOG_PATH>';
-SET @delimiter_pos = INSTR(USER(), '@');
-SET @connection_user = SUBSTRING(USER(), 1, @delimiter_pos - 1);
-SET @connection_host = SUBSTRING(USER(), @delimiter_pos + 1);
-SET @connection_host = IF(@connection_host = 'localhost', '127.0.0.1', @connection_host);
-SET @binsrv_config_json = JSON_OBJECT(
-'logger', JSON_OBJECT(
-'level', 'trace',
-'file', @log_path
-),
-'connection', JSON_OBJECT(
-'host', @connection_host,
-'port', @@global.port,
-'user', @connection_user,
-'password', ''
-),
-'storage', JSON_OBJECT(
-'uri', @storage_uri
-)
-);

 *** Determining binlog file directory from the server.

@@ -52,6 +30,9 @@ SET @binsrv_config_json = JSON_OBJECT(
 *** from the server to the <BINSRV_STORAGE_PATH> directory (second
 *** binlog is still open / in use).

+*** Checking that the Binlog Server utility detected an empty storage
+include/assert_grep.inc [Binlog storage must be initialized on an empty directory]
+
 *** Comparing server and downloaded versions of the first binlog file.

 *** Patching the server version of the second binlog file to clear the
@@ -71,6 +52,10 @@ FLUSH BINARY LOGS;
 *** binlog is no longer open / in use). Here we should also continue
 *** streaming binlog events from the last saved position.

+*** Checking that the Binlog Server utility detected a previously
+*** initialized storage
+include/assert_grep.inc [Binlog storage must be initialized on a non-empty directory]
+
 *** Comparing server and downloaded versions of the first binlog file
 *** one more time.

104 changes: 93 additions & 11 deletions mtr/binlog_streaming/t/binsrv.test
@@ -1,3 +1,11 @@
+# The following environment variables must be defined to use AWS S3 as a
+# storage backend:
+# - $MTR_BINSRV_AWS_ACCESS_KEY_ID
+# - $MTR_BINSRV_AWS_SECRET_ACCESS_KEY
+# - $MTR_BINSRV_AWS_S3_BUCKET
+# - $MTR_BINSRV_AWS_S3_REGION (optional)
+
+
 # make sure that $BINSRV environment variable is set to the absolute path
 # of the Binlog Server utility before running this test
 if (!$BINSRV) {
@@ -36,16 +44,51 @@ FLUSH BINARY LOGS;
 --echo *** Filling the table with some more data.
 INSERT INTO t1 VALUES(DEFAULT);

+--let $storage_backend = file
+if ($MTR_BINSRV_AWS_ACCESS_KEY_ID != '')
+{
+if ($MTR_BINSRV_AWS_SECRET_ACCESS_KEY != '')
+{
+if ($MTR_BINSRV_AWS_S3_BUCKET != '')
+{
+--let $storage_backend = s3
+--let $aws_cli = AWS_ACCESS_KEY_ID=$MTR_BINSRV_AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY=$MTR_BINSRV_AWS_SECRET_ACCESS_KEY aws
+if ($MTR_BINSRV_AWS_S3_REGION != '')
+{
+--let $aws_cli = $aws_cli --region $MTR_BINSRV_AWS_S3_REGION
+}
+}
+}
+}
+
 --echo
 --echo *** Generating a configuration file in JSON format for the Binlog
 --echo *** Server utility.
---let $binsrv_storage_path = $MYSQL_TMP_DIR/storage
---replace_result $binsrv_storage_path <BINSRV_STORAGE_PATH>
-eval SET @storage_path = '$binsrv_storage_path';
-SET @storage_uri = CONCAT('file://', @storage_path);

+# temporarily disabling MySQL general query log so that AWS credentials
+# will not appear in plain in recorded SQL queries
+--disable_query_log
+SET @old_sql_log_off = @@sql_log_off;
+SET sql_log_off = ON;
+
+if ($storage_backend == file)
+{
+--let $binsrv_storage_path = $MYSQL_TMP_DIR/storage
+eval SET @storage_uri = CONCAT('file://', '$binsrv_storage_path');
+}
+if ($storage_backend == s3)
+{
+--let $qualified_bucket = $MTR_BINSRV_AWS_S3_BUCKET
+if ($MTR_BINSRV_AWS_S3_REGION)
+{
+--let $qualified_bucket = $qualified_bucket.$MTR_BINSRV_AWS_S3_REGION
+}
+--let $binsrv_storage_path = `SELECT CONCAT('/mtr-', UUID())`
+eval SET @storage_uri = CONCAT('s3://', '$MTR_BINSRV_AWS_ACCESS_KEY_ID', ':', '$MTR_BINSRV_AWS_SECRET_ACCESS_KEY', '@', '$qualified_bucket', '$binsrv_storage_path');
+--let $aws_s3_bucket = $MTR_BINSRV_AWS_S3_BUCKET
+}
+
 --let $binsrv_log_path = $MYSQL_TMP_DIR/binsrv_utility.log
---replace_result $binsrv_log_path <BINSRV_LOG_PATH>
 eval SET @log_path = '$binsrv_log_path';

 SET @delimiter_pos = INSTR(USER(), '@');
@@ -74,6 +117,9 @@ eval SET @binsrv_config_json = JSON_OBJECT(
 --let $write_to_file = $binsrv_config_file_path
 --source include/write_var_to_file.inc

+SET sql_log_off = @old_sql_log_off;
+--enable_query_log
+
 --echo
 --echo *** Determining binlog file directory from the server.
 --disable_query_log
@@ -89,21 +135,34 @@ if ($have_windows) {
 --echo
 --echo *** Creating a temporary directory <BINSRV_STORAGE_PATH> for storing
 --echo *** binlog files downloaded via the Binlog Server utility.
---mkdir $binsrv_storage_path
+if ($storage_backend == file)
+{
+--mkdir $binsrv_storage_path
+}

 --echo
 --echo *** Executing the Binlog Server utility to download all binlog data
 --echo *** from the server to the <BINSRV_STORAGE_PATH> directory (second
 --echo *** binlog is still open / in use).
 --exec $BINSRV $binsrv_config_file_path > /dev/null

+--echo
+--echo *** Checking that the Binlog Server utility detected an empty storage
+--let $assert_text = Binlog storage must be initialized on an empty directory
+--let $assert_file = $binsrv_log_path
+--let $assert_count = 1
+--let $assert_select = binlog storage initialized on an empty directory
+--source include/assert_grep.inc
+
 # At this point we have 2 binlog files $first_binlog (already closed/rotated
 # by the server) and $second_binlog (currently open).

 # The former can be compared as is.
 --echo
 --echo *** Comparing server and downloaded versions of the first binlog file.
---diff_files $binlog_base_dir/$first_binlog $binsrv_storage_path/$first_binlog
+--let $local_file = $binlog_base_dir/$first_binlog
+--let $storage_object = $binsrv_storage_path/$first_binlog
+--source ../include/diff_with_storage_object.inc

 # Because the latter from the server is currently open for writing, it has one
 # additional bit (LOG_EVENT_BINLOG_IN_USE_F = 0x1) set in the flags field of the
@@ -141,7 +200,10 @@ EOF

 --echo
 --echo *** Comparing server and downloaded versions of the second binlog file.
---diff_files $PATCHED_BINLOG_FILE $binsrv_storage_path/$second_binlog
+--let $local_file = $PATCHED_BINLOG_FILE
+--let $storage_object = $binsrv_storage_path/$second_binlog
+--source ../include/diff_with_storage_object.inc
+
 --remove_file $PATCHED_BINLOG_FILE

 --echo
@@ -160,19 +222,39 @@ FLUSH BINARY LOGS;
 --echo *** streaming binlog events from the last saved position.
 --exec $BINSRV $binsrv_config_file_path > /dev/null

+--echo
+--echo *** Checking that the Binlog Server utility detected a previously
+--echo *** initialized storage
+--let $assert_text = Binlog storage must be initialized on a non-empty directory
+--let $assert_file = $binsrv_log_path
+--let $assert_count = 1
+--let $assert_select = binlog storage initialized at
+--source include/assert_grep.inc
+
 --echo
 --echo *** Comparing server and downloaded versions of the first binlog file
 --echo *** one more time.
---diff_files $binlog_base_dir/$first_binlog $binsrv_storage_path/$first_binlog
+--let $local_file = $binlog_base_dir/$first_binlog
+--let $storage_object = $binsrv_storage_path/$first_binlog
+--source ../include/diff_with_storage_object.inc

 --echo
 --echo *** Comparing server and downloaded versions of the second binlog file
 --echo *** (without patching) one more time.
---diff_files $binlog_base_dir/$second_binlog $binsrv_storage_path/$second_binlog
+--let $local_file = $binlog_base_dir/$second_binlog
+--let $storage_object = $binsrv_storage_path/$second_binlog
+--source ../include/diff_with_storage_object.inc

 --echo
 --echo *** Removing the Binlog Server utility storage directory.
---force-rmdir $binsrv_storage_path
+if ($storage_backend == file)
+{
+--force-rmdir $binsrv_storage_path
+}
+if ($storage_backend == s3)
+{
+--exec $aws_cli s3 rm s3://$aws_s3_bucket$binsrv_storage_path/ --recursive > /dev/null
+}

 --echo
 --echo *** Removing the Binlog Server utility log file.
1 change: 1 addition & 0 deletions src/app.cpp
@@ -279,6 +279,7 @@ int main(int argc, char *argv[]) {

     receive_binlog_events(*logger, binlog, storage);

+    logger->log(binsrv::log_severity::info, "successfully shut down");
     exit_code = EXIT_SUCCESS;
   } catch (...) {
     handle_std_exception(logger);
15 changes: 8 additions & 7 deletions src/binsrv/basic_storage_backend.cpp
@@ -39,31 +39,32 @@ void basic_storage_backend::put_object(std::string_view name,
   return do_put_object(name, content);
 }

-void basic_storage_backend::open_stream(std::string_view name) {
-  if (stream_opened_) {
+void basic_storage_backend::open_stream(std::string_view name,
+                                        storage_backend_open_stream_mode mode) {
+  if (stream_open_) {
     util::exception_location().raise<std::logic_error>(
         "cannot open a new stream as the previous one has not been closed");
   }

-  do_open_stream(name);
-  stream_opened_ = true;
+  do_open_stream(name, mode);
+  stream_open_ = true;
 }

 void basic_storage_backend::write_data_to_stream(util::const_byte_span data) {
-  if (!stream_opened_) {
+  if (!stream_open_) {
     util::exception_location().raise<std::logic_error>(
         "cannot write to the stream as it has not been opened");
   }
   do_write_data_to_stream(data);
 }

 void basic_storage_backend::close_stream() {
-  if (!stream_opened_) {
+  if (!stream_open_) {
     util::exception_location().raise<std::logic_error>(
         "cannot close the stream as it has not been opened");
   }
   do_close_stream();
-  stream_opened_ = false;
+  stream_open_ = false;
 }

 [[nodiscard]] std::string basic_storage_backend::get_description() const {
9 changes: 6 additions & 3 deletions src/binsrv/basic_storage_backend.hpp
@@ -39,21 +39,24 @@ class basic_storage_backend {
   [[nodiscard]] std::string get_object(std::string_view name);
   void put_object(std::string_view name, util::const_byte_span content);

-  void open_stream(std::string_view name);
+  [[nodiscard]] bool is_stream_open() const noexcept { return stream_open_; }
+  void open_stream(std::string_view name,
+                   storage_backend_open_stream_mode mode);
   void write_data_to_stream(util::const_byte_span data);
   void close_stream();

   [[nodiscard]] std::string get_description() const;

 private:
-  bool stream_opened_{false};
+  bool stream_open_{false};

   [[nodiscard]] virtual storage_object_name_container do_list_objects() = 0;
   [[nodiscard]] virtual std::string do_get_object(std::string_view name) = 0;
   virtual void do_put_object(std::string_view name,
                              util::const_byte_span content) = 0;

-  virtual void do_open_stream(std::string_view name) = 0;
+  virtual void do_open_stream(std::string_view name,
+                              storage_backend_open_stream_mode mode) = 0;
   virtual void do_write_data_to_stream(util::const_byte_span data) = 0;
   virtual void do_close_stream() = 0;

2 changes: 2 additions & 0 deletions src/binsrv/basic_storage_backend_fwd.hpp
@@ -23,6 +23,8 @@

 namespace binsrv {

+enum class storage_backend_open_stream_mode { create, append };
+
 class basic_storage_backend;

 using basic_storage_backend_ptr = std::unique_ptr<basic_storage_backend>;
2 changes: 1 addition & 1 deletion src/binsrv/cout_logger.hpp
@@ -22,7 +22,7 @@

 namespace binsrv {

-class [[nodiscard]] cout_logger : public basic_logger {
+class [[nodiscard]] cout_logger final : public basic_logger {
 public:
   explicit cout_logger(log_severity min_level) : basic_logger{min_level} {}

2 changes: 1 addition & 1 deletion src/binsrv/file_logger.hpp
@@ -25,7 +25,7 @@

 namespace binsrv {

-class [[nodiscard]] file_logger : public basic_logger {
+class [[nodiscard]] file_logger final : public basic_logger {
 public:
   file_logger(log_severity min_level, std::string_view file_name);
