Implemented AWS S3 storage backend #44

Conversation

percona-ysorokin
Collaborator

The current implementation accepts storage URIs in the following form: 's3://[<access_key_id>:<secret_access_key>@]<bucket_name>[.<region>]/<path>'. In case '<region>' is not specified explicitly, it is auto-detected from the location of the '<bucket_name>' bucket.
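
For example (bucket name, path, and region below are hypothetical):

  • 's3://my-binlog-bucket/binlogs/' - no inline credentials, region auto-detected from the bucket location.
  • 's3://<access_key_id>:<secret_access_key>@my-binlog-bucket.eu-central-1/binlogs/' - explicit credentials and the 'eu-central-1' region.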

As AWS S3 does not have a direct way to append data to an existing object, we use the following strategy for resuming stream operations (a code sketch follows the list below).

  • Upon opening a stream (via the 'do_open_stream()' method) for an object that already exists on S3 (in case we run the utility on a storage that has already been initialized), we download the content of this object into a temporary file.
  • All writes (data appends) requested via the 'do_write_data_to_stream()' method are performed on this temporary file.
  • When the 'do_close_stream()' method is called, we upload the content of this temporary file back to S3 (overwriting the content of the existing object).
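
A minimal sketch of this download / append / upload cycle, assuming the AWS SDK for C++ 'S3Crt' client (the class below and its member names are illustrative, not the actual 'aws_context' code; SDK initialization via 'Aws::InitAPI()' is assumed to happen elsewhere):

```cpp
#include <cstddef>
#include <fstream>
#include <memory>
#include <string>

#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/GetObjectRequest.h>
#include <aws/s3-crt/model/PutObjectRequest.h>

// Illustrative helper mirroring the resume strategy described above.
class s3_stream_sketch {
public:
  // do_open_stream(): if the object already exists, download it into a
  // temporary file that will accumulate all subsequent appends.
  void open(const Aws::S3Crt::S3CrtClient &client, const Aws::String &bucket,
            const Aws::String &key, const std::string &tmp_path) {
    tmp_path_ = tmp_path;
    Aws::S3Crt::Model::GetObjectRequest request;
    request.SetBucket(bucket);
    request.SetKey(key);
    auto outcome = client.GetObject(request);
    if (outcome.IsSuccess()) {
      std::ofstream tmp{tmp_path_, std::ios_base::out | std::ios_base::binary |
                                       std::ios_base::trunc};
      tmp << outcome.GetResult().GetBody().rdbuf();
    }
    // A "no such key" error would mean the stream is being created from
    // scratch; other errors would go through raise_s3_error_from_outcome().
  }

  // do_write_data_to_stream(): appends touch only the local temporary file.
  void write(const char *data, std::size_t size) {
    std::ofstream tmp{tmp_path_, std::ios_base::out | std::ios_base::binary |
                                     std::ios_base::app};
    tmp.write(data, static_cast<std::streamsize>(size));
  }

  // do_close_stream(): upload the whole temporary file, overwriting the
  // existing S3 object.
  void close(const Aws::S3Crt::S3CrtClient &client, const Aws::String &bucket,
             const Aws::String &key) {
    auto body = std::make_shared<std::fstream>(
        tmp_path_, std::ios_base::in | std::ios_base::binary);
    Aws::S3Crt::Model::PutObjectRequest request;
    request.SetBucket(bucket);
    request.SetKey(key);
    request.SetBody(body);
    client.PutObject(request);
    // (the error path of the returned outcome is omitted here)
  }

private:
  std::string tmp_path_;
};
```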

Re-designed the 'binsrv::s3_storage_backend' class. Introduced an internal 'aws_context' class that follows the 'pimpl' idiom, so that the main class include file 'binsrv/s3_storage_backend.hpp' does not depend on any '<aws/*>' headers.
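
The effect of the 'pimpl' split, roughly (member and constructor signatures here are hypothetical):

```cpp
// binsrv/s3_storage_backend.hpp (sketch) -- note: no <aws/*> includes.
#include <memory>
#include <string_view>

namespace binsrv {

class s3_storage_backend {
public:
  explicit s3_storage_backend(std::string_view storage_uri);
  // Declared here, defined in the .cpp where 'aws_context' is a complete
  // type, as required for std::unique_ptr of an incomplete type.
  ~s3_storage_backend();

private:
  class aws_context; // defined in s3_storage_backend.cpp, wraps the AWS SDK
  std::unique_ptr<aws_context> impl_;
};

} // namespace binsrv
```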

'binsrv::filesystem_backend_storage' class now explicitly specifies required combinations of the 'std::ios_base::[in | out | binary | trunc | app]' flags in full for all internal 'std::ifstream' / 'std::ofstream' / 'std::fstream' objects.
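
For example, the three flag combinations spelled out in full (a sketch rather than the exact code from the class; file names are placeholders):

```cpp
#include <fstream>

// Reading an existing object back from filesystem storage.
std::ifstream input{"binlog.index",
                    std::ios_base::in | std::ios_base::binary};

// Creating a stream from scratch: truncate any previous content.
std::ofstream created{"binlog.000001",
                      std::ios_base::out | std::ios_base::binary |
                          std::ios_base::trunc};

// Resuming a stream: keep the existing content and append to it.
std::ofstream resumed{"binlog.000001",
                      std::ios_base::out | std::ios_base::binary |
                          std::ios_base::app};
```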

'binsrv::storage' class destructor now tries to call 'backend_->close_stream();' in order to flush the stream state to the storage backend on both normal and exceptional shutdown.
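
Conceptually, a sketch of that destructor body (the real code may handle errors differently):

```cpp
binsrv::storage::~storage() {
  // Flush the current stream to the backend even when the storage object is
  // destroyed during stack unwinding caused by an exception.
  try {
    if (backend_ && backend_->is_stream_open()) {
      backend_->close_stream();
    }
  } catch (...) {
    // Destructors must not propagate exceptions; errors on the shutdown
    // path are deliberately swallowed here.
  }
}
```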

'binsrv::basic_storage_backend' class extended with an additional method, 'is_stream_open()', indicating whether the object is in a state between 'open_stream()' and 'close_stream()' calls.

'open_stream()' method in the 'binsrv::basic_storage_backend' class now accepts an additional parameter that explicitly indicates the intent to either create a new storage stream or append to an existing one.
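
Put together, the extended 'binsrv::basic_storage_backend' surface could look roughly like this (the 'stream_open_mode' enum and parameter names are hypothetical):

```cpp
#include <string_view>

namespace binsrv {

class basic_storage_backend {
public:
  virtual ~basic_storage_backend() = default;

  // Hypothetical enum expressing the caller's intent when opening a stream.
  enum class stream_open_mode { create, append };

  void open_stream(std::string_view name, stream_open_mode mode) {
    do_open_stream(name, mode);
    stream_opened_ = true;
  }

  void close_stream() {
    do_close_stream();
    stream_opened_ = false;
  }

  // True between successful 'open_stream()' and 'close_stream()' calls.
  [[nodiscard]] bool is_stream_open() const noexcept { return stream_opened_; }

private:
  virtual void do_open_stream(std::string_view name, stream_open_mode mode) = 0;
  virtual void do_close_stream() = 0;

  bool stream_opened_{false};
};

} // namespace binsrv
```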

Similarly to 'easymysql::core_error', added the 'binsrv::s3_error' exception class with its own 's3_category()' error category ('std::error_category').

Similarly to 'easymysql::raise_core_error_from_connection()', added the 'raise_s3_error_from_outcome()' helper function, which throws an exception with error info extracted from 'Aws::S3Crt::S3CrtError' (the error-path alternative of almost any 'S3Crt' call outcome).
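
A condensed sketch of these error-handling pieces ('Aws::S3Crt::S3CrtError' comes from the AWS SDK; the exact shape of the project's classes may differ):

```cpp
#include <string>
#include <system_error>

#include <aws/s3-crt/S3CrtErrors.h>

namespace binsrv {

// std::error_category translating S3 error codes into readable messages.
class s3_category_impl : public std::error_category {
public:
  [[nodiscard]] const char *name() const noexcept override { return "s3"; }
  [[nodiscard]] std::string message(int code) const override {
    return "S3 error code " + std::to_string(code);
  }
};

inline const std::error_category &s3_category() noexcept {
  static const s3_category_impl instance;
  return instance;
}

// Exception type carrying an std::error_code from the 's3' category.
class s3_error : public std::system_error {
public:
  using std::system_error::system_error;
};

// Throw an s3_error built from the error stored in a failed S3Crt outcome.
template <typename Outcome>
[[noreturn]] void raise_s3_error_from_outcome(const Outcome &outcome,
                                              const std::string &user_message) {
  const Aws::S3Crt::S3CrtError &error = outcome.GetError();
  throw s3_error{
      std::error_code{static_cast<int>(error.GetErrorType()), s3_category()},
      user_message + ": " + std::string{error.GetMessage().c_str()}};
}

} // namespace binsrv
```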

'easymysql::raise_core_error_from_connection()' helper function extended with additional 'user_message' parameter.

Main application now prints 'successfully shut down' to the log at the end of execution.

'binsrv' MTR test case extended with additional logic that allows using either 'file://' or 's3://' as the backend storage provider (the choice depends on whether the 'MTR_BINSRV_AWS_ACCESS_KEY_ID' / 'MTR_BINSRV_AWS_SECRET_ACCESS_KEY' / 'MTR_BINSRV_AWS_S3_BUCKET' environment variables are set).

Added extra precautions against accidentally leaking AWS credentials: we now temporarily disable the MySQL general query log to make sure that the 'AWS_ACCESS_KEY_ID' / 'MTR_BINSRV_AWS_SECRET_ACCESS_KEY' values will not appear in the recorded SQL queries.

Added 'diff_with_storage_object.inc' MTR include file that can compare a local file with an object from backend storage (either 'file' or 's3').

Added more instructions to 'mtr/README' on how to make the 'binsrv' MTR test case use AWS S3 as a storage backend.

Added '.clang-format' file to 'mtr' directory to exclude MTR test cases and include files from being processed by 'clang-format'.

percona-ysorokin merged commit 07b8637 into Percona-Lab:main on May 9, 2024
7 checks passed