GH-41973: Expose new S3 option check_directory_existence_before_creation
HaochengLIU committed Jun 6, 2024
1 parent 9ee6ea7 commit 7f3d26e
Showing 8 changed files with 29 additions and 10 deletions.
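
The trade-off behind `check_directory_existence_before_creation` can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `MockStore`, `Exists`, `Put`, and `CreateDir` are hypothetical names invented here, not Arrow's API, and the counters merely stand in for real I/O calls against a key/value store.

```cpp
#include <set>
#include <string>

// Hypothetical in-memory stand-in for a key/value object store (not Arrow's API).
struct MockStore {
  std::set<std::string> keys;
  bool can_write = true;  // false models a caller without creation access
  int reads = 0;          // existence checks issued
  int writes = 0;         // mutation requests issued

  bool Exists(const std::string& key) {
    ++reads;
    return keys.count(key) > 0;
  }

  // Returns false when the store rejects the mutation.
  bool Put(const std::string& key) {
    ++writes;
    if (!can_write) return false;
    keys.insert(key);
    return true;
  }
};

// Returns true if the call succeeded.
// check_first == false: always issue the mutation (one call, no prior read).
// check_first == true:  read first and mutate only when the key is missing.
bool CreateDir(MockStore& store, const std::string& dir, bool check_first) {
  if (check_first && store.Exists(dir)) {
    return true;  // already present: no mutation is issued
  }
  return store.Put(dir);
}
```

In this model, creating an existing directory with `check_first = true` costs one read and no writes, while `check_first = false` costs one write and no reads. A directory that does not exist yet costs one read plus one write when the check is enabled. With `can_write = false` and the directory already present, only the checked variant succeeds.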
1 change: 1 addition & 0 deletions r/NEWS.md
@@ -18,6 +18,7 @@
-->

# arrow 16.1.0.9000
* Expose an option `check_directory_existence_before_creation` in `S3FileSystem`, which defaults to `FALSE`. If `FALSE`, creating a directory does not first check whether it already exists; trying the creation and catching the error is an optimization over issuing two dependent I/O calls. If `TRUE`, the directory is created only when necessary, at the cost of extra I/O calls. This can be useful for key/value cloud storage that has a hard rate limit on the number of object mutation operations, or for scenarios where the directories already exist and you do not have creation access.

# arrow 16.1.0

4 changes: 2 additions & 2 deletions r/R/arrowExports.R

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 10 additions & 1 deletion r/R/filesystem.R
@@ -156,6 +156,13 @@ FileSelector$create <- function(base_dir, allow_not_found = FALSE, recursive = F
#' buckets if `$CreateDir()` is called on the bucket level (default `FALSE`).
#' - `allow_bucket_deletion`: logical, if TRUE, the filesystem will delete
#' buckets if `$DeleteDir()` is called on the bucket level (default `FALSE`).
#' - `check_directory_existence_before_creation`: logical, if FALSE, when creating a directory the code will
#' not check whether it already exists. It's an optimization to try directory creation and catch the error,
#' rather than issue two dependent I/O calls.
#' If TRUE, when creating a directory the code will only create the directory when necessary,
#' at the cost of extra I/O calls. This can be used for key/value cloud storage which has
#' a hard rate limit on the number of object mutation operations, or for scenarios where
#' the directories already exist and you do not have creation access (default `FALSE`).
#' - `request_timeout`: Socket read time on Windows and macOS in seconds. If
#' negative, the AWS SDK default (typically 3 seconds).
#' - `connect_timeout`: Socket connection timeout in seconds. If negative, AWS
@@ -411,7 +418,8 @@ S3FileSystem$create <- function(anonymous = FALSE, ...) {
invalid_args <- intersect(
c(
"access_key", "secret_key", "session_token", "role_arn", "session_name",
"external_id", "load_frequency", "allow_bucket_creation", "allow_bucket_deletion"
"external_id", "load_frequency", "allow_bucket_creation", "allow_bucket_deletion",
"check_directory_existence_before_creation"
),
names(args)
)
@@ -459,6 +467,7 @@ default_s3_options <- list(
background_writes = TRUE,
allow_bucket_creation = FALSE,
allow_bucket_deletion = FALSE,
check_directory_existence_before_creation = FALSE,
connect_timeout = -1,
request_timeout = -1
)
3 changes: 3 additions & 0 deletions r/man/FileSystem.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 5 additions & 4 deletions r/src/arrowExports.cpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 4 additions & 1 deletion r/src/filesystem.cpp
@@ -289,7 +289,8 @@ std::shared_ptr<fs::S3FileSystem> fs___S3FileSystem__create(
std::string region = "", std::string endpoint_override = "", std::string scheme = "",
std::string proxy_options = "", bool background_writes = true,
bool allow_bucket_creation = false, bool allow_bucket_deletion = false,
double connect_timeout = -1, double request_timeout = -1) {
bool check_directory_existence_before_creation = false, double connect_timeout = -1,
double request_timeout = -1) {
// We need to ensure that S3 is initialized before we start messing with the
// options
StopIfNotOk(fs::EnsureS3Initialized());
@@ -330,6 +331,8 @@ std::shared_ptr<fs::S3FileSystem> fs___S3FileSystem__create(

s3_opts.allow_bucket_creation = allow_bucket_creation;
s3_opts.allow_bucket_deletion = allow_bucket_deletion;
s3_opts.check_directory_existence_before_creation =
check_directory_existence_before_creation;

s3_opts.request_timeout = request_timeout;
s3_opts.connect_timeout = connect_timeout;
4 changes: 3 additions & 1 deletion r/tests/testthat/test-s3-minio.R
@@ -46,7 +46,8 @@ fs <- S3FileSystem$create(
scheme = "http",
endpoint_override = paste0("localhost:", minio_port),
allow_bucket_creation = TRUE,
allow_bucket_deletion = TRUE
allow_bucket_deletion = TRUE,
check_directory_existence_before_creation = TRUE,
)
limited_fs <- S3FileSystem$create(
access_key = minio_key,
Expand All @@ -55,6 +56,7 @@ limited_fs <- S3FileSystem$create(
endpoint_override = paste0("localhost:", minio_port),
allow_bucket_creation = FALSE,
allow_bucket_deletion = FALSE,
check_directory_existence_before_creation = FALSE
)
now <- as.character(as.numeric(Sys.time()))
fs$CreateDir(now)
2 changes: 1 addition & 1 deletion r/vignettes/fs.Rmd
@@ -190,7 +190,7 @@ Also note that parameters in the URI need to be

For S3, the only options that can be included in the URI as query parameters
are `region`, `scheme`, `endpoint_override`, `access_key`, `secret_key`, `allow_bucket_creation`,
and `allow_bucket_deletion`. For GCS, the supported parameters are `scheme`, `endpoint_override`,
`allow_bucket_deletion` and `check_directory_existence_before_creation`. For GCS, the supported parameters are `scheme`, `endpoint_override`,
and `retry_limit_seconds`.

In GCS, a useful option is `retry_limit_seconds`, which sets the number of seconds