GH-41973: Expose new S3 option check_directory_existence_before_creation #41998

Open: wants to merge 1 commit into base `main`.

1 change: 1 addition & 0 deletions r/NEWS.md
@@ -18,6 +18,7 @@
-->

# arrow 17.0.0.9000
+* Expose an option `check_directory_existence_before_creation` in `S3FileSystem`, which defaults to `FALSE`. When `FALSE`, creating a directory does not first check whether it already exists: as an optimization, the code tries the creation and catches the error instead of issuing two dependent I/O calls. When `TRUE`, the directory is created only when necessary, at the cost of extra I/O calls. This helps with key/value cloud storage that imposes a hard rate limit on the number of object mutation operations, or when the directories already exist and you do not have permission to create them.

# arrow 17.0.0

4 changes: 2 additions & 2 deletions r/R/arrowExports.R

Some generated files are not rendered by default.

11 changes: 10 additions & 1 deletion r/R/filesystem.R
@@ -156,6 +156,13 @@ FileSelector$create <- function(base_dir, allow_not_found = FALSE, recursive = F
#' buckets if `$CreateDir()` is called on the bucket level (default `FALSE`).
#' - `allow_bucket_deletion`: logical, if TRUE, the filesystem will delete
#' buckets if `$DeleteDir()` is called on the bucket level (default `FALSE`).
+#' - `check_directory_existence_before_creation`: logical, if `FALSE`, creating a directory
+#' does not first check whether it already exists. This is an optimization: trying the
+#' creation and catching the error avoids issuing two dependent I/O calls.
+#' If `TRUE`, the directory is created only when necessary, at the cost of extra
+#' I/O calls. This helps with key/value cloud storage that imposes a hard rate limit
+#' on the number of object mutation operations, or when the directories already exist
+#' and you do not have permission to create them (default `FALSE`).
#' - `request_timeout`: Socket read time on Windows and macOS in seconds. If
#' negative, the AWS SDK default (typically 3 seconds).
#' - `connect_timeout`: Socket connection timeout in seconds. If negative, AWS
@@ -411,7 +418,8 @@ S3FileSystem$create <- function(anonymous = FALSE, ...) {
invalid_args <- intersect(
c(
"access_key", "secret_key", "session_token", "role_arn", "session_name",
-"external_id", "load_frequency", "allow_bucket_creation", "allow_bucket_deletion"
+"external_id", "load_frequency", "allow_bucket_creation", "allow_bucket_deletion",
+"check_directory_existence_before_creation"
),
names(args)
)
@@ -459,6 +467,7 @@ default_s3_options <- list(
background_writes = TRUE,
allow_bucket_creation = FALSE,
allow_bucket_deletion = FALSE,
+check_directory_existence_before_creation = FALSE,
connect_timeout = -1,
request_timeout = -1
)
3 changes: 3 additions & 0 deletions r/man/FileSystem.Rd

9 changes: 5 additions & 4 deletions r/src/arrowExports.cpp

5 changes: 4 additions & 1 deletion r/src/filesystem.cpp
@@ -289,7 +289,8 @@ std::shared_ptr<fs::S3FileSystem> fs___S3FileSystem__create(
std::string region = "", std::string endpoint_override = "", std::string scheme = "",
std::string proxy_options = "", bool background_writes = true,
bool allow_bucket_creation = false, bool allow_bucket_deletion = false,
-double connect_timeout = -1, double request_timeout = -1) {
+bool check_directory_existence_before_creation = false, double connect_timeout = -1,
+double request_timeout = -1) {
// We need to ensure that S3 is initialized before we start messing with the
// options
StopIfNotOk(fs::EnsureS3Initialized());
@@ -330,6 +331,8 @@ std::shared_ptr<fs::S3FileSystem> fs___S3FileSystem__create(

s3_opts.allow_bucket_creation = allow_bucket_creation;
s3_opts.allow_bucket_deletion = allow_bucket_deletion;
+s3_opts.check_directory_existence_before_creation =
+check_directory_existence_before_creation;

s3_opts.request_timeout = request_timeout;
s3_opts.connect_timeout = connect_timeout;
6 changes: 4 additions & 2 deletions r/tests/testthat/test-s3-minio.R
@@ -46,15 +46,17 @@ fs <- S3FileSystem$create(
scheme = "http",
endpoint_override = paste0("localhost:", minio_port),
allow_bucket_creation = TRUE,
-allow_bucket_deletion = TRUE
+allow_bucket_deletion = TRUE,
+check_directory_existence_before_creation = TRUE
)
limited_fs <- S3FileSystem$create(
access_key = minio_key,
secret_key = minio_secret,
scheme = "http",
endpoint_override = paste0("localhost:", minio_port),
allow_bucket_creation = FALSE,
-allow_bucket_deletion = FALSE
+allow_bucket_deletion = FALSE,
+check_directory_existence_before_creation = FALSE
)
now <- as.character(as.numeric(Sys.time()))
fs$CreateDir(now)
2 changes: 1 addition & 1 deletion r/vignettes/fs.Rmd
@@ -190,7 +190,7 @@ Also note that parameters in the URI need to be

For S3, the only options that can be included in the URI as query parameters
are `region`, `scheme`, `endpoint_override`, `access_key`, `secret_key`, `allow_bucket_creation`,
-and `allow_bucket_deletion`. For GCS, the supported parameters are `scheme`, `endpoint_override`,
+`allow_bucket_deletion`, and `check_directory_existence_before_creation`. For GCS, the supported parameters are `scheme`, `endpoint_override`,
and `retry_limit_seconds`.

In GCS, a useful option is `retry_limit_seconds`, which sets the number of seconds