GH-41973: Expose new S3 option check_directory_existence_before_creation #41998
base: main
Hi @nealrichardson, could you please review when you have time? I have to admit that I use C++/Python day to day but know nothing about R, so I just followed the existing code for `allow_bucket_deletion` to expose the new option.
Hi @paleolimbot @thisisnic, could you review this PR when you have time? Thanks.
Hi @HaochengLIU, I took a look through and left some suggestions. One thing I think we'll have to sort out here is the issue that's making the "Check minimum supported Arrow C++ Version" check fail. We'll need an ifdef guard in
Edit: And can you please rebase and fix conflicts?
7f3d26e to e46e0dc
Hi @amoeba, I rebased but somehow don't see the suggestion you are referring to. Could you point me to it? Regarding the install error: where is this 00install.out log? 😢
@HaochengLIU The CI output there can be a bit tricky to follow, but the important bit is in the next section: https://github.com/apache/arrow/actions/runs/10207786938/job/28243188693?pr=41998#step:8:49
I think what @amoeba is referring to is the fact that the arrow R package is compatible with previous versions of the Arrow C++ library. You're getting the error above because #41822 was only released in version 17.0.0, so using it with earlier versions fails, and our minimum supported version is 15.0.2.
r/tests/testthat/test-s3-minio.R
@@ -55,6 +56,7 @@ limited_fs <- S3FileSystem$create(
endpoint_override = paste0("localhost:", minio_port),
allow_bucket_creation = FALSE,
allow_bucket_deletion = FALSE
check_directory_existence_before_creation = false,
Suggested change:
- check_directory_existence_before_creation = false,
+ check_directory_existence_before_creation = FALSE,
r/NEWS.md
@@ -18,6 +18,7 @@
-->

# arrow 16.1.0.9000
* Expose an option `check_directory_existence_before_creation` in `S3FileSystem` which defaults to false. If it's set to false, when creating a directory the code will not check if it already exists or not. It's an optimization to try directory creation and catch the error, rather than issue two dependent I/O calls. If true, when creating a directory the code will only create the directory when necessary at the cost of extra I/O calls. This can be used for key/value cloud storage which has a hard rate limit to number of object mutation operations or scenerios such as the directories already exist and you do not have creation access. |
Suggested change:
- * Expose an option `check_directory_existence_before_creation` in `S3FileSystem` which defaults to false. If it's set to false, when creating a directory the code will not check if it already exists or not. It's an optimization to try directory creation and catch the error, rather than issue two dependent I/O calls. If true, when creating a directory the code will only create the directory when necessary at the cost of extra I/O calls. This can be used for key/value cloud storage which has a hard rate limit to number of object mutation operations or scenerios such as the directories already exist and you do not have creation access.
+ * Expose an option `check_directory_existence_before_creation` in `S3FileSystem` which defaults to `FALSE`. If it's set to false, when creating a directory the code will not check if it already exists or not. It's an optimization to try directory creation and catch the error, rather than issue two dependent I/O calls. If set to `TRUE`, when creating a directory the code will only create the directory when necessary at the cost of extra I/O calls. This can be used for key/value cloud storage which has a hard rate limit to number of object mutation operations or scenarios such as the directories already exist and you do not have creation access.
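To make the trade-off in this NEWS entry concrete, here is a minimal C++ sketch against a hypothetical in-memory object store. `FakeObjectStore` and `CreateDir` are illustrative names, not Arrow's S3 implementation, and modelling a directory as a trailing-slash marker object is an assumption of the sketch. By default the code attempts the write directly; with the option enabled it first checks whether the directory exists and writes only when it is missing.

```cpp
#include <set>
#include <string>

// Hypothetical in-memory stand-in for an S3-like key/value store.
// This is NOT Arrow's S3 code; it only counts requests so the two
// strategies described in the NEWS entry can be compared.
struct FakeObjectStore {
  std::set<std::string> keys;  // existing object keys
  int head_requests = 0;       // existence checks (reads)
  int put_requests = 0;        // object mutations (writes)
  bool can_create = true;      // false models missing create permission

  bool Exists(const std::string& key) {
    ++head_requests;
    return keys.count(key) > 0;
  }

  // Attempts a write; returns false if the store refuses it.
  bool Put(const std::string& key) {
    ++put_requests;
    if (!can_create) return false;
    keys.insert(key);
    return true;
  }
};

// Sketch of creating a directory, modelled (by assumption) as writing a
// trailing-slash marker object.
bool CreateDir(FakeObjectStore& store, const std::string& dir,
               bool check_directory_existence_before_creation) {
  const std::string marker = dir + "/";
  if (check_directory_existence_before_creation) {
    // Option enabled: one read, then write only if the marker is missing.
    if (store.Exists(marker)) return true;
    return store.Put(marker);
  }
  // Default: skip the read and attempt the write straight away.
  return store.Put(marker);
}
```

In this model, when the directory already exists and creation is not permitted, the default path fails on its write while the checked path succeeds after a single existence check; for a directory that does not exist yet, the checked path pays one extra read before the write.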
r/R/filesystem.R
@@ -156,6 +156,13 @@ FileSelector$create <- function(base_dir, allow_not_found = FALSE, recursive = F
#' buckets if `$CreateDir()` is called on the bucket level (default `FALSE`).
#' - `allow_bucket_deletion`: logical, if TRUE, the filesystem will delete
#' buckets if`$DeleteDir()` is called on the bucket level (default `FALSE`).
#' - `check_directory_existence_before_creation`: logical, if FALSE, when creating a directory the code will
Suggested change:
- #' - `check_directory_existence_before_creation`: logical, if FALSE, when creating a directory the code will
+ #' - `check_directory_existence_before_creation`: logical, if `FALSE`, when creating a directory the code will
r/R/filesystem.R
#' - `check_directory_existence_before_creation`: logical, if FALSE, when creating a directory the code will
#' . not check if it already exists or not. It's an optimization to try directory creation and catch the error,
#' rather than issue two dependent I/O calls.
#' if TRUE, when creating a directory the code will only create the directory when necessary
Suggested change:
- #' if TRUE, when creating a directory the code will only create the directory when necessary
+ #' if `TRUE`, when creating a directory the code will only create the directory when necessary
r/R/filesystem.R
#' rather than issue two dependent I/O calls.
#' if TRUE, when creating a directory the code will only create the directory when necessary
#' at the cost of extra I/O calls. This can be used for key/value cloud storage which has
#' a hard rate limit to number of object mutation operations or scenerios such as
Suggested change:
- #' a hard rate limit to number of object mutation operations or scenerios such as
+ #' a hard rate limit to number of object mutation operations or scenarios such as
r/tests/testthat/test-s3-minio.R
@@ -55,6 +56,7 @@ limited_fs <- S3FileSystem$create(
endpoint_override = paste0("localhost:", minio_port),
allow_bucket_creation = FALSE,
allow_bucket_deletion = FALSE
Suggested change:
- allow_bucket_deletion = FALSE
+ allow_bucket_deletion = FALSE,
Sorry @HaochengLIU, I think I forgot to save my review; you should see my comments now. @thisisnic is right about my note. See this example: arrow/r/src/extension-impl.cpp Line 103 in c69c1d8
I think what we want to do here is (1) conditionally define two versions of
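The version guard being discussed could look roughly like the following sketch. It assumes the `ARROW_VERSION_MAJOR` macro from Arrow's generated config header and uses 17 as the cutoff because the thread says the option first shipped in 17.0.0. `S3OptionsSketch`, `MakeOptions`, and `SupportsCheckDirectoryExistence` are hypothetical names, and the fallback `#define` exists only so the snippet compiles on its own.

```cpp
// Assumption: in a real build ARROW_VERSION_MAJOR comes from Arrow's
// generated config header. This fallback (the minimum supported major
// version, 15) exists only so the sketch compiles standalone.
#ifndef ARROW_VERSION_MAJOR
#define ARROW_VERSION_MAJOR 15
#endif

// Hypothetical stand-in for the C++ S3 options struct.
struct S3OptionsSketch {
  bool allow_bucket_creation = false;
  bool allow_bucket_deletion = false;
#if ARROW_VERSION_MAJOR >= 17
  bool check_directory_existence_before_creation = false;
#endif
};

// Hypothetical binding: the argument is always accepted, but it is only
// forwarded when the linked Arrow C++ is new enough to have the field.
S3OptionsSketch MakeOptions(bool allow_bucket_creation,
                            bool allow_bucket_deletion,
                            bool check_directory_existence_before_creation) {
  S3OptionsSketch options;
  options.allow_bucket_creation = allow_bucket_creation;
  options.allow_bucket_deletion = allow_bucket_deletion;
#if ARROW_VERSION_MAJOR >= 17
  options.check_directory_existence_before_creation =
      check_directory_existence_before_creation;
#else
  // Older Arrow C++ has no such field; the argument is ignored.
  (void)check_directory_existence_before_creation;
#endif
  return options;
}

// Lets callers (e.g. tests) ask whether the option is honored in this build.
bool SupportsCheckDirectoryExistence() { return ARROW_VERSION_MAJOR >= 17; }
```

Keeping a single function signature and guarding only its body is one way to avoid maintaining two versions of the exported function; whether that suits the package's codegen script is not settled in this thread.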
The tricky thing about that is the codegen script that creates
Another option would be to refactor to send a list of options to
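For the alternative of sending a list of options, a rough C++ sketch follows. It assumes the R side would pass a named list of logical values, modelled here as `std::map<std::string, bool>`; `S3OptionsSketch` and `OptionsFromList` are hypothetical names, and rejecting unknown option names is a design choice of this sketch rather than anything agreed in the thread.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical options struct standing in for the real C++ S3 options.
struct S3OptionsSketch {
  bool allow_bucket_creation = false;
  bool allow_bucket_deletion = false;
  bool check_directory_existence_before_creation = false;
};

// Hypothetical: a named list (modelled as a map) replaces one positional
// argument per option, so adding an option changes the recognised keys
// rather than the function signature.
S3OptionsSketch OptionsFromList(const std::map<std::string, bool>& opts) {
  S3OptionsSketch out;
  for (const auto& kv : opts) {
    if (kv.first == "allow_bucket_creation") {
      out.allow_bucket_creation = kv.second;
    } else if (kv.first == "allow_bucket_deletion") {
      out.allow_bucket_deletion = kv.second;
    } else if (kv.first == "check_directory_existence_before_creation") {
      out.check_directory_existence_before_creation = kv.second;
    } else {
      // Fail loudly on typos instead of silently dropping the option.
      throw std::invalid_argument("unknown S3 option: " + kv.first);
    }
  }
  return out;
}
```

Options that are absent from the list simply keep their defaults, which mirrors how the R function's arguments default to `FALSE`.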
e46e0dc to 913400d
Hi folks, thanks for reviewing; I've addressed most of the comments. If it's confirmed that the approach in arrow/r/src/extension-impl.cpp will not work, as @nealrichardson suggested, I can try to tackle the refactor first once I'm back from vacation.
We probably shouldn't do this here, but we should have a discussion about how much effort we should put into keeping compatibility with older versions for PRs like these. I've created #43623 for that, but I'm broadly supportive of bumping the minimum version if we need to in order to implement this feature.
Thanks for the catch and the suggestion, @nealrichardson. I'd be supportive of approving this PR with only the "Check minimum supported Arrow C++ Version" job failing and my existing suggestions applied.
Rationale for this change
Expose new S3 option `check_directory_existence_before_creation` from GH-41493. A PR to expose it in Python is also under review.
What changes are included in this PR?
Expose new S3 option `check_directory_existence_before_creation` from GH-41493.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. R function documentation is updated.