[Python] AWS SDK 1.11 support in pyarrow wheels? #42154
Comments
Also cc @ihnorton @ivirshup @h-vetinari |
Conda-forge has been building against aws 1.11 for a long time already, and this also got synched back to the conda tests in arrow itself (which have bitrotted in the meantime, but there are efforts to revive them): arrow/dev/tasks/conda-recipes/.ci_support/linux_64_cuda_compiler_version11.2.yaml Lines 3 to 4 in 801de2f
In any case, we run the full test suite on the python side (not the C++ side yet, c.f. #35587), in every feedstock build, and it passes on osx. So I don't see the immediate incompatibility, which I assume is restricted to some corner cases. You should provide a stacktrace (or ideally, a reproducer) of what fails. PS. In the past there was once something that kept arrow stuck on aws 1.8 for a long time (which might help for context): aws/aws-sdk-cpp#1809 |
Looks like TileDB-Inc/TileDB-Py#1990 is relevant, but again, you should really provide an example where arrow crashes or does something wrong, not another downstream project. The fact that the import order seems to matter is already ground for suspecting that there's something else going on here. |
The specific question here is if/when will arrow wheels update to AWS SDK 1.11? The reason for the question is to understand whether the mitigation for the issue described below will be available "soon", or we need to work around it (rename symbols, further patch the AWS SDK, etc?). For more background on the issue:
Summarizing: AWS has released a mitigation for the abort, implemented here: aws/aws-sdk-cpp#2710. The mitigation is available in AWS SDK 1.11. TileDB wheels have updated to AWS SDK 1.11, but AFAICT all packages need to be updated for the mitigation to work. This issue will likely impact any other library that bundles the AWS SDK in a wheel and is loaded at the same time as pyarrow. |
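One way to check the "loaded at the same time" condition on a Linux machine is to list which shared objects are mapped into the Python process once both packages have been imported. The following is only a diagnostic sketch: the helper names and the default substrings are illustrative assumptions, and a copy of the SDK that is linked statically into another library will not appear under its own file name.

```python
import os


def mapped_libraries(maps_text, needles=("aws", "arrow", "tiledb")):
    """Return sorted unique shared-object paths found in /proc/<pid>/maps text
    whose file name contains any of the given substrings (hypothetical defaults)."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, optional pathname.
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping without a backing file
        path = parts[5].strip()
        name = os.path.basename(path).lower()
        if ".so" in name and any(n in name for n in needles):
            found.add(path)
    return sorted(found)


def loaded_now(maps_path="/proc/self/maps"):
    """Inspect the current process; returns [] where /proc is unavailable."""
    if not os.path.exists(maps_path):
        return []
    with open(maps_path) as fh:
        return mapped_libraries(fh.read())
```

Calling `loaded_now()` after importing the packages in question, and again with the import order reversed, shows whether two libraries that may each carry their own SDK copy end up in the same process.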
PyArrow wheels don't use a bundled AWS SDK for C++. They use the one from vcpkg: https://github.com/ursacomputing/crossbow/actions/runs/9544500563/job/26303310398#step:7:559
|
The VCPKG version is currently pinned at 2023.11.20: Line 92 in eec6f17
(last updated in #39622) This could certainly use another update to a more recent vcpkg state (EDIT: this is currently being done in #42171), but that release (as @kou also showed from the logs) already included AWS SDK 1.11 (https://github.com/microsoft/vcpkg/releases/tag/2023.11.20, which updated it from 1.11.169#2 to 1.11.201) |
FWIW, this also means that the latest pyarrow wheels for 16.0.0 should actually already include AWS SDK 1.11
@ihnorton the crashes you see, is that with the latest pyarrow release from PyPI? |
Yes:
|
@jorisvandenbossche thanks for the explanation. It looks like the commit I referenced didn't actually make it into the SDK until
Thanks for the pointer! We'll sit tight and try this again after wheels are released with that update. Much appreciated. |
That should still mean this is included in the pyarrow 16.0.0 wheels, AFAIK (because it should have used 1.11.201) |
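The "is the fix in the pinned SDK?" reasoning above comes down to comparing vcpkg-style version strings such as `1.11.169#2` and `1.11.201`, where the `#N` suffix is the vcpkg port revision. A minimal sketch of that comparison follows; the function names are illustrative, and the SDK release that first contains the mitigation is not stated in this thread, so the `required` value has to be supplied by the reader.

```python
def parse_port_version(text):
    """Split a vcpkg-style version such as '1.11.169#2' into ((1, 11, 169), 2).

    A missing '#N' suffix is treated as port revision 0.
    """
    base, _, rev = text.partition("#")
    return tuple(int(p) for p in base.split(".")), int(rev or 0)


def at_least(shipped, required):
    """True if the shipped version is the required version or newer."""
    return parse_port_version(shipped) >= parse_port_version(required)


# The two versions named in the vcpkg 2023.11.20 release notes:
print(at_least("1.11.201", "1.11.169#2"))
```

This only establishes which SDK version a wheel was built against. As the later comments show, it says nothing about whether that version actually resolves the crash.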
We have tested 16.0 and 17.0-rc and we still see the issue observed in #40262 -- which appears to be waiting for user confirmation. I'll comment there to indicate we believe the referenced AWS SDK commit does not fix the issue. |
Is it possible that the multiple AWS SDK confusion would be resolved if the AWS SDKs inside the wheels were compiled with |
The answer is almost definitely yes. Building a custom
|
Turns out just updating TileDB fixes this issue, but it would still be good to update Arrow to hide the symbols from the AWS SDK, to avoid potentially clashing with another library in the future. I am not planning to do that. |
Describe the bug, including details regarding any error messages, version, and platform.
With regard to #40262, is there a plan to update pyarrow's AWS SDK dependency from 1.10 to 1.11? We believe from arrow/cpp/thirdparty/versions.txt (Line 54 in fe4d04f) that pyarrow is currently using 1.10:
It appears that a mitigation for #40262 is in AWS SDK 1.11: aws/aws-sdk-cpp#2710
(There's significant backstory on single-cell-data/TileDB-SOMA#2692 and on TileDB-Inc/tiledbsoma-feedstock#171, if backstory is desired. A repro is here: single-cell-data/TileDB-SOMA#2692 (comment).)
cc @pitrou
Component(s)
Python