Endpoint URL not utilized for external S3 resources #902

Open
kellrott opened this issue Feb 28, 2022 · 7 comments

@kellrott
Contributor

Bug Report

GitHub issues are reserved for bug reports. If you have a question, please don't use this form.
Instead, please ask your question on the Synapse Help Forum.

Operating system

Linux

Client version

Output of:

$ synapse store --parentId syn27256137 b8165ee8-a444-4e79-b5e8-162da70b1815.tar.gz

##################################################
This Synapse Project has transitioned to use storage maintained at the NCI Genomic Data Commons (GDC). GDC credentials are required for accessing files. Please contact the CCG Program Office to request GDC credentials
Uploading to endpoint: [https://gdc-jamboree-objstore.datacommons.io] bucket: [gdc-alch-jamboree]
##################################################


S3UploadFailedError: Failed to upload b8165ee8-a444-4e79-b5e8-162da70b1815.tar.gz to gdc-alch-jamboree/ab067ed0-ccc6-4361-9c2e-6544249fe1cb/b8165ee8-a444-4e79-b5e8-162da70b1815.tar.gz: An error occurred (InvalidRequest) when calling the CreateMultipartUpload operation: Invalid canned ACL

Description of the problem

  • Attempted upload to private S3 endpoint, credentials failed
  • Download from the same project/custom endpoint works, so credentials are not the issue

Expected behavior

  • ability to upload

Actual behavior

  • ACL failure

Based on a reading of the code, the issue appears to be at
https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/core/upload/upload_functions.py#L198

The call to create the upload function:

def upload_fn(credentials):
    return S3ClientWrapper.upload_file(
        bucket_name,
        None,
        remote_file_key,
        local_path,
        credentials=credentials,
        transfer_config_kwargs={'max_concurrency': syn.max_threads}
    )

The second argument, endpoint_url, is hard-coded to None. It needs to be configured the same way it is at https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/client.py#L1830, where S3ClientWrapper.download_file is given the endpoint_url from the file handle.
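A minimal sketch of the change proposed above, forwarding the endpoint URL into the upload closure instead of None. It is not the client's actual code: make_upload_fn and fake_upload_file are hypothetical names introduced for illustration, and the stand-in only mirrors the positional order seen in the snippet (bucket, endpoint URL, key, local path) so the example runs without synapseclient installed.

```python
from typing import Any, Callable, Dict, Optional


def make_upload_fn(
    upload_file: Callable[..., Any],
    bucket_name: str,
    endpoint_url: Optional[str],
    remote_file_key: str,
    local_path: str,
    max_threads: int,
) -> Callable[[Dict[str, str]], Any]:
    """Build the credentials -> upload closure, forwarding endpoint_url.

    Hypothetical helper: `upload_file` stands in for
    S3ClientWrapper.upload_file and is assumed to take the same
    arguments as in the snippet quoted in the issue.
    """

    def upload_fn(credentials: Dict[str, str]) -> Any:
        return upload_file(
            bucket_name,
            endpoint_url,  # forwarded here instead of the hard-coded None
            remote_file_key,
            local_path,
            credentials=credentials,
            transfer_config_kwargs={"max_concurrency": max_threads},
        )

    return upload_fn


# Hypothetical stand-in that records its arguments instead of uploading.
calls = []


def fake_upload_file(bucket, endpoint_url, key, path, **kwargs):
    calls.append((bucket, endpoint_url, key, path, kwargs))
    return key


upload_fn = make_upload_fn(
    fake_upload_file,
    "gdc-alch-jamboree",
    "https://gdc-jamboree-objstore.datacommons.io",
    "some/key.tar.gz",
    "key.tar.gz",
    max_threads=4,
)
result = upload_fn({"aws_access_key_id": "example"})
```

In the real client the endpoint URL would presumably come from the upload destination or file handle, as it does on the download path; where exactly that value is available at upload time is not confirmed here.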

@thomasyu888
Member

thomasyu888 commented Feb 28, 2022

Thanks for the report @kellrott. This is a known issue, and it is caused by this line:

ExtraArgs={'ACL': 'bucket-owner-full-control'},

This was added because, without it, the owner of an S3 bucket did not have access to the objects uploaded into that bucket.

That being said, this particular canned ACL is not supported on the IBM buckets. I'm currently unsure of a resolution other than pointing people to older versions of the synapseclient (2.3.1), unfortunately.
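One possible direction, sketched under assumptions rather than taken from the client: only send the canned ACL when no custom endpoint is configured. build_extra_args is a hypothetical helper, and treating "custom endpoint" as a proxy for "does not accept this ACL" is itself an assumption; a real fix might need a per-storage-location setting instead.

```python
from typing import Dict, Optional


def build_extra_args(
    endpoint_url: Optional[str],
    acl: str = "bucket-owner-full-control",
) -> Dict[str, str]:
    """Return the ExtraArgs to pass along with an S3 upload.

    Assumption: endpoint_url is None for regular AWS S3 buckets, where
    the canned ACL is wanted, and set for external S3-compatible stores,
    which may reject it with "Invalid canned ACL".
    """
    if endpoint_url is None:
        return {"ACL": acl}
    # Custom endpoint: omit the ACL rather than risk an InvalidRequest.
    return {}


aws_args = build_extra_args(None)
external_args = build_extra_args("https://gdc-jamboree-objstore.datacommons.io")
```

The resulting dictionary would then be passed as ExtraArgs to the upload call. Omitting the ACL on an external store means the bucket-owner access problem described above would have to be handled by that store's own policies.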

@JenniferShelton

I had been using an older version of synapseclient (2.3.1), but a recent attempt to install it is failing. I'm not sure which dependencies have new releases that cause conda and pip to fail to install it. Do you have a record of the versions of python, pandas, boto, etc. that are compatible with 2.3.1?

@thomasyu888
Member

thomasyu888 commented Oct 14, 2023

@JenniferShelton Apologies, I must have missed this message!

Python==3.8,3.9
pandas>=0.25.0,<1.5
boto3>=1.7.0,<2.0

We just onboarded an engineer to help with the client. I'll be sure to revisit this issue, as Python 3.9 will eventually reach EOL. For more transparency, here is the internal Jira ticket tracking this work: https://sagebionetworks.jira.com/browse/SYNPY-1198

@dheiman

dheiman commented Dec 5, 2024

@thomasyu888 I'm still seeing this issue in 4.5.0, and the above JIRA ticket is not available to view externally.

@thomasyu888
Member

thomasyu888 commented Dec 6, 2024

@dheiman

Thanks for reporting this - we have not made versions >2.3.1 of the client allow for downloads within this project.

I hope to look into finding a resolution for this long standing issue in 2025.

Apologies for the inconvenience

@xaviloinaz

With the specified versions of Python and associated packages (synapseclient 2.3.1, Python==3.8,3.9, pandas>=0.25.0,<1.5, boto3>=1.7.0,<2.0), I ran into the following error when trying to upload to Synapse using a manifest I created:

Starting upload...
cached_time: {'modified_time': '2024-10-01T22:21:16.000Z', 'content_md5': '26ba87912a307ec8a5b5b2ee4231bc18'}
cached_time: {'modified_time': '2024-10-01T20:32:31.000Z', 'content_md5': 'a752cdabbc916a533309ff418112f77e'}
cached_time: {'modified_time': '2024-10-01T22:20:49.000Z', 'content_md5': '46e9b576d19ad1f72dbb884eea39eb35'}
cached_time: {'modified_time': '2024-10-01T21:31:39.000Z', 'content_md5': 'e511738ed4ae16dbf1cbbb3328dd3b76'}
cached_time: {'modified_time': '2024-10-01T22:21:23.000Z', 'content_md5': '6986f185d6059db4ace0074606fa83fd'}
cached_time: {'modified_time': '2024-10-01T21:24:36.000Z', 'content_md5': '935ccb2f6ae7c9a8bef672a9f913f0b5'}
Traceback (most recent call last):
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/xloinaz/GDC_HCMI_analyses/mutational_signature_analysis/sga_venv_python_3.9/lib/python3.9/site-packages/synapseutils/sync.py", line 570, in _upload_item
    entity = self._syn.store(item.entity, used=used, executed=executed, **item.store_kwargs)
  File "/home/xloinaz/GDC_HCMI_analyses/mutational_signature_analysis/sga_venv_python_3.9/lib/python3.9/site-packages/synapseclient/client.py", line 1048, in store
    or not self.cache.contains(bundle['entity']['dataFileHandleId'], entity['path'])
  File "/home/xloinaz/GDC_HCMI_analyses/mutational_signature_analysis/sga_venv_python_3.9/lib/python3.9/site-packages/synapseclient/core/cache.py", line 146, in contains
    return compare_timestamps(_get_modified_time(path), cached_time)
  File "/home/xloinaz/GDC_HCMI_analyses/mutational_signature_analysis/sga_venv_python_3.9/lib/python3.9/site-packages/synapseclient/core/cache.py", line 59, in compare_timestamps
    if cached_time.endswith(".000Z"):
AttributeError: 'dict' object has no attribute 'endswith'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xloinaz/GDC_HCMI_analyses/mutational_signature_analysis/sga_venv_python_3.9/lib/python3.9/site-packages/synapseutils/monitor.py", line 49, in with_retry_and_messaging
    output = func(*args, **kwargs)
  File "/home/xloinaz/GDC_HCMI_analyses/mutational_signature_analysis/sga_venv_python_3.9/lib/python3.9/site-packages/synapseutils/sync.py", line 919, in _manifest_upload
    uploader.upload(items)
  File "/home/xloinaz/GDC_HCMI_analyses/mutational_signature_analysis/sga_venv_python_3.9/lib/python3.9/site-packages/synapseutils/sync.py", line 496, in upload
    self._abort(futures)
  File "/home/xloinaz/GDC_HCMI_analyses/mutational_signature_analysis/sga_venv_python_3.9/lib/python3.9/site-packages/synapseutils/sync.py", line 471, in _abort
    raise ValueError("Sync aborted due to upload failure") from exception
ValueError: Sync aborted due to upload failure

I manually fixed this bug in the source code by setting "cached_time = cached_time['modified_time']", but even after that I run into two further issues: "botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied" and "The project storage usage exceeds the limit for the storage location (Project: syn26004575, Storage Location: 1, Usage: 125.61 GiB, Limit: 100 GiB)." Is there anything that can be done about these three issues? For the second issue, is this just a permissions issue I need to sort out with the AWG? For the third issue, is it not possible to store more than 100 GB in a given folder with nested contents? Thanks!
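The manual workaround for the first error can be sketched as a small normalization step that accepts either shape of cache entry. This is an illustration only: extract_modified_time is a hypothetical name, not a function in synapseclient, and the two input shapes (a plain timestamp string, or a dict with a 'modified_time' key) are inferred from the traceback and the printed cached_time values above.

```python
from typing import Dict, Optional, Union


def extract_modified_time(
    cached_time: Union[str, Dict[str, str], None],
) -> Optional[str]:
    """Return the modified-time string from either cache entry shape.

    Assumed shapes: a bare timestamp string, or a dict such as
    {'modified_time': ..., 'content_md5': ...}.
    """
    if isinstance(cached_time, dict):
        return cached_time.get("modified_time")
    return cached_time


entry = {
    "modified_time": "2024-10-01T22:21:16.000Z",
    "content_md5": "26ba87912a307ec8a5b5b2ee4231bc18",
}
from_dict = extract_modified_time(entry)
from_str = extract_modified_time("2024-10-01T22:21:16.000Z")
```

Applying this before the .endswith(".000Z") check would avoid the AttributeError; it does not address the AccessDenied or storage-limit errors.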

@BryanFauble
Contributor

@xaviloinaz This is unrelated to this parent issue so I have moved your question/comment over to #1154
