
Allow bucket to be mounted to an arbitrary logical path #2196

Open
tedgin opened this issue May 14, 2024 · 6 comments
tedgin commented May 14, 2024

I'm requesting that the iRODS S3 storage resource plugin be able to mount (attach, graft) a bucket to an arbitrary logical path in an iRODS zone.

Currently, a bucket path, e.g., /my_bucket/, is mounted at the zone's logical root, e.g., /zone/. This means a data object added at /zone/home/user/object gets the key home/user/object in /my_bucket/. This is fine when the bucket is used as a zone-wide storage resource and its data is primarily accessed through iRODS. If the data is primarily accessed outside of iRODS but occasionally still needs to be reached through iRODS, it is inconvenient to force an object in the bucket either to be prefixed with something like home/user or to be accessed in iRODS at the base of the zone, i.e., /zone/object.
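To make the current behavior concrete, here's a minimal sketch of the zone-root mapping described above. The function name and structure are illustrative only, not the plugin's actual internals:

```python
# Hypothetical sketch: under the current behavior, the S3 object key is
# the logical path with the zone prefix stripped off.

def logical_to_s3_key(logical_path: str, zone: str) -> str:
    """Map an iRODS logical path to an S3 key by removing the zone prefix."""
    prefix = "/" + zone + "/"
    if not logical_path.startswith(prefix):
        raise ValueError(f"{logical_path} is not in zone {zone}")
    return logical_path[len(prefix):]

print(logical_to_s3_key("/zone/home/user/object", "zone"))  # home/user/object
```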

Pretend that /my_bucket/ already contains thousands of objects when an iRODS storage resource is created for it, and that mature workflows add and access objects in the bucket outside of iRODS following specific naming conventions. Renaming the existing objects so they don't show up directly under the zone would be difficult, and it gets worse if one of the S3 objects has the name of an existing iRODS collection or data object, like home or home/tedgin/teletubbies.jpg. If the bucket path could be mounted to an arbitrary logical path, e.g., /zone/home/project/s3-bucket/, the existing S3 objects wouldn't need to be renamed.

Allowing a bucket to be mounted to an arbitrary logical path also opens up the possibility of a user or project accessing data in an S3 bucket that they own (and pay for) from within an iRODS zone, without the bucket becoming usable by everyone else in the zone. Supporting this, however, is outside the scope of this feature request.

tedgin commented May 14, 2024

One way of satisfying this feature request would be to create a new archive naming policy, e.g., chroot. This naming policy would require that the logical mount path be provided as part of the S3 storage resource definition. This could be done using a context variable, e.g., ROOT_COLL. Here's an example iadmin command for creating one of these S3 resources.

iadmin mkresc \
   myBucketResc \
   s3 \
   "$(hostname)":/my_bucket/prefix/in/bucket \
   'S3_DEFAULT_HOSTNAME=s3.us-east-1.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/my_bucket.keypair;ARCHIVE_NAMING_POLICY=chroot;ROOT_COLL=/zone/home/project/s3-bucket'
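
A chroot policy along these lines could rebase logical paths from ROOT_COLL onto the in-bucket prefix. The following is a sketch under that assumption; the function and parameter names are illustrative and are not taken from the plugin:

```python
# Hypothetical sketch of the proposed chroot naming policy: the logical
# path is rebased from ROOT_COLL onto the bucket's physical prefix.

def chroot_logical_to_s3_key(logical_path: str, root_coll: str,
                             bucket_prefix: str) -> str:
    """Rebase a logical path under ROOT_COLL onto the in-bucket prefix."""
    root = root_coll.rstrip("/") + "/"
    if not logical_path.startswith(root):
        raise ValueError(f"{logical_path} is outside {root_coll}")
    relative = logical_path[len(root):]
    return "/".join(p for p in (bucket_prefix.strip("/"), relative) if p)

key = chroot_logical_to_s3_key(
    "/zone/home/project/s3-bucket/data.csv",
    "/zone/home/project/s3-bucket",
    "prefix/in/bucket")
print(key)  # prefix/in/bucket/data.csv
```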

Here's an implementation of this for version 4.2.11: main...tedgin:irods_resource_plugin_s3:main

korydraughn (Contributor) commented:

Very interesting. We'll look into it following UGM.

alanking (Contributor) commented:

For posterity...

This was discussed at length during the May 2024 S3 Working Group. Minutes have not yet been published.


trel commented May 22, 2024

Now available...
https://github.com/irods-contrib/irods_working_group_s3/blob/main/20240503-minutes.md


trel commented Jun 7, 2024

I think this is a subset of today's functionality of the S3 plugin.

This is a restriction of which logical_path(s) are allowed to be stored on this resource.

So... potentially a new context string setting...
...;LOGICAL_PATHS=/logical_path/1::/logical_path/2::/logical_path/3;...
OR
...;LOGICAL_PATH=/logical_path/1;LOGICAL_PATH=/logical_path/2;LOGICAL_PATH=/logical_path/3;...

Could be enough to let us implement the requested feature and 'pin' a resource to a certain subset of the logical namespace.
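
A pinning check like the one proposed could be sketched as follows, assuming the '::'-delimited LOGICAL_PATHS form; the context key, delimiter, and function names come from the proposal above and are not an existing plugin feature:

```python
# Hypothetical sketch: parse a '::'-delimited LOGICAL_PATHS context value
# and reject logical paths that fall outside the pinned subtrees.

def parse_logical_paths(context_value: str) -> list:
    """Split the context value into normalized allowed prefixes."""
    return [p.rstrip("/") for p in context_value.split("::") if p]

def is_pinned(logical_path: str, allowed: list) -> bool:
    """True if logical_path is, or sits under, one of the allowed prefixes."""
    return any(logical_path == a or logical_path.startswith(a + "/")
               for a in allowed)

allowed = parse_logical_paths("/logical_path/1::/logical_path/2::/logical_path/3")
print(is_pinned("/logical_path/2/data.csv", allowed))  # True
print(is_pinned("/zone/home/other", allowed))          # False
```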

If there is existing data in a newly 'mounted' bucket, it would need to be 'scanned' or 'registered' for that data to be visible via the catalog. Could be via Lambda, could be ingest tool, etc.


trel commented Jun 7, 2024

But that new idea wouldn't allow management or updating of the physical paths within the bucket itself.
