
Implement support for copying directories recursively #160

Open
3 tasks
carlspring opened this issue Dec 31, 2020 · 3 comments
Comments

@carlspring (Owner) commented on Dec 31, 2020

Task Description

We need to implement support for copying directories in the org.carlspring.cloud.storage.s3fs.S3FileSystemProvider class. Copying currently only works for regular files; the provider does not check whether a path is a directory and therefore never recurses into it.
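A rough sketch of the direction, at the plain NIO level (just an illustration of the recursion; the helper name is made up and this is not existing code in the provider):

```java
import java.io.IOException;
import java.nio.file.CopyOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper that would live in S3FileSystemProvider: walk the source
// tree and copy every entry, creating the corresponding "directories" first.
void copyDirectoryRecursively(Path source, Path target, CopyOption... options) throws IOException
{
    Files.walkFileTree(source, new SimpleFileVisitor<Path>()
    {
        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException
        {
            // Re-root the directory under the target and make sure it exists.
            Files.createDirectories(target.resolve(source.relativize(dir).toString()));
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException
        {
            // Delegate to the existing single-file copy for regular files.
            Files.copy(file, target.resolve(source.relativize(file).toString()), options);
            return FileVisitResult.CONTINUE;
        }
    });
}
```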

Tasks

The following tasks will need to be carried out:

  • Study the code of the org.carlspring.cloud.storage.s3fs.S3FileSystemProvider and propose the most efficient way to do this.
  • Implement the necessary changes.
  • Implement test cases.

Help

@carlspring added the help wanted, good first issue and feature request labels on Dec 31, 2020
@carlspring changed the title from "Implement support for copying directories" to "Implement support for copying directories recursively" on Jan 1, 2021
@edmang (Contributor) commented on Feb 2, 2021

Copying an object seems to be limited to 5 GB according to the docs: https://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjectsExamples.html
Also, there seems to be no copy operation that accepts a collection of objects; copies are done one by one.
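For reference, a single-object copy with the AWS SDK for Java v2 looks roughly like this (just a sketch, assuming a recent SDK version with the sourceBucket/sourceKey builder methods):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CopyObjectRequest;

// Copies a single object; S3 rejects this call for sources larger than 5 GB,
// which is why bigger objects need the multipart copy API instead.
void copySingleObject(S3Client s3, String bucket, String sourceKey, String targetKey)
{
    CopyObjectRequest request = CopyObjectRequest.builder()
                                                 .sourceBucket(bucket)
                                                 .sourceKey(sourceKey)
                                                 .destinationBucket(bucket)
                                                 .destinationKey(targetKey)
                                                 .build();
    s3.copyObject(request);
}
```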

@steve-todorov (Collaborator) commented

You are pointing to their "old" documentation - the new one is here.

It is correct that you cannot recursively copy using the S3 API; however, it is possible to use batch operations for this:

To copy more than one Amazon S3 object with a single request, you can use Amazon S3 batch operations. You provide S3 Batch Operations with a list of objects to operate on. S3 Batch Operations calls the respective API to perform the specified operation. A single Batch Operations job can perform the specified operation on billions of objects containing exabytes of data.

However, I don't think this would be as easy or straightforward to implement as #163 (Delete objects recursively).

You cannot get the entire list of objects in one request, because there is a limit of 1000 objects per response (as I mentioned in #163). It looks like you could use the ListObjectsV2PaginatorsIntegrationTest as a base for paginated object listing.
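Roughly, the paginated listing boils down to something like this (a sketch along the lines of that test, assuming the SDK v2 paginator; not code from this repository):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

// Lists every key under a "directory" prefix; the paginator transparently
// issues follow-up requests, so the 1000-objects-per-response limit is hidden.
void listAllKeysUnderPrefix(S3Client s3, String bucket, String prefix)
{
    ListObjectsV2Request request = ListObjectsV2Request.builder()
                                                       .bucket(bucket)
                                                       .prefix(prefix)
                                                       .build();

    for (S3Object object : s3.listObjectsV2Paginator(request).contents())
    {
        System.out.println(object.key() + " (" + object.size() + " bytes)");
    }
}
```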

Possible issues I foresee are:

In S3 you can have a virtually unlimited number of nested "objects" (files or directories) in a tree-like structure:

  • This means we should definitely use async operations to speed things up.
  • We should use Reactor, where we can apply back-pressure techniques to avoid exhausting resources and allow for automatic retries on error.
  • Since there is a 5 GB size limit per copied object, we would need two different strategies, triggered depending on the file size (see the sketch after this list):
    • SingleObjectCopyStrategy - when the file is <= 5 GB
    • MultipartObjectCopyStrategy - when the file is > 5 GB (and uses the multipart upload API as mentioned in the first link)
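A rough sketch of how the two strategies could be dispatched (the class and strategy names are only proposals; the calls assume AWS SDK for Java v2 and a fixed part size, purely for illustration):

```java
import java.util.ArrayList;
import java.util.List;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload;
import software.amazon.awssdk.services.s3.model.CompletedPart;
import software.amazon.awssdk.services.s3.model.CopyObjectRequest;
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.UploadPartCopyRequest;
import software.amazon.awssdk.services.s3.model.UploadPartCopyResponse;

class ObjectCopier
{
    private static final long MAX_SINGLE_COPY_SIZE = 5L * 1024 * 1024 * 1024; // 5 GB
    private static final long PART_SIZE = 512L * 1024 * 1024;                 // 512 MB parts

    void copy(S3Client s3, String bucket, String sourceKey, String targetKey, long size)
    {
        if (size <= MAX_SINGLE_COPY_SIZE)
        {
            // SingleObjectCopyStrategy: one CopyObject call is enough.
            s3.copyObject(CopyObjectRequest.builder()
                                           .sourceBucket(bucket)
                                           .sourceKey(sourceKey)
                                           .destinationBucket(bucket)
                                           .destinationKey(targetKey)
                                           .build());
            return;
        }

        // MultipartObjectCopyStrategy: copy the object in ranges via UploadPartCopy.
        String uploadId = s3.createMultipartUpload(CreateMultipartUploadRequest.builder()
                                                                               .bucket(bucket)
                                                                               .key(targetKey)
                                                                               .build())
                            .uploadId();

        List<CompletedPart> parts = new ArrayList<>();
        int partNumber = 1;
        for (long start = 0; start < size; start += PART_SIZE, partNumber++)
        {
            long end = Math.min(start + PART_SIZE, size) - 1;
            UploadPartCopyResponse response =
                    s3.uploadPartCopy(UploadPartCopyRequest.builder()
                                                           .sourceBucket(bucket)
                                                           .sourceKey(sourceKey)
                                                           .destinationBucket(bucket)
                                                           .destinationKey(targetKey)
                                                           .uploadId(uploadId)
                                                           .partNumber(partNumber)
                                                           .copySourceRange("bytes=" + start + "-" + end)
                                                           .build());
            parts.add(CompletedPart.builder()
                                   .partNumber(partNumber)
                                   .eTag(response.copyPartResult().eTag())
                                   .build());
        }

        s3.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
                                                                 .bucket(bucket)
                                                                 .key(targetKey)
                                                                 .uploadId(uploadId)
                                                                 .multipartUpload(CompletedMultipartUpload.builder()
                                                                                                          .parts(parts)
                                                                                                          .build())
                                                                 .build());
    }
}
```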

The first two points are actually valid for #163 as well, but I guess we can create a follow-up after this task.

// cc @carlspring

@edmang (Contributor) commented on Feb 15, 2021

@steve-todorov, thank you for your suggestion!
I have a small question about the use of ListObjectsV2Request in ListObjectsV2PaginatorsIntegrationTest (let's forget about the async aspect for the moment :D).
Since S3 does not have "folders" (every object is just a /path/to/the/file key), does that mean we don't need to visit all the "folders"? We would only need an iterator over the files to copy? (In that case, the ListObjectsV2PaginatorsIntegrationTest approach would actually do the job.)

However, when I check #163, I do not only delete the files but the folders too; that is why I made my own visitAllFiles method (in order to delete the leaf files first, then their folder, and so on...).

Do you think I should keep my own visitAllFiles, or should I use ListObjectsV2Request? (In the latter case, maybe I should also reconsider #163?)
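For illustration, roughly what I have in mind with the iterator-over-keys approach (just a sketch; the zero-byte "folder" placeholder keys only exist if they were created explicitly):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

// One pass over every key under the source prefix. "Folders" only show up as
// zero-byte placeholder keys ending in "/" (if created explicitly), and copying
// them is the same plain CopyObject call as for a regular file, so no separate
// "visit folders" step seems necessary when copying.
void copyPrefix(S3Client s3, String bucket, String sourcePrefix, String targetPrefix)
{
    ListObjectsV2Request request = ListObjectsV2Request.builder()
                                                       .bucket(bucket)
                                                       .prefix(sourcePrefix)
                                                       .build();

    for (S3Object object : s3.listObjectsV2Paginator(request).contents())
    {
        String targetKey = targetPrefix + object.key().substring(sourcePrefix.length());
        s3.copyObject(b -> b.sourceBucket(bucket)
                            .sourceKey(object.key())
                            .destinationBucket(bucket)
                            .destinationKey(targetKey));
    }
}
```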

thanks :D
