This repository has been archived by the owner on Feb 25, 2020. It is now read-only.

Zip files from S3 #139

Open
aBuder opened this issue Oct 16, 2018 · 4 comments

Comments

@aBuder

aBuder commented Oct 16, 2018

Hi,

Is there a solution to zip files from the S3 disk? Is it possible to get some piece of code?

@gdevdeiv

Hi there! I can suggest a starting point ;)

// Laravel's Storage facade (aliased globally in a default Laravel app).
use Illuminate\Support\Facades\Storage;

// Array of files to add from the S3 bucket.
$files_to_add = [
    'documents/0001.pdf',
    'documents/0002.pdf',
    'documents/0003.pdf',
];

// Create the Zipper instance.
$zipper = new \Chumper\Zipper\Zipper;

// Create (or open, please read the README.md file) the documents.zip archive.
$zip = $zipper->make('documents.zip');

// For each file in the $files_to_add array...
foreach ($files_to_add as $file) {
    // Keep only the filename (e.g. 0001.pdf) from the S3 key.
    $file_name = basename($file);

    // Obtain the file contents from the Storage driver, in this case AWS S3.
    $file_contents = Storage::disk('s3')->get($file);

    // Add the contents to the ZIP under that name.
    $zip->addString($file_name, $file_contents);
}

// Good to go :)
$zip->close();

@arunbabucode

@gdevdeiv Any idea on batch-queuing this whole process? Say I have around 10k PDF files (monthly statements) to be zipped. Each document is stored on S3, and when I use a queue to download each of the files and then zip them, I get a timeout due to the large number of files to process. Any idea on how to do this as a batch export?

@gdevdeiv

Sure. PHP has a max_execution_time directive that kills any script running longer than the value set there (in seconds). You can find it in the php.ini config file.

To solve that problem you can either:

  • set max_execution_time to a high number: let's say your script takes 20 minutes to complete, then you would want to set it to at least 1200 (more info in the PHP documentation);

  • make use of queues.

The first one is far easier and only requires a web server restart to be applied, but the downside is that Apache/Nginx could return a 502 Bad Gateway error while waiting for PHP to respond. You can solve that by adjusting the server configuration to wait as long as PHP does, so there is no timeout (see the Apache and Nginx documentation for the relevant timeout directives).
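
A minimal sketch of the first option, assuming you can edit php.ini or are allowed to raise the limit at runtime (the 1200 value is just the 20-minute example above):

// In php.ini (requires a web server / PHP-FPM restart afterwards):
//     max_execution_time = 1200

// Or per request, at the top of the long-running script:
set_time_limit(1200);                      // allow up to 20 minutes for this request
ini_set('max_execution_time', '1200');     // equivalent ini-based form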

The second one is preferred because queued jobs are pieces of code that can be processed independently by another PHP process. You could encapsulate your PDF generation code in a job that, for example, generates the PDF files, stores them in S3, saves the path in a database record and sends a notification (or email) to the user. This is far more complex and requires the use of queues.
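
A minimal sketch of that kind of job, assuming Laravel's queue system; the class name, the User model and the steps inside handle() are hypothetical placeholders for your own code:

<?php

namespace App\Jobs;

use App\User;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class GenerateMonthlyStatements implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    private $user;

    public function __construct(User $user)
    {
        $this->user = $user;
    }

    public function handle()
    {
        // 1. Generate the PDF files (your existing generation code goes here).
        // 2. Store them on the S3 disk.
        // 3. Save the S3 paths in a database record.
        // 4. Notify (or email) the user that the export is ready.
    }
}

// Dispatch it from a controller or console command so a queue worker
// picks it up instead of the web request:
// GenerateMonthlyStatements::dispatch($user);

With that in place, the web request only dispatches the job and returns immediately; the heavy work happens in the queue worker, which has its own timeout settings.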

@arunbabucode

The first one doesn't seem to fit the current scenario, since we are growing. It may fix the issue temporarily, but it is not a permanent fix. As I said, I am already using jobs to move the load of this entire process, but I can't figure out the logic to batch-download files from S3 (say 500 documents in every batch), add them to a single zip file, store the zip on S3, and then trigger an email with the link once all the batches are finished.
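
For what it's worth, a sketch of that chunked flow as a single queued job, assuming Laravel queues and the same Zipper/Storage API shown earlier in the thread; the class name, batch size, S3 paths and mail text are all hypothetical:

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Mail;
use Illuminate\Support\Facades\Storage;

class ExportStatements implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $timeout = 3600; // let the queue worker run long enough for ~10k files

    private $files;
    private $email;

    public function __construct(array $files, $email)
    {
        $this->files = $files;
        $this->email = $email;
    }

    public function handle()
    {
        $zipPath = storage_path('app/statements.zip');

        // Work through the S3 keys in batches of 500, reopening the same
        // archive for each batch so memory stays bounded.
        foreach (array_chunk($this->files, 500) as $batch) {
            $zip = (new \Chumper\Zipper\Zipper)->make($zipPath);

            foreach ($batch as $file) {
                $zip->addString(basename($file), Storage::disk('s3')->get($file));
            }

            $zip->close();
        }

        // Store the finished archive back on S3 and mail the link.
        Storage::disk('s3')->put('exports/statements.zip', fopen($zipPath, 'r'));
        $url = Storage::disk('s3')->url('exports/statements.zip');

        Mail::raw('Your statements export is ready: ' . $url, function ($message) {
            $message->to($this->email)->subject('Statements export');
        });
    }
}

Whether closing and reopening the archive between batches actually keeps memory down depends on how ZipArchive buffers writes, so it is worth profiling with a realistic file count before relying on it.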
