Mechanism to rerun lambda functions for images where thumbnail creation failed #161

Closed
mahalakshme opened this issue Sep 29, 2023 · 6 comments
@mahalakshme

mahalakshme commented Sep 29, 2023

We need a way to create thumbnails for the images uploaded between Aug 21 and Sep 29, since the Lambda function executions failed for these images.

Analysis details:

  • We can reuse the existing S3BulkThumbnailCreator Lambda function in the repo and create another Lambda function to generate thumbnails for the date range mentioned above.
  • For the dates 21/8/23 and 29/8/23, create the thumbnail only if it doesn't already exist. This can be checked using the image name, since the thumbnail has the same name as the original image.
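
The "create only if it doesn't exist" check can be sketched in Python as below. This is a minimal sketch, not the repo's code: it assumes each thumbnail sits in a `thumbnails` folder next to the original and shares its file name. The folder name and the helper names `thumbnail_key_for` and `images_missing_thumbnails` are hypothetical.

```python
import posixpath

# Assumption: thumbnails live in a sibling "thumbnails" folder.
THUMBNAIL_DIR = "thumbnails"


def thumbnail_key_for(image_key):
    """Return the assumed thumbnail key: same file name, inside THUMBNAIL_DIR."""
    folder, name = posixpath.split(image_key)
    return posixpath.join(folder, THUMBNAIL_DIR, name)


def images_missing_thumbnails(all_keys):
    """Return original image keys that have no matching thumbnail key."""
    keys = list(all_keys)
    existing = set(keys)
    missing = []
    for key in keys:
        if THUMBNAIL_DIR in key.split("/"):
            continue  # this key is itself a thumbnail, skip it
        if thumbnail_key_for(key) not in existing:
            missing.append(key)
    return missing
```

Given a full key listing of the bucket, only the keys returned by `images_missing_thumbnails` would need a thumbnail generated.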

References:
https://stackoverflow.com/questions/40003827/running-s3-put-triggered-lambda-function-on-existing-s3-objects
https://stackoverflow.com/questions/51176450/run-aws-lambda-function-on-existing-s3-images

@mahalakshme mahalakshme converted this from a draft issue Sep 29, 2023
@mahalakshme mahalakshme changed the title Mechanism to generate thimbnails for the images uploaded between Aug 21 and today Mechanism to rerun lambda functions for images where thumbnail creation failed Sep 29, 2023
@mahalakshme mahalakshme moved this from In Analysis to In Analysis Review in Avni Product Sep 29, 2023
@vinayvenu
Member

See code here - https://github.com/avniproject/avni-media/blob/main/scripts/S3BulkThumbnailCreator/index.mjs

It does not look like the S3 APIs (Python and ) provide filtering by date; I could not find anything in the documentation.

@mahalakshme
Author

I think we can filter by date on the results of the S3 list objects call: https://stackoverflow.com/questions/45429556/how-list-amazon-s3-bucket-contents-by-modified-date
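
A minimal Python sketch of that idea: the list call returns a `LastModified` timestamp for every object, so the date filter can be applied on the client side after listing. The function names are hypothetical, and `iter_bucket_objects` assumes the third-party `boto3` package and AWS credentials are available.

```python
from datetime import datetime, timezone


def keys_modified_since(objects, since):
    """Return keys of listed objects whose LastModified is on or after `since`."""
    return [obj["Key"] for obj in objects if obj["LastModified"] >= since]


def iter_bucket_objects(bucket):
    """Yield every object entry in the bucket, following pagination."""
    import boto3  # third-party; imported here so the filter works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent for an empty page
        yield from page.get("Contents", [])


# Hypothetical usage (needs AWS access):
# since = datetime(2023, 8, 20, tzinfo=timezone.utc)
# keys = keys_modified_since(iter_bucket_objects("prod-user-media"), since)
```

The filtering happens after the listing, so the whole bucket is still enumerated; the date only narrows what is kept.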

@mahalakshme
Author

@vinayvenu there are 1636 encounters of the daily encounter types alone synced to the server in the above-mentioned time range, in just the orgs RWB 2023 and RWB NGOs 2023. I didn't check GDGSGOM, which belongs to the same org group, since it looks like we need to do this card anyway.

@vinayvenu vinayvenu moved this from In Analysis Review to Analysis Complete in Avni Product Oct 3, 2023
@mahalakshme mahalakshme moved this from Analysis Complete to Ready in Avni Product Oct 3, 2023
@1t5j0y 1t5j0y self-assigned this Oct 3, 2023
@1t5j0y 1t5j0y moved this from Ready to In Progress in Avni Product Oct 3, 2023
@1t5j0y
Contributor

1t5j0y commented Oct 3, 2023

The list of affected files was fetched using:
aws s3api list-objects-v2 --bucket prod-user-media --query 'Contents[?LastModified>=`2023-08-20`].Key'
This was run soon after fixing the Lambda deployment. Running it later will also return many files for which thumbnails have since been generated successfully.
The file list was then filtered to remove DB dumps (ad hoc and fast sync), mp3/mp4 files, files within thumbnails folders, etc.
The initial approach was to create copies of the images to trigger the instant Lambda. However, S3 does not allow creating a copy in place without some change to the object, its metadata, or its storage class. It was possible to create a copy with some changed metadata, but we decided against it to avoid issues in the future: metadata can only be overwritten, so we would lose the S3-generated metadata (content type).
The approach was changed to invoke the instant Lambda from a script, given the list of files for which thumbnails need to be generated. A delay was added in the script to avoid running into Lambda quotas.
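
A minimal Python sketch of such a script, not the one that was actually run. It assumes the Lambda reads the bucket and key from an S3-style event record; the event shape, the function names, and the half-second default delay are all assumptions. `make_lambda_invoker` needs the third-party `boto3` package and AWS credentials.

```python
import json
import time


def build_s3_put_event(bucket, key):
    """Build a minimal S3-style event; the fields the handler needs are assumed."""
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {"bucket": {"name": bucket}, "object": {"key": key}},
            }
        ]
    }


def invoke_for_keys(keys, bucket, invoke, delay_seconds=0.5):
    """Call `invoke` once per key, pausing between calls to stay under quotas."""
    results = []
    for key in keys:
        payload = json.dumps(build_s3_put_event(bucket, key)).encode("utf-8")
        results.append((key, invoke(payload)))
        time.sleep(delay_seconds)
    return results


def make_lambda_invoker(function_name):
    """Return a callable that invokes the Lambda asynchronously."""
    import boto3  # third-party; imported here so the loop is testable without it

    client = boto3.client("lambda")

    def invoke(payload):
        response = client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous invocation
            Payload=payload,
        )
        return response["StatusCode"]

    return invoke
```

An asynchronous invocation returns status 202 when the event is accepted, which says nothing about whether the thumbnail was created. Real S3 notifications also URL-encode the object key, so a handler that decodes keys may need the key encoded the same way here.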

@1t5j0y 1t5j0y moved this from In Progress to Code Review Ready in Avni Product Oct 3, 2023
@vinayvenu vinayvenu moved this from Code Review Ready to In Code Review in Avni Product Oct 4, 2023
@vinayvenu
Member

  1. Can you push the generated file and the run log somewhere (like https://github.com/avniproject/data-fixes) so we can look back at them when we need to fix any issue?
  2. How do we validate that this worked across all files? Sample checks? Or is there a more complete way?

@vinayvenu vinayvenu moved this from In Code Review to Code Review with Comments in Avni Product Oct 4, 2023
@1t5j0y 1t5j0y moved this from Code Review with Comments to In Progress in Avni Product Oct 4, 2023
@1t5j0y
Contributor

1t5j0y commented Oct 4, 2023

Adding the generated file as well as the filtered and formatted file here itself (data-fixes looks out of place for this). The run log wasn't saved to a file, but it was basically the file names and response status 202.

media_created_since_aug20.zip

I validated using sample checks. I noticed a few errors in the Lambda metrics; on checking the CloudWatch logs, they were due to invalid source images.

@1t5j0y 1t5j0y moved this from In Progress to Code Review Ready in Avni Product Oct 4, 2023
@vinayvenu vinayvenu moved this from Code Review Ready to QA Ready in Avni Product Oct 4, 2023
@AchalaBelokar AchalaBelokar moved this from QA Ready to In QA in Avni Product Oct 4, 2023
@AchalaBelokar AchalaBelokar moved this from In QA to Done in Avni Product Oct 4, 2023