-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to avoid some of the Enoent issues #1258
Conversation
In production we see errors that file reading is attempted by the `store_metadata` function when the file is not available. It looks like there might be a concurrency issue where one process has changed the file state and possibly removed the local file, whilst another process is attempting to save changes to the asset model and therefore attempts to read metadata from the file. This change attempts to minimize the attempts to read the file from the filesystem so that we can avoid these errors.
…is unscanned In production it looks like there are concurrency issues and the VirusScannerWorker executes several times for the same file. When several processes attempt to change the asset state at the same time, it can causes other issues. Specifically any attempt to read or remove the local file after the state transition can fail.
a8f1a45
to
4705054
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both of these changes should help reduce the number of errors, but I agree that we might want to add a more robust solution later.
Has this been tested in integration?
@@ -76,7 +76,7 @@ class Asset | |||
|
|||
mount_uploader :file, AssetUploader | |||
|
|||
before_save :store_metadata, unless: :uploaded? | |||
before_save :store_metadata, unless: :uploaded?, if: -> { changes.include?(:file) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might help reduce the number of raised errors.
It wouldn't if the second process is already inside the store_metadata method
Yeah i tested a regular file upload from whitehall in integration, seems to work. |
Currently, when an asset is updated with a file which has the same name as the previous file, the metadata (specifically the MD5 hexdigest) doesn't get updated. There is a guard statement in the S3 uploader that prevents assets with and unchanged MD5 hexdigest) from uploading. This is due to a condition added in #1258 that was intended to quieten some errors we were seeing at the time. This same issue was addressed in #1263.
Asset metadata is not being updated when an asset is uploaded with a file which has the same name as the previous file. This is because of a condition added in #1258 that was intended to quieten some errors we were seeing at the time. This same issue was addressed in #1263. This is problematic as there is a guard statement in the S3 uploader that prevents assets with an unchanged MD5 hexdigest from uploading.
Asset metadata is not being updated when an asset is uploaded with a file which has the same name as the previous file. This is because of a condition added in #1258 that was intended to quieten some errors we were seeing at the time. This same issue was addressed in #1263. This is problematic as there is a guard statement in the S3 uploader that prevents assets with an unchanged MD5 hexdigest from uploading.
Asset attempts to read metadata of the file every time the asset is saved. This seems to lead in error in production as the below stack trace shows. It looks like several processes attempt to change the state of the asset at the same time, some of the processes remove the actual local file which then causes the subjequent processes to fail reading the local file.
In any case, reading the local file should be necessary only rarely - not every time when Asset state is modified. This pr attempts to minize the reads.
It also looks like in production VirusScannerWorker gets executed several times at the same time on the same asset. I don't know why this happens, but this pr attemps to avoid unnecessary state transition from the VirusScanner, if another process has already taken over the state of the asset.
IDK if this fixes these issues, but maybe it might help reduce the noise in sentry 🤷♀️
Please see logit here
Trello