Re-ingesting files seems to cause access errors #475
Comments
Is there a way to reliably re-create this issue? |
I'm not sure. But I do see the error after I visit the https://test.digital.library.jhu.edu/admin/content/media page - so maybe just visiting it causes the error to be thrown. |
After looking at this a little more, I think this is related to allowing multiple ingests of the same file items, and it may also be related to allowing files to be renamed during ingest. We allow admins to rename files in the ingest to fix filename structures that might not work for the system. The renaming may or may not be the cause; the more likely cause is simply that we allow re-ingest of files while the old files stick around. So if we have the following setup in the ingest:
If an admin runs that type of ingest twice, the legitimate file will have a
(the first file ingested will be disconnected from any media - so it's essentially unused.) The one with Perhaps this error is innocuous, as the file is really unused and doesn't need a thumbnail. However, what should be checked is that the proper File (filename_0.jpg) does get a drupal thumbnail. |
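To make the collision concrete, here is a minimal sketch of the rename-on-collision behaviour described above. It assumes the second ingest gets a numeric suffix before the extension, as the filename_0.jpg example suggests. The function name `next_free_name` and the temporary directory are hypothetical, and this is not Drupal's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first unused name in a directory by
# appending _0, _1, ... before the extension. Modelled on the
# filename_0.jpg pattern from the comment above, not on Drupal's code.
# Assumes the name contains an extension.
next_free_name() {
  local dir=$1 name=$2
  local base=${name%.*} ext=${name##*.}
  local candidate=$name counter=0
  while [ -e "$dir/$candidate" ]; do
    candidate="${base}_${counter}.${ext}"
    counter=$((counter + 1))
  done
  printf '%s\n' "$candidate"
}

# Simulate two ingests of the same file into one directory.
ingest_dir=$(mktemp -d)
first=$(next_free_name "$ingest_dir" "filename.jpg")
touch "$ingest_dir/$first"    # first ingest keeps the original name
second=$(next_free_name "$ingest_dir" "filename.jpg")
touch "$ingest_dir/$second"   # second ingest gets a suffixed name
echo "first ingest:  $first"
echo "second ingest: $second"
```

If the media entity points at the second file while something else still resolves the first name, the two ingests would disagree about which file is "the" file, which matches the symptom described here.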
Provided the Islandora file derivatives function correctly and grab the correct file (which they seem to), this issue is probably minor in the grand scheme of things and these errors are just noise in the logs (yes, Drupal fails to make its thumbnails, but that's admin facing). If that's true, the only effect is that logged-in admins will not see a thumbnail on the file list page ( |
Actually, I think I am wrong about the scope here after watching the cloud for a while. It appears that the wrong URL is handed to the derivatives as well (or they are somehow fetching the wrong ones). This will be an issue for re-ingest in the cloud services. :( |
High priority because it blocks our ability to re-ingest when there are errors in an ingest job. |
Possibly, Bethany thought it was a problem in S3. |
I agree this would be classified as a high priority and is likely either an S3 or a production-specific environment config setting. |
Please re-read her notes on this. She later indicates that this is a file naming issue and NOT an S3 or production-specific thing. This appears to be something happening inside of Drupal
@jhujasonw I think her initial comments were on the right track. I see where she changed her mind, but it appears the URL generated for an ingest works correctly on the first ingest and not on the second. One situation that "could" cause this is an S3 permission configuration that makes the bucket write-once (S3:PutObject events). I'm speculating: I have no knowledge of the bucket configurations, and I'm not an expert with S3 ACLs. It just seems like a logical way to get the odd behavior of "works the first time but not the second". A simple way to check would be to either run the exact same migration locally or trigger a regenerate derivative event in production and see if it fails. If the migration fails locally in the same manner, then I'd say it's safe to conclude the S3 permissions are not the issue. But if it doesn't, and triggering a regenerate derivative event fails in production, it's likely worth investigating. This is what I thought Bethany had alluded to in her last comment. |
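A more direct probe of the "works the first time but not the second" idea is to write the same object twice and see whether the second write is rejected. The sketch below is only an illustration: `probe_overwrite` is a hypothetical helper, and the bucket and key in the usage comment are placeholders, not the project's real configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the given upload command twice with identical
# arguments and report whether the second write (an overwrite) is rejected.
# Returns 0 if both writes succeed, 1 if only the overwrite fails,
# and 2 if even the first write fails.
probe_overwrite() {
  if ! "$@"; then
    echo "first write failed"
    return 2
  fi
  if ! "$@"; then
    echo "second write failed: overwrite appears blocked"
    return 1
  fi
  echo "both writes succeeded: overwrite is allowed"
}

# Example against a bucket (bucket name and key are placeholders):
#   probe_overwrite aws s3 cp ./probe.txt s3://EXAMPLE-BUCKET/probe.txt
```

Passing the uploader as arguments keeps the probe independent of the storage backend, so the same check could be pointed at a local minio setup for comparison.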
Some random info that may or may not be relevant (librarian, not tech person here so please ignore me if this is all nonsense).
|
This may seem off-topic, but we could avoid the naming collision by using unique values as the media filenames. In theory, a file's hash is not affected by renaming it. A script like this, run locally, could copy the files to a new directory, name each copy after its hash value, log the original and new names, and report any copy that fails verification.
destination='/processed_images'
# Start with an empty log.
: > "$destination/log"
for file in *.{jpg,jpeg,png,tif,tiff,jp2}
do
[ -e "$file" ] || continue # skip patterns that matched nothing
sum=$(sha256sum "$file")
sum="${sum%% *}" # keep only the hash, drop the filename
cp "$file" "$destination/$sum"
echo "$file $destination/$sum" >> "$destination/log"
# Verify the copy against the original.
[ "$(sha256sum < "$file")" = "$(sha256sum < "$destination/$sum")" ] || echo "Problem with $destination/$sum"
done
This should safeguard against filename collisions and make identifying duplicates simple. It could also be offloaded to a module instead, something like filehash. |
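As a small illustration of the "identifying duplicates" point, the following sketch lists hashes shared by more than one file, without copying or renaming anything. The function name `duplicate_hashes` and the sample files are hypothetical. It assumes GNU `sha256sum` is available, as in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each SHA-256 hash that is shared by more than
# one of the given files. Files with the same content hash identically,
# regardless of their names.
duplicate_hashes() {
  sha256sum -- "$@" | awk '{ print $1 }' | sort | uniq -d
}

# Demo: a renamed copy is reported as a duplicate, a different file is not.
demo_dir=$(mktemp -d)
printf 'same bytes' > "$demo_dir/original.jpg"
cp "$demo_dir/original.jpg" "$demo_dir/renamed.jpg"
printf 'other bytes' > "$demo_dir/different.jpg"
duplicate_hashes "$demo_dir"/*.jpg
```

This only reports which hashes collide. Mapping a hash back to its filenames would still need the log written by the script above, or a second pass over the `sha256sum` output.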
Unfortunately, the filenames are important for librarians to manage files and keep them associated with the right items, so we can't really change them without stakeholder approval. |
@mjanowiecki This is the case once in Islandora? Or are we talking about offline (preprocessing/reprocessing)? |
@DonRichards |
@DonRichards and @mjanowiecki - please move work and discussion over to Jira. This issue is now at https://jhulibraries.atlassian.net/browse/LAGS-172 |
This started with errors in the Drupal log from Drupal generating its thumbnails, but the same thing seems to happen in the cloud when the external services are running derivatives.
A big distinction to catch here is that Drupal makes thumbnails for its admin-facing pages (specifically the media page). Those thumbnails are different from the thumbnails we make in the houdini container, which are user facing.
If the Drupal admin-facing ones fail, it's not a big deal, but it looks like houdini is affected by this issue as well. Just something to keep in mind as you read through below.
Note: I'm seeing this message on the cloud server, but not in my dev environment, so I wonder if it's an AWS permission error.
On the test cloud server: upon going to the Media page, I started seeing these errors in the log:
Unable to generate the derived image located at private://styles/thumbnail/private/2022-01/3061-Service File.jpg.
These are Drupal thumbnails, which are distinct from our derivative thumbnails. Drupal wants to create a thumbnail simply to display the image on the Media List page (to logged-in users with rights to use the admin interface). An example:
These are created in the Drupal container by ImageMagick (it does not use the deriv containers). I wonder if there's an access error here where Drupal can't retrieve the file from AWS? Or maybe the file can't be saved in AWS once created? Not exactly sure what's going on here.
In my local setup, Drupal creates a styles folder in minio for Drupal's thumbnails. Perhaps that's not successfully happening on AWS?
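When debugging which file Drupal is actually trying to read, it can help to map the derived thumbnail path in the error back to its source file. The sketch below assumes the usual image style layout of scheme://styles/STYLE/SOURCE_SCHEME/PATH, which matches the path in the log message above. The function name `source_uri_for_derivative` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a derived image URI of the form
#   SCHEME://styles/STYLE/SOURCE_SCHEME/PATH
# print the source URI SOURCE_SCHEME://PATH.
# Returns 1 if the URI does not contain a styles/ segment.
source_uri_for_derivative() {
  local derived=$1
  local rest=${derived#*://styles/}   # STYLE/SOURCE_SCHEME/PATH
  [ "$rest" != "$derived" ] || return 1
  rest=${rest#*/}                     # SOURCE_SCHEME/PATH
  local src_scheme=${rest%%/*}
  local path=${rest#*/}
  printf '%s://%s\n' "$src_scheme" "$path"
}

# The path from the error message above:
source_uri_for_derivative "private://styles/thumbnail/private/2022-01/3061-Service File.jpg"
```

If the source URI printed here points at the file from the first ingest rather than the renamed one from the re-ingest, that would support the filename explanation over the permissions one.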