Image Viewer does not display all of the pages in a Paged Content item #485

Open
htpvu opened this issue Feb 22, 2022 · 6 comments

@htpvu

htpvu commented Feb 22, 2022

This item (https://test.digital.library.jhu.edu/node/13155/) has 199 members, but only the first 118 are displayed in the Image Viewer. However, browsing the child members page shows that all of the items are there: https://test.digital.library.jhu.edu/node/13155/members?page=11

@jabrah jabrah self-assigned this Mar 7, 2022
@jabrah

jabrah commented Mar 9, 2022

There are two "sets" of items in this Paged Content node: a set of PDFs (all of the v1-pNNNN pages) and a set of JPEG images (the v2-pNNNN pages).

Spot-checking a few pages from each set showed that the v2 set has no Service File, but does have derivatives.

v2 set

  • Original File (a JPEG image)
  • Intermediate File (a TIF?)
  • Extracted Text

v1 set

  • Original File ×2 (looks like the file was uploaded and attached twice?)
  • Service File
  • Thumbnail
  • Extracted Text

Perhaps image derivatives never ran for the v2 pages, so no Service File derivatives were generated. This may cause these pages to be skipped in the IIIF representation; I'd have to investigate a bit to verify.

@jabrah

jabrah commented Mar 9, 2022

Views/IIIF Manifest

This Drupal view sets up the data that is ultimately transformed into our IIIF manifests. It is configured to use only Service File media, so the v2 pages described above would be filtered out of the IIIF data.
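To make the filtering concrete, here is a minimal Python sketch of the behavior described above. It is not Drupal or Views code: the `Page` class, the `manifest_pages` and `missing_from_manifest` helpers, and the sample page titles are hypothetical. Each page is reduced to the set of Media Use terms attached to its media, based on the spot-check lists earlier in this thread.

```python
from dataclasses import dataclass, field


# Hypothetical model, NOT Drupal/Views code: a Page is reduced to its title
# and the set of Media Use terms attached to its media.
@dataclass
class Page:
    title: str
    media_uses: set = field(default_factory=set)


def manifest_pages(pages, required_use="Service File"):
    """Mimic the IIIF Manifest view: keep only pages with the required Media Use."""
    return [p for p in pages if required_use in p.media_uses]


def missing_from_manifest(pages, required_use="Service File"):
    """List titles of pages the filter would silently drop."""
    return [p.title for p in pages if required_use not in p.media_uses]


# Sample pages modelled on the spot-checked v1 and v2 sets (titles are illustrative).
pages = [
    Page("v1-p0001", {"Original File", "Service File", "Thumbnail Image", "Extracted Text"}),
    Page("v2-p0001", {"Original File", "Intermediate File", "Extracted Text"}),
]

print([p.title for p in manifest_pages(pages)])  # ['v1-p0001']
print(missing_from_manifest(pages))              # ['v2-p0001']
```

In this model, any page without a Service File never reaches the manifest, which would match the symptom of the Image Viewer showing fewer pages than the members list.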

We need to investigate whether this is caused by a bug in the derivative generation (or its associated configuration) or by something in the migration spreadsheets.

Either way, this finding means this ticket will take longer than initially hoped.

@jabrah jabrah added the vendor label Mar 9, 2022
@jabrah jabrah removed their assignment Mar 9, 2022
@htpvu htpvu removed the UI label Mar 18, 2022
@jhu-alistair

Work with Michelle J on this issue.

@htpvu
Author

htpvu commented Mar 23, 2022

Related ticket about an issue when trying to re-ingest files: Re-ingesting files seems to cause access errors #475

There is probably a way to restart derivative generation from the admin UI.

@jabrah

jabrah commented Mar 25, 2022

@mjanowiecki identified that there are cases where we ingest JPEG images as Original Files and the Service Files are generated as expected.

Examples:

Both of these are Pages from the Paged Content node linked above (https://test.digital.library.jhu.edu/node/13155/)

Notes from IDC documentation re: Media Use

  • Extracted Text: A textual representation of the Object appropriate for fulltext indexing, such as a plaintext version of a document, or OCR text.
    • Select for extracted text files.
    • If selected, file is downloadable but is not displayed to user, as this media use type does NOT generate Service File or Thumbnail Image derivatives to display.
  • Intermediate File: High quality representation of the Object, appropriate for generating derivatives or other additional processing.
    • Select for additional files on a repository item that already contains an Original File.
    • If selected, file is downloadable but is not displayed to user, as this media use type does NOT generate Service File or Thumbnail Image derivatives to display.
  • Original File: The original creation format of a file.
    • Select for original files that should be displayed to user.
      • For any file tagged as an Original File, the system automatically generates a Service File derivative and thumbnail to display to user (unless a Service File already exists for the object).
      • Other Media Use terms do NOT generate a Service File or Thumbnail Image.
    • Important note: There can only be one original file media per repository item.
  • Service File: A medium quality representation of the Object appropriate for serving to users. Similar to a FADGI "derivative file" but can also be used for born-digital content, and is not necessarily derived from another file.
    • Select if files are already in a state where they can be served directly to the public and you want to reduce processing time. It's acceptable to dual-label a file as "Original File" and "Service File"; this disables Service File derivative generation, since a Service File already exists.
  • Thumbnail Image: A low resolution image representation of the Object appropriate for using as an icon.
    • Do not select unless instructed, as thumbnails will be generated from the Original File associated with the repository item.
  • Transcript: A textual representation of the Object appropriate for presenting to users, such as subtitles or transcript of a video. Can be used as a substitute or complement to other files for accessibility purposes.
    • Select for transcript files if the repository item already has an Original File.
    • If selected, file is downloadable but is not displayed to user, as this media use type does NOT generate Service File or Thumbnail Image derivatives to display.
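The Media Use rules above can be read as a small decision procedure. The sketch below is a hypothetical Python encoding, not IDC or Islandora code; the function name `expected_derivatives` is made up for illustration. It assumes the "unless a Service File already exists" clause applies only to the Service File, and that a Thumbnail Image is generated whenever an Original File is present and no thumbnail exists yet.

```python
def expected_derivatives(media_uses):
    """Return the derivatives the quoted Media Use rules say should be generated.

    Hypothetical encoding of the IDC documentation notes; not actual system code.
    """
    if "Original File" not in media_uses:
        # Extracted Text, Intermediate File, Transcript, etc. never trigger derivatives.
        return set()
    expected = set()
    if "Service File" not in media_uses:
        # Dual-labelling a file as Original File + Service File disables this step.
        expected.add("Service File")
    if "Thumbnail Image" not in media_uses:
        # Assumption: a thumbnail is generated from the Original File when none exists.
        expected.add("Thumbnail Image")
    return expected


v2_page = {"Original File", "Intermediate File", "Extracted Text"}
print(sorted(expected_derivatives(v2_page)))        # ['Service File', 'Thumbnail Image']
print(expected_derivatives({"Intermediate File"}))  # set()
```

Under this reading, a v2 page with an Original File should have received a Service File, which suggests that derivative generation did not run for those pages rather than that they were tagged with a non-triggering Media Use.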

@DonRichards
Member


4 participants