mrc: deal with indexed images
If the input image has mode 'P' or something else that's pretty funky,
the code would currently error out. Let's just convert any image format
that is not "standard" (for some definition of standard) to RGB.
MerlijnWajer committed May 9, 2024
1 parent e29960b commit 654892e
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions internetarchivepdf/mrc.py
@@ -398,6 +398,11 @@ def create_mrc_hocr_components(image, hocr_word_data,

     yield mask_arr

+    if image.mode not in ('L', 'RGB'):
+        # Special modes like mapped ('P') or other modes we just map to RGB for
+        # simplicity sake
+        image = image.convert('RGB')
+
     image_arr = np.array(image)

     t = time()
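
To illustrate why the mode check matters, here is a minimal standalone sketch using Pillow and NumPy. The helper name `to_standard_array` and the tiny test palette are assumptions made for this example and are not part of `internetarchivepdf`. It shows that `np.array()` on a 'P' image returns a 2-D array of palette indices, while converting to 'RGB' first gives the actual colour values.

```python
import numpy as np
from PIL import Image

# Modes the commit leaves untouched; everything else is mapped to RGB.
STANDARD_MODES = ('L', 'RGB')


def to_standard_array(image):
    """Hypothetical helper mirroring the check added in this commit."""
    if image.mode not in STANDARD_MODES:
        image = image.convert('RGB')
    return np.array(image)


# Build a 4x4 indexed ('P') image in which every pixel is palette index 1.
indexed = Image.new('P', (4, 4), color=1)
# Palette: index 0 is black, index 1 is pure red, the rest is padded with black.
indexed.putpalette([0, 0, 0, 255, 0, 0] + [0, 0, 0] * 254)

raw = np.array(indexed)                 # palette indices, shape (4, 4)
converted = to_standard_array(indexed)  # real colours, shape (4, 4, 3)
gray = to_standard_array(Image.new('L', (4, 4), color=128))  # left as-is

print(raw.shape, converted.shape, gray.shape)
```

Without the conversion, downstream code would treat the palette indices as if they were grayscale intensities, or fail on an unexpected array shape for modes with other channel counts.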