Extracting illustrations described in OCRed documents (ALTO format) with the IIIF API.
You will need 4 scripts:
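For orientation, an IIIF Image API request for one image region follows the pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The sketch below only illustrates that pattern: the function name iiif_region_url, the server base and the identifier are hypothetical, and it assumes the ALTO coordinates (HPOS, VPOS, WIDTH, HEIGHT) are already expressed in pixels. The URLs actually written by processURLs.pl may be built differently.

```shell
#!/bin/sh
# Sketch only: build an IIIF Image API URL for one rectangular region.
# Usage: iiif_region_url BASE ID X Y W H
# BASE and ID are hypothetical placeholders, not values taken from the scripts.
iiif_region_url() {
    base=$1; id=$2; x=$3; y=$4; w=$5; h=$6
    # region is "x,y,w,h" in pixels; rotation is 0.
    # "full" is the Image API 2.x size keyword (3.0 uses "max"), and
    # "default" is the 2.x/3.0 quality keyword (1.1 uses "native").
    printf '%s/%s/%s,%s,%s,%s/full/0/default.jpg\n' \
        "$base" "$id" "$x" "$y" "$w" "$h"
}

# Example call with made-up values.
iiif_region_url "https://iiif.example.org/image" "doc123-p1" 120 340 800 600
```

If the ALTO file declares another measurement unit, the coordinates would have to be converted to pixels before building the region.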
- filterIMG.sh (shell)
- processURLs.pl (Perl)
- extractIMG.pl (Perl)
- extractMD.pl (Perl)
A batch.sh script chains the commands.
The documents must be stored in a "DOCS" folder. The images will be generated in an "IMG" folder. The metadata will be generated in an "MD" folder.
- Open a command line terminal.
- filterIMG.sh
- perl processURLs.pl illustrations.txt
- perl extractIMG.pl illustrations.txt_URL 200 (200 is the minimum size, in KB, of the extracted images)
- perl extractMD.pl illustrations.txt_URL
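The steps above can be chained in one script. What follows is a minimal sketch, not the batch.sh shipped with the scripts: the run_pipeline function, the MIN_SIZE variable, the folder check and the mkdir call are assumptions added for illustration. It also assumes that filterIMG.sh can be run with sh and that it writes illustrations.txt in the current folder.

```shell
#!/bin/sh
# Hypothetical sketch of a batch script; the real batch.sh may differ.
set -e

MIN_SIZE=${MIN_SIZE:-200}   # minimum size (KB) passed to extractIMG.pl
LIST=illustrations.txt      # assumed to be written by filterIMG.sh

run_pipeline() {
    # The documents are expected in DOCS.
    [ -d DOCS ] || { echo "DOCS folder not found" >&2; return 1; }
    # Assumption: creating the output folders up front is harmless.
    mkdir -p IMG MD
    sh filterIMG.sh
    perl processURLs.pl "$LIST"
    perl extractIMG.pl "${LIST}_URL" "$MIN_SIZE"
    perl extractMD.pl "${LIST}_URL"
}

# Guard so the sketch does nothing outside a folder holding the scripts.
if [ -f filterIMG.sh ]; then
    run_pipeline
fi
```

Saved under a hypothetical name such as run_all.sh next to the four scripts, it could be started with "MIN_SIZE=300 sh run_all.sh" to use a different size threshold.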
CC0