Extracting illustrations from ALTO files with IIIF

Synopsis

Extract the illustrations described in OCRed documents (ALTO format) using the IIIF API.

Full presentation in French
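To make the idea in the synopsis concrete, here is a minimal Python sketch (not the project's Perl code) that reads the bounding box of each ALTO `Illustration` element and turns it into a IIIF Image API region URL. It rests on several assumptions: the ALTO coordinates are expressed in pixels of the served image, the server follows the Image API 2.x URL pattern `{region}/{size}/{rotation}/{quality}.{format}`, and the base URL and identifier (`https://example.org/iiif`, `doc-f1`) are purely hypothetical.

```python
import xml.etree.ElementTree as ET

BOX_ATTRS = ("HPOS", "VPOS", "WIDTH", "HEIGHT")


def illustration_regions(alto_xml):
    """Yield (x, y, w, h) for each Illustration element in an ALTO string."""
    root = ET.fromstring(alto_xml)
    for el in root.iter():
        # Compare local names so that any ALTO namespace version is accepted.
        if el.tag.rsplit("}", 1)[-1] != "Illustration":
            continue
        values = [el.get(attr) for attr in BOX_ATTRS]
        if None in values:
            continue  # skip blocks without a complete bounding box
        # Assumption: coordinates are already in pixels of the served image.
        yield tuple(round(float(v)) for v in values)


def iiif_region_url(base, identifier, region, fmt="jpg"):
    """Build an Image API 2.x style URL: region/size/rotation/quality.format."""
    x, y, w, h = region
    return f"{base}/{identifier}/{x},{y},{w},{h}/full/0/default.{fmt}"


# Hypothetical ALTO fragment used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <Illustration ID="ILL1" HPOS="120" VPOS="340" WIDTH="800" HEIGHT="600"/>
    <TextBlock ID="TB1" HPOS="10" VPOS="10" WIDTH="50" HEIGHT="20"/>
  </PrintSpace></Page></Layout>
</alto>"""

urls = [
    iiif_region_url("https://example.org/iiif", "doc-f1", region)
    for region in illustration_regions(SAMPLE_ALTO)
]
```

If the ALTO file declares another measurement unit, the coordinates would have to be converted to pixels first. Servers implementing Image API 3.0 expect `max` rather than `full` as the size segment.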

Installation

You will need four scripts:

  1. filterIMG.sh (shell)
  2. processURLs.pl (Perl)
  3. extractIMG.pl (Perl)
  4. extractMD.pl (Perl)

A batch.sh script chains the commands.

The documents must be stored in a "DOCS" folder. The images will be generated in an "IMG" folder. The metadata will be generated in an "MD" folder.

Tests

  1. Open a command-line terminal.
  2. filterIMG.sh
  3. perl processURLs.pl illustrations.txt
  4. perl extractIMG.pl illustrations.txt_URL 200 (where 200 is the minimum size, in KB, of the extracted images)
  5. perl extractMD.pl illustrations.txt_URL
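The steps above can also be driven from a small wrapper. The following Python sketch is only an illustration of that chaining and does not reproduce the contents of batch.sh. The function names, the `./filterIMG.sh` invocation, and the default arguments are assumptions; the script names and the `_URL` suffix come from the steps listed above.

```python
import subprocess


def pipeline_commands(list_file="illustrations.txt", min_size_kb=200):
    """Return the four commands of the test procedure as argument lists."""
    url_file = f"{list_file}_URL"  # name used by steps 4 and 5 above
    return [
        ["./filterIMG.sh"],  # assumption: run from the repository folder
        ["perl", "processURLs.pl", list_file],
        ["perl", "extractIMG.pl", url_file, str(min_size_kb)],
        ["perl", "extractMD.pl", url_file],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(argv, check=True)
```

Calling `run_pipeline(pipeline_commands())` from the folder that contains the scripts and the "DOCS" folder would execute the four steps in sequence. Passing a different `min_size_kb` changes the size threshold given to extractIMG.pl.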

License

CC0
