
Extracting illustrations from ALTO documents with IIIF


altomator/ALTO-IIIF


Extracting illustrations from ALTO files with IIIF

Synopsis

Extracts the illustrations described in OCRed documents (ALTO format) using the IIIF API.
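
For orientation, here is a minimal sketch of how the position of an ALTO illustration block could be turned into a IIIF request for just that region. It assumes the ALTO attributes HPOS, VPOS, WIDTH and HEIGHT are expressed in pixels, and that the server follows the IIIF Image API 2.x URL pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The function name `iiif_region_url`, the server URL and the identifier are hypothetical; the scripts in this repository may build their URLs differently.

```shell
#!/bin/sh
# Build a IIIF Image API 2.x URL for one illustration region.
# Arguments: base URL, image identifier, then the ALTO HPOS, VPOS, WIDTH and
# HEIGHT values (assumed here to be integer pixel values).
iiif_region_url() {
    base=$1; id=$2; hpos=$3; vpos=$4; width=$5; height=$6
    # region = x,y,w,h ; size = full ; rotation = 0 ; quality.format = default.jpg
    printf '%s/%s/%s,%s,%s,%s/full/0/default.jpg\n' \
        "$base" "$id" "$hpos" "$vpos" "$width" "$height"
}

# Hypothetical server and identifier, for illustration only.
iiif_region_url "https://iiif.example.org/iiif" "doc1-page3" 120 340 800 600
```

The region segment `120,340,800,600` asks the server for an 800 by 600 pixel crop whose top-left corner is at x=120, y=340, so only the illustration is downloaded rather than the whole page.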

Full presentation in French

Installation

You will need four scripts:

  1. filterIMG.sh (shell)
  2. processURLs.pl (Perl)
  3. extractIMG.pl (Perl)
  4. extractMD.pl (Perl)

A batch.sh script chains the commands.

The documents must be stored in a "DOCS" folder. The images will be generated in an "IMG" folder. The metadata will be generated in an "MD" folder.
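
The folder layout can be prepared before a run with a small helper such as the one below. This is only a sketch: the function name `prepare_folders` is hypothetical, and whether the scripts create the output folders themselves is not stated here, so the helper simply makes sure they exist.

```shell
#!/bin/sh
# Check that the input folder exists and create the two output folders.
# Argument: the working directory (defaults to the current directory).
prepare_folders() {
    root=${1:-.}
    if [ ! -d "$root/DOCS" ]; then
        echo "missing input folder: $root/DOCS" >&2
        return 1
    fi
    # mkdir -p does nothing if the folders are already there.
    mkdir -p "$root/IMG" "$root/MD"
}
```

A typical call would be `prepare_folders . || exit 1` at the top of a wrapper script.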

Tests

  1. Open a command line terminal.
  2. filterIMG.sh

  3. perl processURLs.pl illustrations.txt

  4. perl extractIMG.pl illustrations.txt_URL 200 (the last argument, here 200, is the minimum size in KB of the extracted images)

  5. perl extractMD.pl illustrations.txt_URL
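
The steps above can be chained in a wrapper along the following lines. This is a sketch, not the contents of the repository's batch.sh, which are not shown here. The helper names `run` and `run_pipeline` and the `DRY_RUN` variable are hypothetical, and calling the first script as `./filterIMG.sh` assumes it is executable and located in the current directory.

```shell
#!/bin/sh
# Run one command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the four steps in order and stop at the first failure.
run_pipeline() {
    list=${1:-illustrations.txt}   # file name used in the steps above
    min_kb=${2:-200}               # minimum image size in KB
    run ./filterIMG.sh &&
    run perl processURLs.pl "$list" &&
    run perl extractIMG.pl "${list}_URL" "$min_kb" &&
    run perl extractMD.pl "${list}_URL"
}
```

Calling `DRY_RUN=1 run_pipeline` prints the four commands without executing them, which is a convenient way to check the arguments before a real run with `run_pipeline illustrations.txt 200`.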

License

CC0

