Restore and conservate great pieces of literature
Google books contains some books that are no longer available anywhere else. Though it might be great literature that had already been popular decades ago.
Unfortunately the used aproach to detect written text on the scanned books seems to have been totally unaware of old german fractured fonts. Therefore available textual representations do not contain much more than a weird mix of letters enriched with unrecognizeable characters.
The book "Peterli am Lift" from "Niklaus Bolt" has been scanned by google. While the scanned PDF is of good quality, the generated textual representations, such as the EPUB, are poor.
More ?
Search
Here's a script that hoists the treasure confined within this digital PDF images.
The script relies on popular PDF tooling (poppler) and an OCR scanner (tesseract) with its extensions for german fractures. This set of free available software is able to bring the classic to life.
Scripts were crafted and designed to run on a debian based operating system such as Ubuntu (tested with Linux Mint 18).
wget https://archive.org/download/peterliamlift00boltgoog/peterliamlift00boltgoog.pdf
./revealLetters.sh peterliamlift00boltgoog.pdf