pimmer

Exploratory code for PDF image mining. A multi-page PDF is split and converted to JPEG files, which are then mined for illustrations and images. Based on https://github.com/megloff1/image-mining, with added PDF splitting, a simple GUI and queue management.

Install

Make sure you have Git and Docker with docker-compose installed.
Get the latest version of this repository: git clone --depth 1 https://github.com/peterk/pimmer.git.
Copy the env_example file to .env and edit the settings.
Make sure you have a folder called data in the project root folder (jobs and resulting image files will end up here). You can map output to a different local folder for the worker in docker-compose.yml.
Run docker-compose up -d. Wait a minute until the queue and worker are up.

The service is now running on http://localhost:7777.
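Instead of waiting a fixed minute, you could poll the web interface until it answers. The following is a minimal sketch, not part of pimmer: the function name wait_for_service and the variables SERVICE_URL, MAX_TRIES and SLEEP_SECS are illustrative, and it assumes that any HTTP response from port 7777 means the service is ready.

```shell
#!/bin/sh
# Poll the web interface until it answers or the retry limit is reached.
# SERVICE_URL, MAX_TRIES and SLEEP_SECS are illustrative names, not pimmer settings.
SERVICE_URL="${SERVICE_URL:-http://localhost:7777}"
MAX_TRIES="${MAX_TRIES:-30}"
SLEEP_SECS="${SLEEP_SECS:-2}"

wait_for_service() {
    tries=0
    while [ "$tries" -lt "$MAX_TRIES" ]; do
        # The page body is discarded; exit status 0 means the server answered.
        if curl --silent --output /dev/null "$SERVICE_URL"; then
            echo "Service is up at $SERVICE_URL"
            return 0
        fi
        tries=$((tries + 1))
        sleep "$SLEEP_SECS"
    done
    echo "Service did not respond after $MAX_TRIES tries" >&2
    return 1
}
```

Call wait_for_service right after docker-compose up -d; it returns a non-zero status if the service never responds, so it can gate later steps in a script.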

If you plan to process a large number of documents, you can start more workers with docker-compose up -d --scale worker=5 and then post files with curl to the /process/ endpoint:

curl -v --silent -F "file=@testdata/hat_catalog.pdf" http://0.0.0.0:7777/process/
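To post a whole folder of PDFs rather than a single file, the curl call above can be wrapped in a loop. This is a rough sketch under a few assumptions: PDF_DIR, PIMMER_URL and post_pdfs are illustrative names, the response body of /process/ is simply discarded because its format is not described here, and an HTTP error status is treated as a failed upload.

```shell
#!/bin/sh
# Post every PDF in a folder to the /process/ endpoint, one request per file.
# PDF_DIR and PIMMER_URL are illustrative names, not pimmer settings.
PDF_DIR="${PDF_DIR:-testdata}"
PIMMER_URL="${PIMMER_URL:-http://localhost:7777/process/}"

post_pdfs() {
    for pdf in "$PDF_DIR"/*.pdf; do
        # Skip the unexpanded pattern when the folder contains no PDFs.
        [ -e "$pdf" ] || continue
        echo "Posting $pdf"
        # --fail makes curl return non-zero on HTTP errors (status 400 and above).
        curl --silent --show-error --fail -F "file=@$pdf" "$PIMMER_URL" > /dev/null \
            || echo "Failed: $pdf" >&2
    done
}

post_pdfs
```

Set PDF_DIR to the folder holding your documents before running it, for example PDF_DIR=/path/to/pdfs sh post_pdfs.sh (the script file name is hypothetical).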

Please report bugs and feedback in the GitHub issue tracker.

Results

The detected images end up as individual image files in job folders in ./data/results.

Each job folder also contains one JSON file per page with the coordinates of the detected images.
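For a quick overview of what a run produced, you could count the files in each job folder. The sketch below assumes one subfolder per job directly under the results folder and makes no assumption about file names or the JSON layout; RESULTS_DIR and summarize_jobs are illustrative names.

```shell
#!/bin/sh
# Print, for each job folder, how many JSON files and how many other files it holds.
# RESULTS_DIR is an illustrative variable; one subfolder per job is an assumption.
RESULTS_DIR="${RESULTS_DIR:-./data/results}"

summarize_jobs() {
    for job in "$RESULTS_DIR"/*/; do
        # Skip the unexpanded pattern when there are no job folders yet.
        [ -d "$job" ] || continue
        # tr strips the padding that some wc implementations add.
        json_count=$(find "$job" -type f -name '*.json' | wc -l | tr -d ' ')
        other_count=$(find "$job" -type f ! -name '*.json' | wc -l | tr -d ' ')
        echo "$(basename "$job"): $json_count JSON files, $other_count other files"
    done
}

summarize_jobs
```

If the worker output is mapped to a different local folder in docker-compose.yml, point RESULTS_DIR at that folder instead.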

For example, a digitized hat catalog results in all the individual hat images as separate files.
