harvardnlp/image-extraction: Extract images from PDFs

Code for extracting a representative image from a PDF file using computer vision (CV).

This code needs to be run on a GPU. We include a Colab example.
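Before running anything, it can help to confirm that a GPU is visible on the machine. The following is a minimal sketch and is not part of this repository. It assumes an NVIDIA GPU and only checks that the `nvidia-smi` tool is on the PATH and exits successfully. That suggests, but does not guarantee, that the deep-learning framework used by run.py can see the GPU.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Return True if `nvidia-smi` exists and runs successfully.

    Assumption: the machine uses an NVIDIA GPU. This does not check
    which framework or CUDA version run.py actually requires.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not installed or not on PATH: treat as no GPU.
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A zero exit code means the driver answered the query.
    return result.returncode == 0


if __name__ == "__main__":
    print("GPU visible" if nvidia_gpu_visible() else "No NVIDIA GPU detected")
```

On a machine without an NVIDIA driver this prints "No NVIDIA GPU detected". In that case the Colab example is the easier route.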

Setup

bash setup.sh

First, add the PDF files you want to process to a directory named pdfs/.
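If your papers already live elsewhere, a small helper can copy them into pdfs/. This is a hypothetical sketch, not part of the repository. `stage_pdfs` and its `source_dir` argument are illustrative names. Only the target directory name pdfs/ comes from this README. The glob is non-recursive and, on case-sensitive file systems, matches only the lowercase `.pdf` extension.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def stage_pdfs(source_dir: str, target_dir: str = "pdfs") -> list[Path]:
    """Copy every *.pdf directly inside source_dir into target_dir.

    source_dir is a hypothetical folder of your own PDFs. Returns the
    paths of the copied files, sorted by source file name.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create pdfs/ if missing
    copied = []
    for pdf in sorted(Path(source_dir).glob("*.pdf")):
        dest = target / pdf.name
        shutil.copy2(pdf, dest)  # copy2 also preserves timestamps
        copied.append(dest)
    return copied
```

For example, `stage_pdfs("/path/to/my/papers")` fills pdfs/ in the current working directory.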

Next, run:

python run.py pdfs/ pics/

The code will attempt to extract one image for each PDF into the pics/ directory.
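Extraction is only attempted, so some PDFs may end up without an image. The following minimal sketch lists those PDFs. It rests on an unverified assumption: that run.py names each output after its source PDF (for example, pdfs/paper.pdf producing pics/paper.<ext>). The function name `pdfs_without_images` is hypothetical. If run.py uses a different naming scheme, adjust the comparison accordingly.

```python
from __future__ import annotations

from pathlib import Path


def pdfs_without_images(pdf_dir: str = "pdfs", pic_dir: str = "pics") -> list[str]:
    """Return stems of PDFs in pdf_dir with no same-stem file in pic_dir.

    Assumption (not confirmed by the README): output images share the
    file stem of their source PDF, with any extension.
    """
    pics = Path(pic_dir)
    # Stems of every file produced so far; empty if pics/ does not exist.
    produced = {p.stem for p in pics.iterdir() if p.is_file()} if pics.is_dir() else set()
    return sorted(
        pdf.stem for pdf in Path(pdf_dir).glob("*.pdf") if pdf.stem not in produced
    )


if __name__ == "__main__":
    for stem in pdfs_without_images():
        print(f"no image extracted for {stem}.pdf")
```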
