A Python script that recursively extracts PDF metadata from any PDF in a subfolder.
It also extracts a thumbnail from each file and stores the thumbnails in an images folder.
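The recursive scan described above could look roughly like the sketch below. It is not the script's actual code: the function names `find_pdfs` and `read_metadata` are hypothetical, and it assumes metadata is read with pypdf's `PdfReader`.

```python
from pathlib import Path


def find_pdfs(root: Path) -> list[Path]:
    """Return every PDF under root, including nested subfolders."""
    # rglob("*") walks the whole tree; the suffix check is case-insensitive
    # so that files named "REPORT.PDF" are picked up as well.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def read_metadata(pdf_path: Path) -> dict[str, str]:
    """Return the document information of one PDF as plain strings."""
    # Imported here (an assumption of this sketch) so that the directory
    # walk above still works when pypdf is not installed.
    from pypdf import PdfReader

    info = PdfReader(str(pdf_path)).metadata
    # metadata is None when the file carries no document information.
    return {key: str(value) for key, value in info.items()} if info else {}
```

Calling `read_metadata(path)` for each path returned by `find_pdfs(Path("pdf_folder"))` would yield one dictionary per file, with keys such as `/Title` or `/Author` when the PDF provides them.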
- Download and install Ghostscript. This will extract the PDF metadata.
- Download and install ImageMagick. This will create the PDF thumbnails.
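As an illustration of the thumbnail step, here is a minimal sketch that uses Wand (the Python binding for ImageMagick) to render the first page of a PDF. The function names, the PNG format, the 200-pixel width, and the `images` folder layout are assumptions, not the script's actual behaviour.

```python
from pathlib import Path


def thumbnail_path(pdf_path: Path, images_dir: Path) -> Path:
    """Build the output path: same base name as the PDF, .png extension."""
    return images_dir / (pdf_path.stem + ".png")


def make_thumbnail(pdf_path: Path, images_dir: Path, width: int = 200) -> Path:
    """Render page 1 of pdf_path as a PNG thumbnail inside images_dir."""
    # Imported lazily so thumbnail_path() is usable without Wand installed.
    from wand.image import Image

    images_dir.mkdir(parents=True, exist_ok=True)
    out = thumbnail_path(pdf_path, images_dir)
    # "[0]" is ImageMagick's page selector: read only the first page.
    with Image(filename=f"{pdf_path}[0]", resolution=72) as page:
        # Scale to the requested width while keeping the aspect ratio.
        height = max(1, round(page.height * width / page.width))
        page.resize(width, height)
        page.format = "png"
        page.save(filename=str(out))
    return out
```

Because the output name is derived from the file's base name only, two PDFs with the same name in different subfolders would overwrite each other in this sketch; a real implementation may need a scheme that keeps names unique.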
Run pip install
to install the project dependencies (pypdf, wand, argparse),
or, if you're on Arch, use the AUR packages available for each of them.
-d
or --directory
specifies the directory to be scanned.
Example: python .\pdf_metadata_extractor.py -d pdf_folder
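For reference, a `-d` / `--directory` option like the one above can be declared with argparse roughly as follows. This is a sketch, not the script's own parser: the helper names are hypothetical, and making the option required is an assumption.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with the -d / --directory option."""
    parser = argparse.ArgumentParser(
        description="Extract metadata and thumbnails from PDFs in a directory tree."
    )
    parser.add_argument(
        "-d", "--directory",
        type=Path,
        required=True,  # assumption: the real script may supply a default
        help="directory to be scanned",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)
```

With this sketch, `parse_args(["-d", "pdf_folder"])` and `parse_args(["--directory", "pdf_folder"])` both produce a namespace whose `directory` attribute is `Path("pdf_folder")`.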