PDF Commenting and Highlighting Tool

This Python program automates the process of searching for keywords in a PDF document, highlighting the found instances, and adding comments to them. It then generates a summary of the occurrences for each keyword.

Features

- Search for multiple keywords in a PDF document.
- Highlight found instances in the document.
- Add comments to the highlighted text.
- Customize the color of the highlight based on the keyword.
- Generate a summary of the keyword occurrences in the document.
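
As a rough illustration of per-keyword highlight colors, the sketch below maps a color name to an RGB tuple. The names COLOR_MAP, DEFAULT_COLOR and color_for are hypothetical, and the set of supported color names is an assumption; the actual script may store colors differently.

```python
# Hypothetical mapping from color names to RGB tuples with components in 0-1,
# the form PyMuPDF accepts for annotation colors.
COLOR_MAP = {
    "yellow": (1.0, 1.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "red": (1.0, 0.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}
DEFAULT_COLOR = COLOR_MAP["yellow"]


def color_for(name):
    """Return the RGB tuple for a color name, falling back to yellow."""
    return COLOR_MAP.get(name.strip().lower(), DEFAULT_COLOR)
```

Falling back to a default color keeps a typo in the keyword list from stopping the run; raising an error instead would be an equally reasonable choice.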

Dependencies

- PyMuPDF (fitz): for PDF processing
- Python's built-in csv and sys modules

Usage

1. Prepare a CSV file to serve as the keyword list, with the columns 'keyword', 'comment', 'color'. If the keywords include Japanese text, save the CSV in Unicode (UTF-8) format so the characters are preserved.
2. Set up your configuration file (config.py) with the following parameters:
   - "source file": The path to your input PDF file.
   - "keywords list": The path to your keyword list CSV file.
3. Run the script: python <script_name.py>
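
The preparation steps above can be sketched as follows. This is only an illustration: the sample rows, the file names, the helper write_keyword_csv, and the dictionary layout of the configuration are all assumptions, and the sketch assumes the CSV has no header row.

```python
import csv

# Hypothetical keyword list: one (keyword, comment, color) row per keyword.
rows = [
    ("契約", "Check the contract terms", "yellow"),
    ("deadline", "Confirm the date", "red"),
]


def write_keyword_csv(path, rows):
    """Write the keyword rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings;
    # encoding="utf-8" preserves Japanese and other non-ASCII keywords.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical config.py contents; the keys mirror the parameter names above.
config = {
    "source file": "input.pdf",
    "keywords list": "keywords.csv",
}
```

Calling write_keyword_csv(config["keywords list"], rows) would then produce a keyword list in the expected column order.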

Functions

comment_pdf

This is the main function of the script. It reads the keyword list from the CSV file, opens the PDF file, searches the document for each keyword, highlights every match and attaches its comment, and then saves the modified PDF file. It also creates a summary of how often each keyword occurs in the document.

read_csv

This function reads the CSV file containing the keyword list and returns it as a list.
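
A minimal sketch of such a reader is shown below. It is not the project's actual code: the tuple return type, the skipping of incomplete rows, and the use of the utf-8-sig codec are assumptions made for illustration.

```python
import csv


def read_csv(path):
    """Return the keyword list as a list of (keyword, comment, color) tuples."""
    entries = []
    # utf-8-sig reads plain UTF-8 and also tolerates a leading byte-order mark,
    # which some spreadsheet programs add when exporting CSV.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.reader(f):
            if len(row) < 3 or not row[0].strip():
                continue  # skip blank or incomplete rows
            keyword, comment, color = (cell.strip() for cell in row[:3])
            entries.append((keyword, comment, color))
    return entries
```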

highlight_text

This function adds highlights and comments to the matched keywords in the PDF document.
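
One way this step could look with PyMuPDF is sketched below. The signature and return value are assumptions rather than the script's real interface; page is assumed to be a PyMuPDF Page object and color an RGB tuple with components between 0 and 1.

```python
def highlight_text(page, keyword, comment, color):
    """Highlight every match of keyword on page and attach comment to each.

    Returns the number of rectangles highlighted.
    """
    # search_for returns one rectangle per hit; a match that wraps across
    # lines is reported as several rectangles.
    matches = page.search_for(keyword)
    for rect in matches:
        annot = page.add_highlight_annot(rect)
        annot.set_colors(stroke=color)    # highlight color
        annot.set_info(content=comment)   # text shown in the comment popup
        annot.update()                    # apply the changes to the annotation
    return len(matches)
```

In a full run this would be called once per keyword for each page of a document opened with fitz.open, with the document saved afterwards.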

create_summary

This function generates a summary of the occurrences of each keyword in the PDF document and writes it to a text file.
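
The summary step might be sketched as follows. The "keyword: count" line format, the counts dictionary argument, and the returned text are assumptions for illustration; the actual summary layout may differ.

```python
def create_summary(counts, path):
    """Write one 'keyword: count' line per keyword and return the text written.

    counts is assumed to map each keyword to its number of occurrences.
    """
    lines = [f"{keyword}: {count}" for keyword, count in counts.items()]
    text = "\n".join(lines) + "\n"
    # UTF-8 keeps non-ASCII keywords readable in the summary file.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```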

Output

The output of the script is a modified version of the input PDF file with the added highlights and comments, as well as a text file containing the summary of the keyword occurrences.

Resources

This project was developed using the following resources:
