This Python program automates the process of searching for keywords in a PDF document, highlighting the found instances, and adding comments to them. It then generates a summary of the occurrences for each keyword.
- Search for multiple keywords in a PDF document.
- Highlight found instances in the document.
- Add comments to the highlighted text.
- Customize the color of the highlight based on the keyword.
- Generate a summary of the keyword occurrences in the document.
- PyMuPDF (fitz): for PDF processing
- Python's built-in
csv
andsys
modules
- Prepare a CSV file with the format: 'keyword', 'comment', 'color'. This will be used as the keyword list. If you're using Japanese text, make sure to save the CSV in Unicode (UTF-8) format to preserve the Japanese characters.
- Set up your configuration file (
config.py
) with the following parameters:"source file"
: The path to your input PDF file."keywords list"
: The path to your keyword list CSV file.
- Run the script:
python <script_name.py>
This is the main function of the script. It reads the CSV file with the keyword list, opens the PDF file, searches for the keywords in the document, highlights them, adds the comments, and then saves the modified PDF file. It also creates a summary of the occurrences of each keyword in the document.
This function reads the CSV file containing the keyword list and returns it as a list.
This function adds highlights and comments to the matched keywords in the PDF document.
This function generates a summary of the occurrences of each keyword in the PDF document and writes it to a text file.
The output of the script is a modified version of the input PDF file with the added highlights and comments, as well as a text file containing the summary of the keyword occurrences.
This project was developed using the following resources: