Have .read_pdf() show us which page it is processing for large PDF files. #507

bulrush15 · 2024-09-19T13:22:18Z

I may have PDF files of 400 or more pages, each page with a table. It would help to have an option in .read_pdf() that makes Camelot report which page it is starting to process, or which page it has just finished.

Alternatively, how can we write a loop that processes one page at a time, so that I can print my own message showing which page is being processed?

bosd · 2024-09-19T14:04:50Z

Hey!

As noted in #343, we are trying to build a maintained fork at pypdf_table_extraction.

This specific feature is not implemented.
But there is support for parallel processing to speed up the process for large files, which you may find useful.

bulrush15 · 2024-09-20T09:37:47Z

Thank you @bosd. But we may end up processing many large files, so in my status message I would still want to show the file I'm processing and the page that is being processed.

I may be able to process multiple pages in a loop like this:

# From Gemini AI. 
import camelot
import pandas as pd

# Replace 'your_pdf_file.pdf' with the actual path to your PDF file
pdf_file = 'your_pdf_file.pdf'

# Extract tables from every page of the PDF file (read_pdf only reads page 1 by default)
tables = camelot.read_pdf(pdf_file, pages='all')

# Iterate through the extracted tables
for table in tables:
    # Convert the table to a pandas DataFrame
    df = table.df

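    # Save each DataFrame to its own UTF-8 CSV file so earlier tables are not overwritten
    csv_file = f'output_page{table.page}_table{table.order}.csv'
    df.to_csv(csv_file, index=False, encoding='utf-8')

    print(f"Page {table.page}, table {table.order} saved as {csv_file}")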
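
The loop above only reports after the whole file has been parsed. One way to get a message per page is to call read_pdf() once per page, as in the rough sketch below. The helper names `read_pdf_with_progress` and `_camelot_reader` are made up for illustration and are not part of Camelot. The sketch also assumes the caller already knows the page numbers, and that re-opening the PDF for every page is an acceptable cost.

```python
from typing import Callable, Iterable, Iterator


def _camelot_reader(pdf_path: str, page: int):
    # Hypothetical default reader: parse a single page with Camelot.
    # Imported lazily so the helper itself does not require Camelot at import time.
    import camelot
    return camelot.read_pdf(pdf_path, pages=str(page))


def read_pdf_with_progress(
    pdf_path: str,
    pages: Iterable[int],
    reader: Callable = _camelot_reader,
    report: Callable[[str], None] = print,
) -> Iterator[tuple]:
    """Yield (page, table) pairs, reporting each page just before it is parsed."""
    for page in pages:
        # Status message shows both the file and the page, as requested above.
        report(f"{pdf_path}: processing page {page}")
        for table in reader(pdf_path, page):
            yield page, table


# Example use (not run here; assumes a 400-page file):
# for page, table in read_pdf_with_progress("your_pdf_file.pdf", range(1, 401)):
#     table.df.to_csv(f"page{page}_table{table.order}.csv", index=False)
```

The page count has to come from somewhere else, for example `len(PdfReader(path).pages)` from pypdf. Because each call handles a single page, this trades some speed for visibility, and it would not combine with the parallel processing mentioned earlier in the thread.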
