EdgarFx/QuickTable
QuickTable

In this work, we propose a new pipeline for extracting tables of interest from PDF files, and develop an ultra-lightweight application named QuickTable based on it.

Most previous research addresses only one or two table recognition tasks, and little of it addresses finding tables of interest. QuickTable implements the proposed pipeline, which is based on PP-PicoDet, SLANet, PP-OCRv3, text segmentation, and cosine similarity analysis. Users can upload PDF files from mobile devices and enter keywords to retrieve tables of interest. We have also trained models for both Chinese and English, so users can upload files in either language. Experiments show that the proposed pipeline is lightweight and outperforms previous approaches, demonstrating the effectiveness of our method.

In summary, our main contributions are:

  • We propose a new pipeline for extracting tables of interest from PDF files, which allows users to upload PDF files from mobile devices and enter keywords to retrieve tables of interest.
  • We train models for both Chinese and English so that users can upload files in either language.
  • We develop an ultra-lightweight application.

Proposed Pipeline

pipeline
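
The last stage of the pipeline, cosine similarity analysis, can be illustrated with a minimal sketch. This is not the code in quicktable.py: it assumes each recognized table has already been reduced to a plain-text string, it uses simple bag-of-words counts as the vector representation, and the helper names (tokenize, cosine_similarity, rank_tables) are hypothetical. The regex tokenizer only handles English-like text; Chinese input would need a proper word segmenter in its place.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    # Stands in for the text segmentation step; not suitable for Chinese.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # a and b are Counter objects mapping token -> count.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def rank_tables(keywords, table_texts):
    # Score every table against the keywords and return
    # (score, table_index) pairs, best match first.
    query = Counter(tokenize(keywords))
    scored = [
        (cosine_similarity(query, Counter(tokenize(text))), index)
        for index, text in enumerate(table_texts)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    tables = ["Revenue by quarter 2021", "Employee headcount by region"]
    for score, index in rank_tables("quarter revenue", tables):
        print(f"table {index}: {score:.3f}")
```

Under these assumptions, the table whose text shares the most weighted vocabulary with the keywords is ranked first, and tables with no overlap score 0.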

Usage

For a quick demo:

python quicktable.py

The directory QuickTable_with_server contains the application combined with the Django back-end code. If you do not want to deploy the full software environment, you can simply run the script quicktable.py in the directory QuickTable, which is a quick-start demo of our algorithm.

Example Results

output
