[IMP][UPD] Documentation Big Bang #88

Merged 15 commits on Aug 29, 2024
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -28,7 +28,7 @@ jobs:
- { python: "3.10", os: "macos-latest", session: "tests" }
- { python: "3.10", os: "ubuntu-latest", session: "typeguard" }
- { python: "3.10", os: "ubuntu-latest", session: "xdoctest" }
- { python: "3.8", os: "ubuntu-latest", session: "docs-build" }
- { python: "3.10", os: "ubuntu-latest", session: "docs-build" }

env:
NOXSESSION: ${{ matrix.session }}
117 changes: 0 additions & 117 deletions CONTRIBUTING.md

This file was deleted.

3 changes: 2 additions & 1 deletion LICENSE
@@ -1,6 +1,7 @@
MIT License

Copyright (c) 2019-2021 Camelot Developers
Copyright (c) 2024 pypdf_table_extraction Developers
Copyright (c) 2019-2023 Camelot Developers
Copyright (c) 2018-2019 Peeply Private Ltd (Singapore)

Permission is hereby granted, free of charge, to any person obtaining a copy
56 changes: 25 additions & 31 deletions README.md
@@ -2,20 +2,17 @@
<img src="https://github.com/py-pdf/pypdf_table_extraction/blob/main/docs/_static/pypdf-table-extraction.png" width="200">
</p>

# Camelot: PDF Table Extraction for Humans
# pypdf_table_extraction (Camelot): PDF Table Extraction for Humans

[![tests](https://github.com/camelot-dev/camelot/actions/workflows/tests.yml/badge.svg)](https://github.com/camelot-dev/camelot/actions/workflows/tests.yml) [![Documentation Status](https://readthedocs.org/projects/camelot-py/badge/?version=master)](https://camelot-py.readthedocs.io/en/master/)
[![codecov.io](https://codecov.io/github/camelot-dev/camelot/badge.svg?branch=master&service=github)](https://codecov.io/github/camelot-dev/camelot?branch=master)
[![image](https://img.shields.io/pypi/v/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/l/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/pyversions/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![Gitter chat](https://badges.gitter.im/camelot-dev/Lobby.png)](https://gitter.im/camelot-dev/Lobby)
[![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
[![tests](https://github.com/py-pdf/pypdf_table_extraction/actions/workflows/tests.yml/badge.svg)](https://github.com/py-pdf/pypdf_table_extraction/actions/workflows/tests.yml) [![Documentation Status](https://readthedocs.org/projects/pypdf-table-extraction/badge/?version=latest)](https://pypdf-table-extraction.readthedocs.io/en/latest/)
[![codecov.io](https://codecov.io/github/py-pdf/pypdf_table_extraction/badge.svg?branch=main&service=github)](https://codecov.io/github/py-pdf/pypdf_table_extraction/?branch=main)
[![image](https://img.shields.io/pypi/v/pypdf-table-extraction.svg)](https://pypi.org/project/pypdf-table-extraction/) [![image](https://img.shields.io/pypi/l/pypdf-table-extraction.svg)](https://pypi.org/project/pypdf-table-extraction/) [![image](https://img.shields.io/pypi/pyversions/pypdf-table-extraction.svg)](https://pypi.org/project/pypdf-table-extraction/)

**Camelot** is a Python library that can help you extract tables from PDFs!

**Note:** You can also check out [Excalibur](https://github.com/camelot-dev/excalibur), the web interface to Camelot!
**pypdf_table_extraction**, formerly known as [Camelot](https://github.com/camelot-dev/camelot), is a Python library that can help you extract tables from PDFs!

---

**Here's how you can extract tables from PDFs.** You can check out the PDF used in this example [here](https://github.com/camelot-dev/camelot/blob/master/docs/_static/pdf/foo.pdf).
**Here's how you can extract tables from PDFs.** You can check out the PDF used in this example [here](https://github.com/py-pdf/pypdf_table_extraction/blob/main/docs/_static/pdf/foo.pdf).

```python3
>>> import camelot
@@ -45,53 +42,50 @@
| 2032_2 | 0.17 | 57.8 | 21.7% | 0.3% | 2.7% | 1.2% |
| 4171_1 | 0.07 | 173.9 | 58.1% | 1.6% | 2.1% | 0.5% |

Camelot also comes packaged with a [command-line interface](https://camelot-py.readthedocs.io/en/master/user/cli.html)!
pypdf_table_extraction also comes packaged with a [command-line interface](https://pypdf-table-extraction.readthedocs.io/en/latest/user/cli.html)!

Refer to the [QuickStart Guide](https://github.com/py-pdf/pypdf_table_extraction/blob/main/docs/user/quickstart.rst#quickstart) to quickly get started with Camelot, extract tables from PDFs and explore some basic options.
Refer to the [QuickStart Guide](https://github.com/py-pdf/pypdf_table_extraction/blob/main/docs/user/quickstart.rst#quickstart) to quickly get started with pypdf_table_extraction, extract tables from PDFs and explore some basic options.

**Note:** Camelot only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
**Note:** pypdf_table_extraction only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

You can check out some frequently asked questions [here](https://camelot-py.readthedocs.io/en/master/user/faq.html).
You can check out some frequently asked questions [here](https://pypdf-table-extraction.readthedocs.io/en/latest/user/faq.html).

## Why Camelot?
## Why pypdf_table_extraction?

- **Configurability**: Camelot gives you control over the table extraction process with [tweakable settings](https://camelot-py.readthedocs.io/en/master/user/advanced.html).
- **Configurability**: pypdf_table_extraction gives you control over the table extraction process with [tweakable settings](https://pypdf-table-extraction.readthedocs.io/en/latest/user/advanced.html).
- **Metrics**: You can discard bad tables based on metrics like accuracy and whitespace, without having to manually look at each table.
- **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873). You can also export tables to multiple formats, including CSV, JSON, Excel, HTML, Markdown, and SQLite.

See [comparison with similar libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).

## Support the development

If Camelot has helped you, please consider supporting its development with a one-time or monthly donation [on OpenCollective](https://opencollective.com/camelot).
See [comparison with similar libraries and tools](https://github.com/py-pdf/pypdf_table_extraction/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).

## Installation

### Using conda

The easiest way to install Camelot is with [conda](https://conda.io/docs/), which is a package manager and environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.
The easiest way to install pypdf_table_extraction is with [conda](https://conda.io/docs/), which is a package manager and environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.


```bash
conda install -c conda-forge camelot-py
conda install -c conda-forge pypdf-table-extraction
```

### Using pip

After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install-deps.html) ([tk](https://packages.ubuntu.com/bionic/python/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can also just use pip to install Camelot:
After [installing the dependencies](https://pypdf-table-extraction.readthedocs.io/en/latest/user/install-deps.html) ([tk](https://packages.ubuntu.com/bionic/python/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can also just use pip to install pypdf_table_extraction:

```bash
pip install camelot-py[base]
pip install pypdf-table-extraction[base]
```

### From the source code

After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install.html#using-pip), clone the repo using:
After [installing the dependencies](https://pypdf-table-extraction.readthedocs.io/en/latest/user/install.html#using-pip), clone the repo using:

```bash
git clone https://www.github.com/camelot-dev/camelot
git clone https://github.com/py-pdf/pypdf_table_extraction.git
```

and install Camelot using pip:
and install using pip:

```
cd camelot
@@ -100,7 +94,7 @@ pip install ".[base]"

## Documentation

The documentation is available at [http://camelot-py.readthedocs.io/](http://camelot-py.readthedocs.io/).
The documentation is available at [http://pypdf-table-extraction.readthedocs.io/](http://pypdf-table-extraction.readthedocs.io/).

## Wrappers

@@ -112,12 +106,12 @@ The documentation is available at [http://camelot-py.readthedocs.io/](http://cam

## Contributing

The [Contributor's Guide](https://camelot-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing issues, documentation, code, and tests.
The [Contributor's Guide](https://pypdf-table-extraction.readthedocs.io/en/latest/dev/contributing.html) has detailed information about contributing issues, documentation, code, and tests.

## Versioning

Camelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/camelot-dev/camelot/blob/master/HISTORY.md).
pypdf_table_extraction uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out the [releases](https://github.com/py-pdf/pypdf_table_extraction/releases) page.

## License

This project is licensed under the MIT License, see the [LICENSE](https://github.com/camelot-dev/camelot/blob/master/LICENSE) file for details.
This project is licensed under the MIT License; see the [LICENSE](https://github.com/py-pdf/pypdf_table_extraction/blob/main/LICENSE) file for details.
30 changes: 0 additions & 30 deletions docs/_templates/hacks.html

This file was deleted.

24 changes: 0 additions & 24 deletions docs/_templates/sidebarintro.html

This file was deleted.

15 changes: 0 additions & 15 deletions docs/_templates/sidebarlogo.html

This file was deleted.
