Update Documentation to reflect new project/repo
bosd committed Aug 9, 2024
1 parent d6423a1 commit 5205e91
Showing 11 changed files with 72 additions and 85 deletions.
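
A rename of this size tends to leave a few references to the old name behind in lines the commit did not touch. The following is a minimal sketch of how such leftovers could be listed. The helper `find_stale_references` and the sample text are hypothetical and not part of this commit.

```python
import re

# Hypothetical pattern for the old project name (an assumption for illustration).
OLD_NAME = re.compile(r"camelot", re.IGNORECASE)


def find_stale_references(text, pattern=OLD_NAME):
    """Return (line_number, line) pairs for every line that matches the pattern."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Illustrative sample text, not the real file contents.
sample = (
    "The preferred workflow for contributing to Camelot is to fork the repo.\n"
    "pypdf-table-extraction uses pytest for testing.\n"
)
stale = find_stale_references(sample)
for number, line in stale:
    print(f"{number}: {line}")
```

Not every match is a mistake: references such as `camelot.read_pdf` or "formerly known as Camelot" may be intentional, so the output is a list to review by hand rather than a list to replace blindly.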
6 changes: 3 additions & 3 deletions docs/_templates/sidebarintro.html
Expand Up @@ -16,9 +16,9 @@

<h3>Useful Links</h3>
<ul>
<li><a href="https://github.com/camelot-dev/camelot">Camelot @ GitHub</a></li>
<li><a href="https://pypi.org/project/camelot-py/">Camelot @ PyPI</a></li>
<li><a href="https://github.com/py-pdf/pypdf_table_extraction/">pypdf-table-extraction @ GitHub</a></li>
<li><a href="https://pypi.org/project/pypdf-table-extraction/">pypdf-table-extraction @ PyPI</a></li>
<li>
<a href="https://github.com/camelot-dev/camelot/issues">Issue Tracker</a>
<a href="https://github.com/py-pdf/pypdf_table_extraction/issues">Issue Tracker</a>
</li>
</ul>
22 changes: 11 additions & 11 deletions docs/conf.py
Expand Up @@ -64,8 +64,8 @@
master_doc = "index"

# General information about the project.
project = "Camelot"
copyright = "2021, Camelot Developers"
project = "pypdf-table-extraction"
copyright = "2024, pypdf-table-extraction Developers"
author = "Vinayak Mehta"

# The version info for the project you're documenting, acts as replacement for
Expand Down Expand Up @@ -139,8 +139,8 @@
# documentation.
html_theme_options = {
"show_powered_by": False,
"github_user": "camelot-dev",
"github_repo": "camelot",
"github_user": "py-pdf",
"github_repo": "pypdf-table-extraction",
"github_banner": True,
"show_related": False,
"note_bg": "#FFF59C",
Expand Down Expand Up @@ -262,7 +262,7 @@
# html_search_scorer = 'scorer.js'

# Output file base name for HTML help builder.
htmlhelp_basename = "Camelotdoc"
htmlhelp_basename = "pypdf-table-extraction-doc"

# -- Options for LaTeX output ---------------------------------------------

Expand All @@ -285,7 +285,7 @@
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, "Camelot.tex", "Camelot Documentation", "Vinayak Mehta", "manual"),
(master_doc, "pypdf-table-extraction.tex", "pypdf-table-extraction Documentation", "Vinayak Mehta", "manual"),
]

# The name of an image file (relative to this directory) to place at the top of
Expand Down Expand Up @@ -325,7 +325,7 @@

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "Camelot", "Camelot Documentation", [author], 1)]
man_pages = [(master_doc, "pypdf-table-extraction", "pypdf-table-extraction Documentation", [author], 1)]

# If true, show URL addresses after external links.
#
Expand All @@ -340,11 +340,11 @@
texinfo_documents = [
(
master_doc,
"Camelot",
"Camelot Documentation",
"pypdf-table-extraction",
"pypdf-table-extraction Documentation",
author,
"Camelot",
"One line description of project.",
"pypdf-table-extraction",
"PDF Table Extraction for Humans.",
"Miscellaneous",
),
]
24 changes: 12 additions & 12 deletions docs/dev/contributing.rst
Expand Up @@ -3,7 +3,7 @@
Contributor's Guide
===================

If you're reading this, you're probably looking to contributing to Camelot. *Time is the only real currency*, and the fact that you're considering spending some here is *very* generous of you. Thank you very much!
If you're reading this, you're probably looking to contribute to pypdf-table-extraction. *Time is the only real currency*, and the fact that you're considering spending some here is *very* generous of you. Thank you very much!

This document will help you get started with contributing documentation, code, testing and filing issues. If you have any questions, feel free to reach out to `Vinayak Mehta`_, the author and maintainer.

Expand All @@ -27,17 +27,17 @@ As the `Requests Code Of Conduct`_ states, **all contributions are welcome**, as
Your first contribution
-----------------------

A great way to start contributing to Camelot is to pick an issue tagged with the `help wanted`_ or the `good first issue`_ tags. If you're unable to find a good first issue, feel free to contact the maintainer.
A great way to start contributing to pypdf-table-extraction is to pick an issue tagged with the `help wanted`_ or the `good first issue`_ tags. If you're unable to find a good first issue, feel free to contact the maintainer.

.. _help wanted: https://github.com/camelot-dev/camelot/labels/help%20wanted
.. _good first issue: https://github.com/camelot-dev/camelot/labels/good%20first%20issue
.. _help wanted: https://github.com/py-pdf/pypdf_table_extraction/labels/help%20wanted
.. _good first issue: https://github.com/py-pdf/pypdf_table_extraction/labels/good%20first%20issue

Setting up a development environment
------------------------------------

To install the dependencies needed for development, you can use pip::

$ pip install "camelot-py[dev]"
$ pip install "pypdf-table-extraction[dev]"

Alternatively, you can clone the project repository, and install using pip::

Expand All @@ -51,13 +51,13 @@ Submit a pull request

The preferred workflow for contributing to Camelot is to fork the `project repository`_ on GitHub, clone, develop on a branch and then finally submit a pull request. Here are the steps:

.. _project repository: https://github.com/camelot-dev/camelot
.. _project repository: https://github.com/py-pdf/pypdf_table_extraction/

1. Fork the project repository. Click on the ‘Fork’ button near the top of the page. This creates a copy of the code under your account on GitHub.

2. Clone your fork of Camelot from your GitHub account::

$ git clone https://www.github.com/[username]/camelot
$ git clone https://www.github.com/[username]/pypdf-table-extraction

3. Create a branch to hold your changes::

Expand All @@ -76,7 +76,7 @@ Always branch out from ``master`` to work on your contribution. It's good practi

$ git push -u origin my-feature

Now it's time to go to the your fork of Camelot and create a pull request! You can `follow these instructions`_ to do the same.
Now it's time to go to your fork of pypdf-table-extraction and create a pull request! You can `follow these instructions`_ to do the same.

.. _follow these instructions: https://help.github.com/articles/creating-a-pull-request-from-a-fork/

Expand All @@ -89,7 +89,7 @@ We recommend that your pull request complies with the following guidelines:

.. _pep8: http://pep8.org

- In case your pull request contains function docstrings, make sure you follow the `numpydoc`_ format. All function docstrings in Camelot follow this format. Following the format will make sure that the API documentation is generated flawlessly.
- In case your pull request contains function docstrings, make sure you follow the `numpydoc`_ format. All function docstrings in pypdf-table-extraction follow this format. Following the format will make sure that the API documentation is generated flawlessly.

.. _numpydoc: https://numpydoc.readthedocs.io/en/latest/format.html

Expand All @@ -108,7 +108,7 @@ We recommend that your pull request complies with the following guidelines:

.. _task list: https://blog.github.com/2013-01-09-task-lists-in-gfm-issues-pulls-comments/

- If contributing new functionality, make sure that you add a unit test for it, while making sure that all previous tests pass. Camelot uses `pytest`_ for testing. Tests can be run using:
- If contributing new functionality, make sure that you add a unit test for it, while making sure that all previous tests pass. pypdf-table-extraction uses `pytest`_ for testing. Tests can be run using:

.. _pytest: https://docs.pytest.org/en/latest/

Expand All @@ -134,12 +134,12 @@ Filing Issues

We use `GitHub issues`_ to keep track of all issues and pull requests. Before opening an issue (which asks a question or reports a bug), please use GitHub search to look for existing issues (both open and closed) that may be similar.

.. _GitHub issues: https://github.com/camelot-dev/camelot/issues
.. _GitHub issues: https://github.com/py-pdf/pypdf_table_extraction/issues

Questions
^^^^^^^^^

Please don't use GitHub issues for support questions. A better place for them would be `Stack Overflow`_. Make sure you tag them using the ``python-camelot`` tag.
Please don't use GitHub issues for support questions. A better place for them would be `Stack Overflow`_. Make sure you tag them using the ``pypdf-table-extraction`` tag.

.. _Stack Overflow: http://stackoverflow.com

40 changes: 13 additions & 27 deletions docs/index.rst
Expand Up @@ -3,7 +3,7 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Camelot: PDF Table Extraction for Humans
pypdf-table-extraction (Camelot): PDF Table Extraction for Humans
=================================================================

Release v\ |version|. (:ref:`Installation <install>`)
Expand All @@ -15,30 +15,22 @@ Release v\ |version|. (:ref:`Installation <install>`)
:target: https://camelot-py.readthedocs.io/en/master/
:alt: Documentation Status

.. image:: https://codecov.io/github/camelot-dev/camelot/badge.svg?branch=master&service=github
:target: https://codecov.io/github/camelot-dev/camelot?branch=master
.. image:: https://codecov.io/github/py-pdf/pypdf_table_extraction/badge.svg?branch=master&service=github
:target: https://codecov.io/github/py-pdf/pypdf_table_extraction/?branch=master

.. image:: https://img.shields.io/pypi/v/camelot-py.svg
:target: https://pypi.org/project/camelot-py/
.. image:: https://img.shields.io/pypi/v/pypdf-table-extraction.svg
:target: https://pypi.org/project/pypdf-table-extraction/

.. image:: https://img.shields.io/pypi/l/camelot-py.svg
:target: https://pypi.org/project/camelot-py/
.. image:: https://img.shields.io/pypi/l/pypdf-table-extraction.svg
:target: https://pypi.org/project/pypdf-table-extraction/

.. image:: https://img.shields.io/pypi/pyversions/camelot-py.svg
:target: https://pypi.org/project/camelot-py/
.. image:: https://img.shields.io/pypi/pyversions/pypdf-table-extraction.svg
:target: https://pypi.org/project/pypdf-table-extraction/

.. image:: https://badges.gitter.im/camelot-dev/Lobby.png
:target: https://gitter.im/camelot-dev/Lobby

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/ambv/black
**pypdf-table-extraction**, formerly known as Camelot, is a Python library that can help you extract tables from PDFs!

.. image:: https://img.shields.io/badge/continous%20quality-deepsource-lightgrey
:target: https://deepsource.io/gh/camelot-dev/camelot/?ref=repository-badge

**Camelot** is a Python library that can help you extract tables from PDFs!

.. note:: You can also check out `Excalibur`_, the web interface to Camelot!
.. note:: You can also check out `Excalibur`_, the web interface to pypdf-table-extraction (Camelot)!

.. _Excalibur: https://github.com/camelot-dev/excalibur

Expand Down Expand Up @@ -70,9 +62,9 @@ Release v\ |version|. (:ref:`Installation <install>`)
.. csv-table::
:file: _static/csv/foo.csv

Camelot also comes packaged with a :ref:`command-line interface <cli>`!
pypdf-table-extraction also comes packaged with a :ref:`command-line interface <cli>`!

.. note:: Camelot only works with text-based PDFs and not scanned documents. (As Tabula `explains`_, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
.. note:: pypdf-table-extraction only works with text-based PDFs and not scanned documents. (As Tabula `explains`_, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

You can check out some frequently asked questions :ref:`here <faq>`.

Expand All @@ -91,12 +83,6 @@ See `comparison with similar libraries and tools`_.

.. _comparison with similar libraries and tools: https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools

Support the development
-----------------------

If Camelot has helped you, please consider supporting its development with a one-time or monthly donation `on OpenCollective`_!

.. _on OpenCollective: https://opencollective.com/camelot

The User Guide
--------------
14 changes: 7 additions & 7 deletions docs/user/advanced.rst
Expand Up @@ -202,7 +202,7 @@ Specify table areas

In cases such as `these <../_static/pdf/table_areas.pdf>`__, it can be useful to specify exact table boundaries. You can plot the text on this page and note the top left and bottom right coordinates of the table.

Table areas that you want Camelot to analyze can be passed as a list of comma-separated strings to :meth:`read_pdf() <camelot.read_pdf>`, using the ``table_areas`` keyword argument.
Table areas that you want pypdf-table-extraction to analyze can be passed as a list of comma-separated strings to :meth:`read_pdf() <camelot.read_pdf>`, using the ``table_areas`` keyword argument.

::

Expand All @@ -223,7 +223,7 @@ Table areas that you want Camelot to analyze can be passed as a list of comma-se
Specify table regions
---------------------

However there may be cases like `[1] <../_static/pdf/table_regions.pdf>`__ and `[2] <https://github.com/camelot-dev/camelot/blob/master/tests/files/tableception.pdf>`__, where the table might not lie at the exact coordinates every time but in an approximate region.
However there may be cases like `[1] <../_static/pdf/table_regions.pdf>`__ and `[2] <https://github.com/py-pdf/pypdf_table_extraction/blob/main/tests/files/tableception.pdf>`__, where the table might not lie at the exact coordinates every time but in an approximate region.

You can use the ``table_regions`` keyword argument to :meth:`read_pdf() <camelot.read_pdf>` to solve for such cases. When ``table_regions`` is specified, Camelot will only analyze the specified regions to look for tables.

Expand All @@ -244,7 +244,7 @@ You can use the ``table_regions`` keyword argument to :meth:`read_pdf() <camelot
Specify column separators
-------------------------

In cases like `these <../_static/pdf/column_separators.pdf>`__, where the text is very close to each other, it is possible that Camelot may guess the column separators' coordinates incorrectly. To correct this, you can explicitly specify the *x* coordinate for each column separator by plotting the text on the page.
In cases like `these <../_static/pdf/column_separators.pdf>`__, where the text is very close to each other, it is possible that pypdf-table-extraction may guess the column separators' coordinates incorrectly. To correct this, you can explicitly specify the *x* coordinate for each column separator by plotting the text on the page.

You can pass the column separators as a list of comma-separated strings to :meth:`read_pdf() <camelot.read_pdf>`, using the ``columns`` keyword argument.

Expand Down Expand Up @@ -334,7 +334,7 @@ You can solve this by passing ``flag_size=True``, which will enclose the supersc
Strip characters from text
--------------------------

You can strip unwanted characters like spaces, dots and newlines from a string using the ``strip_text`` keyword argument. Take a look at `this PDF <https://github.com/camelot-dev/camelot/blob/master/tests/files/tabula/12s0324.pdf>`_ as an example, the text at the start of each row contains a lot of unwanted spaces, dots and newlines.
You can strip unwanted characters like spaces, dots and newlines from a string using the ``strip_text`` keyword argument. Take a look at `this PDF <https://github.com/py-pdf/pypdf_table_extraction/blob/master/tests/files/tabula/12s0324.pdf>`_ as an example, the text at the start of each row contains a lot of unwanted spaces, dots and newlines.

::

Expand All @@ -360,7 +360,7 @@ You can strip unwanted characters like spaces, dots and newlines from a string u
Improve guessed table areas
---------------------------

While using :ref:`Stream <stream>`, automatic table detection can fail for PDFs like `this one <https://github.com/camelot-dev/camelot/blob/master/tests/files/edge_tol.pdf>`_. That's because the text is relatively far apart vertically, which can lead to shorter textedges being calculated.
While using :ref:`Stream <stream>`, automatic table detection can fail for PDFs like `this one <https://github.com/py-pdf/pypdf_table_extraction/blob/master/tests/files/edge_tol.pdf>`_. That's because the text is relatively far apart vertically, which can lead to shorter textedges being calculated.

.. note:: To know more about how textedges are calculated to guess table areas, you can see pages 20, 35 and 40 of `Anssi Nurminen's master's thesis <https://trepo.tuni.fi/bitstream/handle/123456789/21520/Nurminen.pdf?sequence=3>`_.

Expand Down Expand Up @@ -487,7 +487,7 @@ Clearly, the smaller lines separating the headers, couldn't be detected. Let's t
:alt: An improved plot of the PDF table with short lines
:align: left

Voila! Camelot can now see those lines. Let's get our table.
Voila! pypdf-table-extraction can now see those lines. Let's get our table.

::

Expand Down Expand Up @@ -616,7 +616,7 @@ We don't need anything else. Now, let's pass ``copy_text=['v']`` to copy text in
Tweak layout generation
-----------------------

Camelot is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences. In some cases (such as `#170 <https://github.com/camelot-dev/camelot/issues/170>`_ and `#215 <https://github.com/camelot-dev/camelot/issues/215>`_), PDFMiner can group characters that should belong to the same sentence into separate sentences.
pypdf-table-extraction is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences. In some cases (such as `#170 <https://github.com/atlanhq/camelot/issues/170>`_ and `#215 <https://github.com/atlanhq/camelot/issues/215>`_), PDFMiner can group characters that should belong to the same sentence into separate sentences.

To deal with such cases, you can tweak PDFMiner's `LAParams kwargs <https://github.com/euske/pdfminer/blob/master/pdfminer/layout.py#L33>`_ to improve layout generation, by passing the keyword arguments as a dict using ``layout_kwargs`` in :meth:`read_pdf() <camelot.read_pdf>`. To know more about the parameters you can tweak, you can check out `PDFMiner docs <https://pdfminersix.rtfd.io/en/latest/reference/composable.html>`_.

8 changes: 4 additions & 4 deletions docs/user/cli.rst
Expand Up @@ -3,15 +3,15 @@
Command-Line Interface
======================

Camelot comes with a command-line interface.
pypdf-table-extraction comes with a command-line interface.

You can print the help for the interface by typing ``camelot --help`` in your favorite terminal program, as shown below. Furthermore, you can print the help for each command by typing ``camelot <command> --help``. Try it out!
You can print the help for the interface by typing ``camelot --help`` in your favorite terminal program, as shown below. Furthermore, you can print the help for each command by typing ``pypdf-table-extraction <command> --help``. Try it out!

::

Usage: camelot [OPTIONS] COMMAND [ARGS]...
Usage: pypdf-table-extraction [OPTIONS] COMMAND [ARGS]...

Camelot: PDF Table Extraction for Humans
pypdf-table-extraction: PDF Table Extraction for Humans

Options:
--version Show the version and exit.
6 changes: 3 additions & 3 deletions docs/user/faq.rst
Expand Up @@ -3,12 +3,12 @@
Frequently Asked Questions
==========================

This part of the documentation answers some common questions. To add questions, please open an issue `here <https://github.com/camelot-dev/camelot/issues/new>`_.
This part of the documentation answers some common questions. To add questions, please open an issue `here <https://github.com/py-pdf/pypdf_table_extraction/issues/new>`_.

Does Camelot work with image-based PDFs?
Does pypdf-table-extraction work with image-based PDFs?
-------------------------------------------------------

**No**, Camelot only works with text-based PDFs and not scanned documents. (As Tabula `explains <https://github.com/tabulapdf/tabula#why-tabula>`_, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
**No**, pypdf-table-extraction only works with text-based PDFs and not scanned documents. (As Tabula `explains <https://github.com/tabulapdf/tabula#why-tabula>`_, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

How to reduce memory usage for long PDFs?
-----------------------------------------