Showing 10 changed files with 628 additions and 37 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,20 +10,14 @@ jobs: | |
Linting: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- uses: actions/setup-python@v2 | ||
- name: Set PY variable | ||
run: echo "PY=$(python -VV | sha256sum | cut -d' ' -f1)" >> $GITHUB_ENV | ||
- uses: actions/cache@v2 | ||
- uses: actions/checkout@v3 | ||
with: | ||
path: ~/.cache/pre-commit | ||
key: pre-commit|${{ env.PY }}|${{ hashFiles('.pre-commit-config.yaml') }} | ||
- name: Install pre-commit | ||
run: | | ||
pip install pre-commit | ||
pre-commit install | ||
- name: Run pre-commit | ||
run: SKIP=no-commit-to-branch pre-commit run --all-files | ||
# required to grab the history of the PR | ||
fetch-depth: 0 | ||
- uses: actions/setup-python@v3 | ||
- uses: pre-commit/[email protected] | ||
with: | ||
extra_args: --color=always --from-ref ${{ github.event.pull_request.base.sha }} --to-ref ${{ github.event.pull_request.head.sha }} | ||
|
||
Pytest: | ||
runs-on: ubuntu-latest | ||
|
@@ -45,6 +39,7 @@ jobs: | |
- name: Install dependencies | ||
run: | | ||
pip install -e '.[dev]' | ||
pip install poetry build | ||
- name: Test with Pytest on Python ${{ matrix.python-version }} | ||
run: python -m pytest --cov edspdf --cov-report xml | ||
|
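As a reading aid for the Linting change above: passing `--from-ref` and `--to-ref` limits pre-commit to the files changed between two revisions, which is why the checkout step needs the full history (`fetch-depth: 0`). The sketch below builds the roughly equivalent local command. The helper name `pre_commit_command` and the example SHAs are hypothetical, and the command is only assembled, not executed.

```python
import shlex


def pre_commit_command(base_sha: str, head_sha: str) -> list[str]:
    """Build a `pre-commit run` call limited to files changed between two refs."""
    return [
        "pre-commit",
        "run",
        "--color=always",
        "--from-ref",
        base_sha,
        "--to-ref",
        head_sha,
    ]


# Placeholder SHAs; in the workflow these come from the pull request event.
cmd = pre_commit_command("abc1234", "def5678")
print(shlex.join(cmd))
```

Running the printed command in a clone that contains both commits should check the same set of files as the CI job.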
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,8 @@ Processing PDFs usually involves many steps such as extracting lines, running OC | |
can use any technology in static components, we do not provide tools to train | ||
components built with other deep learning frameworks. | ||
|
||
## Creating a pipeline | ||
|
||
A pipe is a processing block (like a function) that applies a transformation on its input and returns a modified object. | ||
|
||
At the moment, four types of pipes are implemented in the library: | ||
|
@@ -57,7 +59,33 @@ model(pdf_bytes) | |
model.pipe([pdf_bytes, ...]) | ||
``` | ||
|
||
## Hybrid models | ||
### Hybrid models | ||
|
||
EDS-PDF was designed to facilitate the training and inference of hybrid models that | ||
arbitrarily chain static components or trained deep learning components. Static components are callable objects that take a PDFDoc object as input, perform arbitrary transformations over the input, and return the modified object. [Trainable pipes][edspdf.trainable_pipe.TrainablePipe], on the other hand, allow for deep learning operations to be performed on the [PDFDoc][edspdf.structures.PDFDoc] object and must be trained to be used. | ||
|
||
## Saving and loading a pipeline | ||
|
||
Pipelines can be saved and loaded using the `save` and `load` methods. The saved pipeline is not a pickled object but a folder containing the config file, the weights and extra resources for each pipeline. This allows for easy inspection and modification of the pipeline, and avoids the execution of arbitrary code when loading a pipeline. | ||
|
||
```python | ||
model.save("path/to/your/model") | ||
model = edspdf.load("path/to/your/model") | ||
``` | ||
|
||
To share the pipeline and turn it into a pip-installable package, you can use the `package` method, which will use or create a `pyproject.toml` file, fill it accordingly, and create a wheel file. At the moment, we only support the Poetry package manager. | ||
|
||
```python | ||
model.package( | ||
name="path/to/your/package", | ||
version="0.0.1", | ||
root_dir="path/to/project/root", # optional, to retrieve an existing pyproject.toml file | ||
# if you don't have a pyproject.toml, you can provide the metadata here instead | ||
metadata=dict( | ||
authors="Firstname Lastname <[email protected]>", | ||
description="A short description of your package", | ||
), | ||
) | ||
``` | ||
|
||
This will create a wheel file in the `root_dir/dist` folder, which you can share and install with pip. |
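To make that last step concrete, here is a small sketch that locates the most recent wheel under `root_dir/dist` and builds the matching `pip install` command. The helper names `latest_wheel` and `pip_install_command` are hypothetical and not part of EDS-PDF. The sketch assumes only that the wheel was written to `dist/` as described above.

```python
import sys
from pathlib import Path


def latest_wheel(root_dir: str) -> Path:
    """Return the most recently modified .whl file in <root_dir>/dist."""
    dist_dir = Path(root_dir, "dist")
    wheels = sorted(dist_dir.glob("*.whl"), key=lambda path: path.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no wheel found in {dist_dir}")
    return wheels[-1]


def pip_install_command(wheel: Path) -> list[str]:
    """Build a pip command that installs the wheel into the current interpreter."""
    return [sys.executable, "-m", "pip", "install", str(wheel)]
```

One way to use it is `subprocess.run(pip_install_command(latest_wheel("path/to/project/root")), check=True)`, which installs the packaged pipeline into the environment running the script.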