fix: pytesseract>=0.3.12 installation error while installing pdf extra (#3522)

Closes #3521.

This PR resolves an installation error with `pytesseract>=0.3.12` that
occurred during `pip install unstructured[pdf]==0.15.3`.

### Testing
**Run the following command on the `main` branch and on this PR**
```
pip uninstall -y pytesseract && pip install ".[pdf]"
```
**Results**
- `main` branch
```
INFO: pip is looking at multiple versions of unstructured[pdf] to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement pytesseract>=0.3.12; extra == "pdf" (from unstructured[pdf]) (from versions: 0.1, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.2, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.8, 0.3.9, 0.3.10)
ERROR: No matching distribution found for pytesseract>=0.3.12; extra == "pdf"
```
- this `PR`

`pytesseract-0.3.13` should be installed successfully.
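
The failure on `main` can be reproduced without pip. The sketch below takes the version list printed in the error above and checks it against the `>=0.3.12` floor. The helper names `parse` and `satisfying` are illustrative, and the comparison is a simplified dotted-integer ordering, not pip's full PEP 440 logic; it is only meant to show why the resolver found no candidate.

```python
from __future__ import annotations

# Versions pip reported as available on PyPI (copied from the error above).
AVAILABLE = (
    "0.1, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, "
    "0.2.0, 0.2.2, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, "
    "0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, "
    "0.3.8, 0.3.9, 0.3.10"
).split(", ")


def parse(version: str) -> tuple[int, ...]:
    """Turn '0.3.10' into (0, 3, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfying(versions: list[str], minimum: str) -> list[str]:
    """Return the versions that are >= minimum (simplified comparison)."""
    floor = parse(minimum)
    return [v for v in versions if parse(v) >= floor]


print(satisfying(AVAILABLE, "0.3.12"))  # prints []
print(max(AVAILABLE, key=parse))        # prints 0.3.10
```

Because the newest release in that list is 0.3.10, no published version can satisfy the requirement, which is consistent with the `No matching distribution found` error.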
christinestraub authored Aug 14, 2024
1 parent d6a84bd commit 9b778e2
Showing 5 changed files with 16 additions and 7 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,13 @@
+## 0.15.4
+
+### Enhancements
+
+### Features
+
+### Fixes
+
+* **Resolve an installation error with `pytesseract>=0.3.12` that occurred during `pip install unstructured[pdf]==0.15.3`.**
+
## 0.15.3

### Enhancements
3 changes: 1 addition & 2 deletions requirements/deps/constraints.txt
@@ -22,8 +22,7 @@ Office365-REST-Python-Client<2.4.3
# unstructured-inference to be upgraded when unstructured library is upgraded
# https://github.com/Unstructured-IO/unstructured/issues/1458
# unstructured-inference
-# use the known compatible version of weaviate and pytesseract
-pytesseract @ git+https://github.com/madmaze/[email protected]
+# use the known compatible version of weaviate
weaviate-client>3.25.0
# TODO: Pinned in transformers package, remove when that gets updated
tokenizers>=0.19,<0.20
4 changes: 3 additions & 1 deletion requirements/extra-pdf-image.in
@@ -12,4 +12,6 @@ effdet
# Do not move to constraints.in, otherwise unstructured-inference will not be upgraded
# when unstructured library is.
unstructured-inference==0.7.36
-pytesseract>=0.3.12
+# NOTE(christine): Pinned to a specific version of pytesseract from the GitHub repository.
+# Remove this pin and switch to the latest version from PyPI once version 0.3.13 or newer is officially released.
+pytesseract @ git+https://github.com/madmaze/[email protected]
4 changes: 1 addition & 3 deletions requirements/extra-pdf-image.txt
@@ -202,9 +202,7 @@ pypdf==4.3.1
pypdfium2==4.30.0
# via pdfplumber
pytesseract @ git+https://github.com/madmaze/[email protected]
-# via
-# -c ././deps/constraints.txt
-# -r ./extra-pdf-image.in
+# via -r ./extra-pdf-image.in
python-dateutil==2.9.0.post0
# via
# -c ./base.txt
2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.15.3" # pragma: no cover
+__version__ = "0.15.4" # pragma: no cover
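
After running the install command from the Testing section, one way to confirm which `pytesseract` ended up in the environment is to query the installed distribution metadata. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of this change.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pytesseract")
    print(f"pytesseract: {found or 'not installed'}")
```

With this PR applied, the reported version would be expected to match the `pytesseract-0.3.13` result described in the Testing section.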
