build(deps): replace pillow-heif with pi-heif (#3571)
### Summary

Closes #2664. Replaces `pillow-heif` with `pi-heif` because the binary wheels for `pi-heif`
are more permissively licensed.
MthwRobinson authored Aug 27, 2024
1 parent ddba928 commit 4194a07
Showing 26 changed files with 35 additions and 93 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
## 0.15.8-dev5
## 0.15.8

### Enhancements

@@ -10,6 +10,7 @@

### Fixes

* **Replace `pillow-heif` with `pi-heif`**. Replaces `pillow-heif` with `pi-heif` due to more permissive licensing on the wheel for `pi-heif`.
* **Minify text_as_html from DOCX.** Previously `.metadata.text_as_html` for DOCX tables was "bloated" with whitespace and noise elements introduced by `tabulate` that produced over-chunking and lower "semantic density" of elements. Reduce HTML to minimum character count while preserving all text.
* **Fall back to filename extension-based file-type detection for unidentified OLE files.** Resolves a problem where a DOC file that could not be detected as such by `filetype` was incorrectly identified as an MSG file.

2 changes: 1 addition & 1 deletion requirements/base.txt
@@ -41,7 +41,7 @@ h11==0.14.0
# via httpcore
httpcore==1.0.5
# via httpx
httpx==0.27.0
httpx==0.27.2
# via unstructured-client
idna==3.8
# via
2 changes: 1 addition & 1 deletion requirements/dev.txt
@@ -354,7 +354,7 @@ wheel==0.44.0
# pip-tools
widgetsnbextension==4.0.13
# via ipywidgets
zipp==3.20.0
zipp==3.20.1
# via importlib-metadata

# The following packages are considered to be unsafe in a requirements file:
2 changes: 1 addition & 1 deletion requirements/extra-markdown.txt
@@ -8,5 +8,5 @@ importlib-metadata==8.4.0
# via markdown
markdown==3.7
# via -r ./extra-markdown.in
zipp==3.20.0
zipp==3.20.1
# via importlib-metadata
4 changes: 2 additions & 2 deletions requirements/extra-paddleocr.txt
@@ -43,7 +43,7 @@ httpcore==1.0.5
# via
# -c ./base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./base.txt
# paddlepaddle
@@ -176,5 +176,5 @@ urllib3==1.26.19
# -c ././deps/constraints.txt
# -c ./base.txt
# requests
zipp==3.20.0
zipp==3.20.1
# via importlib-resources
2 changes: 1 addition & 1 deletion requirements/extra-pdf-image.in
@@ -5,7 +5,7 @@ onnx
pdf2image
pdfminer.six
pikepdf
pillow_heif
pi_heif
pypdf
google-cloud-vision
effdet
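The `.in` file spells the new dependency `pi_heif`, while the compiled `extra-pdf-image.txt` pins `pi-heif` (and likewise `pdfminer.six` appears there as `pdfminer-six`). These refer to the same projects because pip compares names after PEP 503 normalization: runs of `-`, `_` and `.` collapse to a single `-`, and the result is lowercased. A minimal sketch of that rule; `normalize_name` is a hypothetical helper, not part of this repository:

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name per PEP 503.

    Runs of '-', '_' and '.' collapse to a single '-', then the name is lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


# Spellings taken from extra-pdf-image.in and their pinned forms in extra-pdf-image.txt
for spelling in ("pi_heif", "pillow_heif", "pdfminer.six"):
    print(spelling, "->", normalize_name(spelling))
```

This is why editing `pillow_heif` to `pi_heif` in the `.in` file is enough for `pip-compile` to emit the `pi-heif==...` pin.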
10 changes: 5 additions & 5 deletions requirements/extra-pdf-image.txt
@@ -53,7 +53,7 @@ google-auth==2.34.0
# google-cloud-vision
google-cloud-vision==3.7.4
# via -r ./extra-pdf-image.in
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via
# google-api-core
# grpcio-status
@@ -147,6 +147,8 @@ pdfminer-six==20231228
# pdfplumber
pdfplumber==0.11.4
# via layoutparser
pi-heif==0.18.0
# via -r ./extra-pdf-image.in
pikepdf==9.2.0
# via -r ./extra-pdf-image.in
pillow==10.4.0
@@ -155,12 +157,10 @@ pillow==10.4.0
# matplotlib
# pdf2image
# pdfplumber
# pi-heif
# pikepdf
# pillow-heif
# torchvision
# unstructured-pytesseract
pillow-heif==0.18.0
# via -r ./extra-pdf-image.in
portalocker==2.10.1
# via iopath
proto-plus==1.24.0
@@ -293,5 +293,5 @@ wrapt==1.16.0
# -c ././deps/constraints.txt
# -c ./base.txt
# deprecated
zipp==3.20.0
zipp==3.20.1
# via importlib-resources
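This rendering of the diff drops the `-`/`+` markers, so the `pillow-heif` to `pi-heif` swap is easiest to confirm by comparing pins directly. The following is a minimal sketch, assuming only the `name==version` lines that `pip-compile` writes matter: indented `# via` and `-c` annotations are skipped, extras such as `[http2]` are stripped, and names are PEP 503-normalized. `parse_pins` and `diff_pins` are hypothetical helpers, and the sample input is a trimmed excerpt of the `extra-pdf-image.txt` changes:

```python
import re

# A pinned requirement line: project name, optional [extras], '==', version.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?==(\S+)")


def parse_pins(text: str) -> dict[str, str]:
    """Map normalized project name -> pinned version for each 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line.strip())  # comment lines start with '#' and never match
        if match:
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def diff_pins(old: str, new: str) -> dict[str, dict]:
    """Report packages added, removed, or re-pinned between two compiled files."""
    old_pins, new_pins = parse_pins(old), parse_pins(new)
    return {
        "added": {n: v for n, v in new_pins.items() if n not in old_pins},
        "removed": {n: v for n, v in old_pins.items() if n not in new_pins},
        "changed": {
            n: (old_pins[n], new_pins[n])
            for n in sorted(old_pins.keys() & new_pins.keys())
            if old_pins[n] != new_pins[n]
        },
    }


old = """\
googleapis-common-protos==1.63.2
pillow-heif==0.18.0
    # via -r ./extra-pdf-image.in
zipp==3.20.0
"""
new = """\
googleapis-common-protos==1.64.0
pi-heif==0.18.0
    # via -r ./extra-pdf-image.in
zipp==3.20.1
"""
print(diff_pins(old, new))
```

For a real comparison, the two inputs could be read from `git show <rev>:requirements/extra-pdf-image.txt` for the parent and child revisions.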
2 changes: 1 addition & 1 deletion requirements/ingest/astradb.txt
@@ -51,7 +51,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx[http2]==0.27.0
httpx[http2]==0.27.2
# via
# -c ./ingest/../base.txt
# astrapy
4 changes: 2 additions & 2 deletions requirements/ingest/chroma.txt
@@ -60,7 +60,7 @@ fsspec==2024.6.1
# via huggingface-hub
google-auth==2.34.0
# via kubernetes
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via opentelemetry-exporter-otlp-proto-grpc
grpcio==1.66.0
# via
@@ -245,7 +245,7 @@ wrapt==1.16.0
# -c ./ingest/../deps/constraints.txt
# deprecated
# opentelemetry-instrumentation
zipp==3.20.0
zipp==3.20.1
# via
# importlib-metadata
# importlib-resources
4 changes: 2 additions & 2 deletions requirements/ingest/clarifai.txt
@@ -19,7 +19,7 @@ clarifai-grpc==10.7.1
# via clarifai
contextlib2==21.6.0
# via schema
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via clarifai-grpc
grpcio==1.66.0
# via
@@ -61,7 +61,7 @@ requests==2.32.3
# via
# -c ./ingest/../base.txt
# clarifai-grpc
rich==13.7.1
rich==13.8.0
# via clarifai
schema==0.7.5
# via clarifai
2 changes: 1 addition & 1 deletion requirements/ingest/databricks-volumes.txt
@@ -15,7 +15,7 @@ charset-normalizer==3.3.2
# via
# -c ./ingest/../base.txt
# requests
databricks-sdk==0.30.0
databricks-sdk==0.31.0
# via -r ./ingest/databricks-volumes.in
google-auth==2.34.0
# via databricks-sdk
2 changes: 1 addition & 1 deletion requirements/ingest/embed-aws-bedrock.txt
@@ -62,7 +62,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langsmith
2 changes: 1 addition & 1 deletion requirements/ingest/embed-huggingface.txt
@@ -42,7 +42,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langsmith
2 changes: 1 addition & 1 deletion requirements/ingest/embed-octoai.txt
@@ -36,7 +36,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# openai
2 changes: 1 addition & 1 deletion requirements/ingest/embed-openai.txt
@@ -36,7 +36,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langsmith
4 changes: 2 additions & 2 deletions requirements/ingest/embed-vertexai.txt
@@ -88,7 +88,7 @@ google-resumable-media==2.7.2
# via
# google-cloud-bigquery
# google-cloud-storage
googleapis-common-protos[grpc]==1.63.2
googleapis-common-protos[grpc]==1.64.0
# via
# google-api-core
# grpc-google-iam-v1
@@ -112,7 +112,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langchain-google-vertexai
2 changes: 1 addition & 1 deletion requirements/ingest/embed-voyageai.txt
@@ -53,7 +53,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langsmith
2 changes: 1 addition & 1 deletion requirements/ingest/gcs.txt
@@ -66,7 +66,7 @@ google-crc32c==1.5.0
# google-resumable-media
google-resumable-media==2.7.2
# via google-cloud-storage
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via google-api-core
idna==3.8
# via
2 changes: 1 addition & 1 deletion requirements/ingest/google-drive.txt
@@ -26,7 +26,7 @@ google-auth==2.34.0
# google-auth-httplib2
google-auth-httplib2==0.2.0
# via google-api-python-client
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via google-api-core
httplib2==0.22.0
# via
2 changes: 1 addition & 1 deletion requirements/ingest/notion.txt
@@ -28,7 +28,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# notion-client
2 changes: 1 addition & 1 deletion requirements/ingest/qdrant.txt
@@ -39,7 +39,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx[http2]==0.27.0
httpx[http2]==0.27.2
# via
# -c ./ingest/../base.txt
# qdrant-client
2 changes: 1 addition & 1 deletion requirements/ingest/singlestore.txt
@@ -56,7 +56,7 @@ wheel==0.44.0
# via
# -c ./ingest/../deps/constraints.txt
# singlestoredb
zipp==3.20.0
zipp==3.20.1
# via importlib-metadata

# The following packages are considered to be unsafe in a requirements file:
61 changes: 1 addition & 60 deletions requirements/ingest/weaviate.txt
@@ -4,20 +4,12 @@
#
# pip-compile ./ingest/weaviate.in
#
annotated-types==0.7.0
# via pydantic
anyio==4.4.0
# via
# -c ./ingest/../base.txt
# httpx
authlib==1.3.2
# via weaviate-client
certifi==2024.7.4
# via
# -c ./ingest/../base.txt
# -c ./ingest/../deps/constraints.txt
# httpcore
# httpx
# requests
cffi==1.17.0
# via cryptography
@@ -27,75 +19,24 @@ charset-normalizer==3.3.2
# requests
cryptography==43.0.0
# via authlib
exceptiongroup==1.2.2
# via
# -c ./ingest/../base.txt
# anyio
grpcio==1.66.0
# via
# -c ./ingest/../deps/constraints.txt
# grpcio-health-checking
# grpcio-tools
# weaviate-client
grpcio-health-checking==1.62.3
# via weaviate-client
grpcio-tools==1.62.3
# via weaviate-client
h11==0.14.0
# via
# -c ./ingest/../base.txt
# httpcore
httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
# via
# -c ./ingest/../base.txt
# weaviate-client
idna==3.8
# via
# -c ./ingest/../base.txt
# anyio
# httpx
# requests
protobuf==4.23.4
# via
# -c ./ingest/../deps/constraints.txt
# grpcio-health-checking
# grpcio-tools
pycparser==2.22
# via cffi
pydantic==2.8.2
# via weaviate-client
pydantic-core==2.20.1
# via pydantic
requests==2.32.3
# via
# -c ./ingest/../base.txt
# weaviate-client
sniffio==1.3.1
# via
# -c ./ingest/../base.txt
# anyio
# httpx
typing-extensions==4.12.2
# via
# -c ./ingest/../base.txt
# anyio
# pydantic
# pydantic-core
urllib3==1.26.19
# via
# -c ./ingest/../base.txt
# -c ./ingest/../deps/constraints.txt
# requests
validators==0.33.0
# via weaviate-client
weaviate-client==4.7.1
weaviate-client==3.26.7
# via
# -c ./ingest/../deps/constraints.txt
# -r ./ingest/weaviate.in

# The following packages are considered to be unsafe in a requirements file:
# setuptools
2 changes: 1 addition & 1 deletion requirements/test.txt
@@ -63,7 +63,7 @@ httpcore==1.0.5
# via
# -c ./base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./base.txt
# label-studio-sdk
2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
__version__ = "0.15.8-dev5" # pragma: no cover
__version__ = "0.15.8" # pragma: no cover
2 changes: 1 addition & 1 deletion unstructured/partition/pdf.py
@@ -15,8 +15,8 @@
from pdfminer.layout import LTChar, LTContainer, LTImage, LTItem, LTTextBox
from pdfminer.pdftypes import PDFObjRef
from pdfminer.utils import open_filename
from pi_heif import register_heif_opener
from PIL import Image as PILImage
from pillow_heif import register_heif_opener
from pypdf import PdfReader

from unstructured.chunking import add_chunking_strategy
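In `unstructured/partition/pdf.py` the only change is where `register_heif_opener` is imported from; the function name is the same, so the rest of the module is untouched. Below is a minimal sketch of how such an opener is typically enabled for Pillow, assuming `pi-heif` is installed. `enable_heif_decoding` is a hypothetical helper that returns `False` instead of raising when `pi-heif` (or Pillow, which it imports) is missing. As we understand the `pi-heif` project, it is a decode-only build of `pillow-heif`, which fits this read-only use.

```python
def enable_heif_decoding() -> bool:
    """Register pi-heif's HEIF opener with Pillow.

    Returns True if the opener was registered, or False if pi-heif
    (or Pillow, which it imports) is not installed.
    """
    try:
        # Same function name that pillow_heif exposes; only the package changed.
        from pi_heif import register_heif_opener
    except ImportError:
        return False

    # After registration, PIL.Image.open() can decode .heic / .heif files.
    register_heif_opener()
    return True


if __name__ == "__main__":
    if enable_heif_decoding():
        print("HEIF decoding enabled for Pillow")
    else:
        print("pi-heif is not installed; HEIF images cannot be opened")
```

Calling the opener once at import time, as `pdf.py` does with its module-level import, is enough; Pillow keeps the registration for the rest of the process.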