Roman/bugfix support bedrock embeddings (Unstructured-IO#2650)
### Description
This PR resolved the following open issue:
[bug/bedrock-encoder-not-supported-in-ingest](https://github.com/Unstructured-IO/unstructured/issues/2319).
To do so, the following changes were made:
* All aws configs were added as input parameters to the CLI
* These were mapped to the bedrock embedder when an embedder is
generated via `get_embedder`
* An ingest test was added to call the aws bedrock service
* Requirements for boto were bumped because the bedrock runtime, which is
required to hit the bedrock service, was introduced in version `1.34.63`,
which is newer than the version of boto that was previously pinned.
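
The mapping described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `BedrockCliOptions`, `bedrock_client_kwargs`, `meets_minimum`, and the default region are hypothetical names and values chosen for the example. The only details taken from outside the sketch are the keyword names (`aws_access_key_id`, `aws_secret_access_key`, `region_name`), which `boto3.client` accepts, and the `1.34.63` minimum stated above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class BedrockCliOptions:
    """Hypothetical container for the AWS options collected by the CLI."""

    aws_access_key_id: Optional[str] = None
    aws_secret_access_key: Optional[str] = None
    region_name: Optional[str] = "us-west-2"  # illustrative default, an assumption


def bedrock_client_kwargs(options: BedrockCliOptions) -> Dict[str, Any]:
    """Build keyword arguments for a bedrock runtime client, dropping unset values."""
    return {key: value for key, value in asdict(options).items() if value is not None}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' version string (pre-release suffixes are not handled)."""
    return tuple(int(part) for part in version.split(".")[:3])


def meets_minimum(installed: str, minimum: str = "1.34.63") -> bool:
    """Return True if the installed boto version satisfies the stated minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    opts = BedrockCliOptions(aws_access_key_id="key", aws_secret_access_key="secret")
    print(bedrock_client_kwargs(opts))
    print(meets_minimum("1.28.17"), meets_minimum("1.34.63"))
```

With boto installed, the resulting dictionary could be passed along the lines of `boto3.client("bedrock-runtime", **bedrock_client_kwargs(opts))`. The version check shows why the old `boto3<1.28.18` constraint had to go: the previously pinned `1.28.17` falls below the stated minimum.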

---------

Co-authored-by: ryannikolaidis <[email protected]>
Co-authored-by: rbiseck3 <[email protected]>
3 people authored Mar 21, 2024
1 parent 9177aa2 commit 4ff6a5b
Showing 11 changed files with 20,363 additions and 55 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -14,7 +14,7 @@ repos:
- id: mixed-line-ending

- repo: https://github.com/psf/black
rev: 22.10.0
rev: 24.2.0
hooks:
- id: black
args: ["--line-length=100"]
@@ -28,7 +28,7 @@ repos:
["--fix"]

- repo: https://github.com/pycqa/flake8
rev: 4.0.1
rev: 7.0.0
hooks:
- id: flake8
language_version: python3
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,8 @@
* **Clarify IAM Role Requirement for GCS Platform Connectors**. The GCS Source Connector requires Storage Object Viewer and GCS Destination Connector requires Storage Object Creator IAM roles.
* **Fix OneDrive dates with inconsistent formatting** Adds logic to conditionally support dates returned by office365 that may vary in date formatting or may be a datetime rather than a string. See previous fix for SharePoint
* **Adds tracking for AstraDB** Adds tracking info so AstraDB can see what source called their api.
* **Support AWS Bedrock Embeddings in ingest CLI** The configs required to instantiate the bedrock embedding class are now exposed in the api and the version of boto being used meets the minimum requirement to introduce the bedrock runtime required to hit the service.
>>>>>>> 6a63c941c (bump changelog)
## 0.12.6

4 changes: 2 additions & 2 deletions Makefile
@@ -383,7 +383,7 @@ check-shfmt:

.PHONY: check-black
check-black:
black . --check
black . --check --line-length=100

.PHONY: check-flake8
check-flake8:
@@ -429,7 +429,7 @@ tidy-shell:
tidy-python:
ruff . --fix-only || true
autoflake --in-place .
black .
black --line-length=100 .

## version-sync: update __version__.py with most recent version from CHANGELOG.md
.PHONY: version-sync
4 changes: 0 additions & 4 deletions requirements/constraints.in
@@ -3,10 +3,6 @@
# extras. Putting a dependency here will only affect dependency sets that contain them -- in other
# words, if something does not require a constraint, it will not be installed.
####################################################################################################
# NOTE(alan): Pinning to avoid conflicts with downstream ingest-s3
urllib3<1.27, >=1.25.4
boto3<1.28.18
botocore<1.31.18
# consistency with local-inference-pin
protobuf<4.24
# NOTE(robinson) - Required pins for security scans
62 changes: 30 additions & 32 deletions requirements/ingest/embed-aws-bedrock.txt
@@ -2,41 +2,38 @@
# This file is autogenerated by pip-compile with Python 3.9
# by the following command:
#
# pip-compile --output-file=ingest/embed-aws-bedrock.txt ingest/embed-aws-bedrock.in
# pip-compile embed-aws-bedrock.in
#
aiohttp==3.9.3
# via langchain-community
aiosignal==1.3.1
# via aiohttp
anyio==3.7.1
# via
# -c ingest/../constraints.in
# -c ../constraints.in
# langchain-core
async-timeout==4.0.3
# via aiohttp
attrs==23.2.0
# via aiohttp
boto3==1.28.17
boto3==1.34.63
# via -r embed-aws-bedrock.in
botocore==1.34.63
# via
# -c ingest/../constraints.in
# -r ingest/embed-aws-bedrock.in
botocore==1.31.17
# via
# -c ingest/../constraints.in
# boto3
# s3transfer
certifi==2024.2.2
# via
# -c ingest/../base.txt
# -c ingest/../constraints.in
# -c ../base.txt
# -c ../constraints.in
# requests
charset-normalizer==3.3.2
# via
# -c ingest/../base.txt
# -c ../base.txt
# requests
dataclasses-json==0.6.4
# via
# -c ingest/../base.txt
# -c ../base.txt
# langchain-community
exceptiongroup==1.2.0
# via anyio
@@ -46,7 +43,7 @@ frozenlist==1.4.1
# aiosignal
idna==3.6
# via
# -c ingest/../base.txt
# -c ../base.txt
# anyio
# requests
# yarl
@@ -58,82 +55,83 @@ jsonpatch==1.33
# via langchain-core
jsonpointer==2.4
# via jsonpatch
langchain-community==0.0.20
# via -r ingest/embed-aws-bedrock.in
langchain-core==0.1.23
langchain-community==0.0.28
# via -r embed-aws-bedrock.in
langchain-core==0.1.32
# via langchain-community
langsmith==0.0.87
langsmith==0.1.26
# via
# langchain-community
# langchain-core
marshmallow==3.20.2
# via
# -c ingest/../base.txt
# -c ../base.txt
# dataclasses-json
multidict==6.0.5
# via
# aiohttp
# yarl
mypy-extensions==1.0.0
# via
# -c ingest/../base.txt
# -c ../base.txt
# typing-inspect
numpy==1.26.4
# via
# -c ingest/../base.txt
# -c ../base.txt
# langchain-community
orjson==3.9.15
# via langsmith
packaging==23.2
# via
# -c ingest/../base.txt
# -c ../base.txt
# langchain-core
# marshmallow
pydantic==1.10.14
# via
# -c ingest/../constraints.in
# -c ../constraints.in
# langchain-core
# langsmith
python-dateutil==2.8.2
# via
# -c ingest/../base.txt
# -c ../base.txt
# botocore
pyyaml==6.0.1
# via
# langchain-community
# langchain-core
requests==2.31.0
# via
# -c ingest/../base.txt
# -c ../base.txt
# langchain-community
# langchain-core
# langsmith
s3transfer==0.6.2
s3transfer==0.10.1
# via boto3
six==1.16.0
# via
# -c ingest/../base.txt
# -c ../base.txt
# python-dateutil
sniffio==1.3.0
sniffio==1.3.1
# via anyio
sqlalchemy==2.0.27
sqlalchemy==2.0.28
# via langchain-community
tenacity==8.2.3
# via
# langchain-community
# langchain-core
typing-extensions==4.9.0
# via
# -c ingest/../base.txt
# -c ../base.txt
# pydantic
# sqlalchemy
# typing-inspect
typing-inspect==0.9.0
# via
# -c ingest/../base.txt
# -c ../base.txt
# dataclasses-json
urllib3==1.26.18
# via
# -c ingest/../base.txt
# -c ingest/../constraints.in
# -c ../base.txt
# botocore
# requests
yarl==1.9.4
27 changes: 12 additions & 15 deletions requirements/ingest/s3.txt
@@ -2,9 +2,9 @@
# This file is autogenerated by pip-compile with Python 3.9
# by the following command:
#
# pip-compile --output-file=ingest/s3.txt ingest/s3.in
# pip-compile s3.in
#
aiobotocore==2.7.0
aiobotocore==2.12.1
# via s3fs
aiohttp==3.9.3
# via
@@ -18,21 +18,19 @@ async-timeout==4.0.3
# via aiohttp
attrs==23.2.0
# via aiohttp
botocore==1.31.17
# via
# -c ingest/../constraints.in
# aiobotocore
botocore==1.34.51
# via aiobotocore
frozenlist==1.4.1
# via
# aiohttp
# aiosignal
fsspec==2024.2.0
# via
# -r ingest/s3.in
# -r s3.in
# s3fs
idna==3.6
# via
# -c ingest/../base.txt
# -c ../base.txt
# yarl
jmespath==1.0.1
# via botocore
@@ -42,26 +40,25 @@ multidict==6.0.5
# yarl
python-dateutil==2.8.2
# via
# -c ingest/../base.txt
# -c ../base.txt
# botocore
s3fs==2024.2.0
# via -r ingest/s3.in
# via -r s3.in
six==1.16.0
# via
# -c ingest/../base.txt
# -c ../base.txt
# python-dateutil
typing-extensions==4.9.0
# via
# -c ingest/../base.txt
# -c ../base.txt
# aioitertools
urllib3==1.26.18
# via
# -c ingest/../base.txt
# -c ingest/../constraints.in
# -c ../base.txt
# botocore
wrapt==1.16.0
# via
# -c ingest/../base.txt
# -c ../base.txt
# aiobotocore
yarl==1.9.4
# via aiohttp