Merge branch 'main' into pg_migration_methods
twishabansal committed Sep 18, 2024
2 parents 57d7b5f + 400edf4 commit 8d68966
Showing 10 changed files with 438 additions and 22 deletions.
36 changes: 31 additions & 5 deletions CHANGELOG.md
@@ -1,5 +1,36 @@
# Changelog

## [0.7.0](https://github.com/googleapis/langchain-google-alloydb-pg-python/compare/v0.6.0...v0.7.0) (2024-09-17)


### ⚠ BREAKING CHANGES

* support async and sync versions of indexing methods
* remove _aexecute(), _execute(), _afetch(), and _fetch() methods

### Features

* Add from_engine_args method ([f06834d](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/f06834d67b3219d0981dbad191f0c6d89dc8aa03))
* Add multi-modal support vector store ([#207](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/207)) ([8d259ba](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/8d259ba0c98476b312e7a7201fa86f9316919ab6))
* Add support for custom schema names ([#210](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/210)) ([f148a7e](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/f148a7e7ebd58afc18ee32f42bfe980a5fde9907))
* Add support for sync from_engine ([f06834d](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/f06834d67b3219d0981dbad191f0c6d89dc8aa03))
* Allow non-uuid data types for vectorstore primary key ([#226](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/226)) ([e6dc991](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/e6dc9918afa42086cc975056bb23862d4bb7017e))
* Made add_embeddings method public ([#228](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/228)) ([433b92f](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/433b92f4a2c62ccc2244c95feba9e1e0ec9620ea))
* Refactor to support both async and sync usage ([f06834d](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/f06834d67b3219d0981dbad191f0c6d89dc8aa03))


### Bug Fixes

* Fix documentation comments for library methods. ([#219](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/219)) ([5b923b2](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/5b923b299fba54e3be36363f070fec7264444aea))
* Replacing cosine_similarity and maximal_marginal_relevance methods ([#218](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/218)) ([d827ccc](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/d827ccce0ba3b032e636b26e9ea5c5a5d3e151f6))
* Support async and sync versions of indexing methods ([f06834d](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/f06834d67b3219d0981dbad191f0c6d89dc8aa03))


### Documentation

* Added example for chat message history with omni ([#191](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/191)) ([2e6809f](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/2e6809f0ae0063307d49529e784e65a4929e9c85))
* Add Migration samples for migrating from vector store to AlloyDB ([#230](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/230)) ([9fd9308](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/9fd93081aa8307d9190f7f30ef6746e0e9aeedbe))

## [0.6.0](https://github.com/googleapis/langchain-google-alloydb-pg-python/compare/v0.5.0...v0.6.0) (2024-09-03)


@@ -27,11 +58,6 @@
* Rename index guide ([#201](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/201)) ([2436c5b](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/2436c5b2120d69938644e20181559aa62b95fc84))
* Update index tuning samples to include advanced indexes ([#197](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/197)) ([da52bc4](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/da52bc421da1ca8cad78184566426b22941b09af))


### Miscellaneous Chores

* Release 0.6.0 ([#217](https://github.com/googleapis/langchain-google-alloydb-pg-python/issues/217)) ([3d96251](https://github.com/googleapis/langchain-google-alloydb-pg-python/commit/3d9625131e3412e6d2958856224ef251baa625bd))

## [0.5.0](https://github.com/googleapis/langchain-google-alloydb-pg-python/compare/v0.4.1...v0.5.0) (2024-07-23)


3 changes: 2 additions & 1 deletion pyproject.toml
@@ -44,7 +44,8 @@ test = [
"mypy==1.11.2",
"pytest-asyncio==0.24.0",
"pytest==8.3.3",
"pytest-cov==5.0.0"
"pytest-cov==5.0.0",
"Pillow==10.4.0"
]

[build-system]
110 changes: 99 additions & 11 deletions src/langchain_google_alloydb_pg/async_vectorstore.py
@@ -15,6 +15,7 @@
# TODO: Remove below import when minimum supported Python version is 3.10
from __future__ import annotations

import base64
import json
import uuid
from typing import Any, Callable, Iterable, List, Optional, Sequence, Tuple, Type, Union
@@ -306,6 +307,42 @@ async def aadd_documents(
ids = await self.aadd_texts(texts, metadatas=metadatas, ids=ids, **kwargs)
return ids

def _encode_image(self, uri: str) -> str:
"""Get base64 string from image URI."""
with open(uri, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")

async def aadd_images(
self,
uris: List[str],
metadatas: Optional[List[dict]] = None,
ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""Embed images and add to the table.
Args:
uris (List[str]): List of local image URIs to add to the table.
metadatas (Optional[List[dict]]): List of metadatas to add to table records.
ids (Optional[List[str]]): List of IDs to add to table records.
Returns:
List of record IDs added.
"""
encoded_images = []
if metadatas is None:
metadatas = [{"image_uri": uri} for uri in uris]

for uri in uris:
encoded_image = self._encode_image(uri)
encoded_images.append(encoded_image)

embeddings = self._images_embedding_helper(uris)
ids = await self.aadd_embeddings(
encoded_images, embeddings, metadatas=metadatas, ids=ids, **kwargs
)
return ids

async def adelete(
self,
ids: Optional[List] = None,
@@ -510,6 +547,35 @@ async def asimilarity_search(
embedding=embedding, k=k, filter=filter, **kwargs
)

def _images_embedding_helper(self, image_uris: List[str]) -> List[List[float]]:
# check if `embed_image()` API is supported by the embedding service used
if hasattr(self.embedding_service, "embed_image"):
try:
embeddings = self.embedding_service.embed_image(image_uris)
except Exception as e:
raise Exception(
f"Make sure your selected embedding model supports a list of image URIs as input. {str(e)}"
) from e
else:
raise ValueError(
"Please use an embedding model that supports image embedding."
)
return embeddings

async def asimilarity_search_image(
self,
image_uri: str,
k: Optional[int] = None,
filter: Optional[str] = None,
**kwargs: Any,
) -> List[Document]:
"""Return docs selected by similarity search on image."""
embedding = self._images_embedding_helper([image_uri])[0]

return await self.asimilarity_search_by_vector(
embedding=embedding, k=k, filter=filter, **kwargs
)

def _select_relevance_score_fn(self) -> Callable[[float], float]:
"""Select a relevance function based on distance strategy."""
# Calculate distance strategy provided in
@@ -756,17 +822,6 @@ async def is_valid_index(
results = result_map.fetchall()
return bool(len(results) == 1)

def similarity_search(
self,
query: str,
k: Optional[int] = None,
filter: Optional[str] = None,
**kwargs: Any,
) -> List[Document]:
raise NotImplementedError(
"Sync methods are not implemented for AsyncAlloyDBVectorStore. Use AlloyDBVectorStore interface instead."
)

def add_texts(
self,
texts: Iterable[str],
@@ -788,6 +843,17 @@ def add_documents(
"Sync methods are not implemented for AsyncAlloyDBVectorStore. Use AlloyDBVectorStore interface instead."
)

def add_images(
self,
uris: List[str],
metadatas: Optional[List[dict]] = None,
ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
raise NotImplementedError(
"Sync methods are not implemented for AsyncAlloyDBVectorStore. Use AlloyDBVectorStore interface instead."
)

def delete(
self,
ids: Optional[List] = None,
@@ -838,6 +904,28 @@ def from_documents( # type: ignore[override]
"Sync methods are not implemented for AsyncAlloyDBVectorStore. Use AlloyDBVectorStore interface instead."
)

def similarity_search(
self,
query: str,
k: Optional[int] = None,
filter: Optional[str] = None,
**kwargs: Any,
) -> List[Document]:
raise NotImplementedError(
"Sync methods are not implemented for AsyncAlloyDBVectorStore. Use AlloyDBVectorStore interface instead."
)

def similarity_search_image(
self,
image_uri: str,
k: Optional[int] = None,
filter: Optional[str] = None,
**kwargs: Any,
) -> List[Document]:
raise NotImplementedError(
"Sync methods are not implemented for AsyncAlloyDBVectorStore. Use AlloyDBVectorStore interface instead."
)

def similarity_search_with_score(
self,
query: str,
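The new `aadd_images` path does three things before writing rows: it base64-encodes each local file (the encoded string is passed to `aadd_embeddings` as the record's text content), obtains vectors from an embedding service that must expose `embed_image()`, and falls back to `{"image_uri": uri}` metadata when none is supplied. The sketch below reproduces that flow with the standard library only. `ToyImageEmbedding`, `embed_images`, and `prepare_image_rows` are hypothetical names, and the toy vector (the file size in bytes) is a placeholder, not a real image embedding.

```python
import base64
import os
from typing import Any, Dict, List, Optional


class ToyImageEmbedding:
    """Hypothetical embedding service exposing embed_image(); not a real model."""

    def __init__(self, size: int = 3) -> None:
        self.size = size

    def embed_image(self, uris: List[str]) -> List[List[float]]:
        # Placeholder vector: the file size in bytes, repeated `size` times.
        return [[float(os.path.getsize(uri))] * self.size for uri in uris]


def encode_image(uri: str) -> str:
    """Read a local file and return its bytes as a base64 string."""
    with open(uri, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def embed_images(embedding_service: Any, uris: List[str]) -> List[List[float]]:
    """Duck-typed capability check, mirroring _images_embedding_helper."""
    if not hasattr(embedding_service, "embed_image"):
        raise ValueError("Please use an embedding model that supports image embedding.")
    return embedding_service.embed_image(uris)


def prepare_image_rows(
    embedding_service: Any,
    uris: List[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Build the content/embedding/metadata triples that aadd_images hands on."""
    if metadatas is None:
        # Default metadata records where each image came from.
        metadatas = [{"image_uri": uri} for uri in uris]
    contents = [encode_image(uri) for uri in uris]
    vectors = embed_images(embedding_service, uris)
    return [
        {"content": c, "embedding": v, "metadata": m}
        for c, v, m in zip(contents, vectors, metadatas)
    ]
```

For example, `prepare_image_rows(ToyImageEmbedding(), ["red.jpg"])` yields one row whose `content` decodes back to the file's bytes. Because the capability check uses `hasattr`, any object exposing `embed_image(uris)` is accepted, and a model lacking it fails with `ValueError` before anything is written to the table.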
48 changes: 48 additions & 0 deletions src/langchain_google_alloydb_pg/vectorstore.py
@@ -226,6 +226,18 @@ async def aadd_documents(
self.__vs.aadd_documents(documents, ids, **kwargs)
)

async def aadd_images(
self,
uris: List[str],
metadatas: Optional[List[dict]] = None,
ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""Embed images and add to the table."""
return await self._engine._run_as_async(
self.__vs.aadd_images(uris, metadatas, ids, **kwargs)
)

def add_embeddings(
self,
texts: Iterable[str],
@@ -270,6 +282,18 @@ def add_documents(
self.__vs.aadd_documents(documents, ids, **kwargs)
)

def add_images(
self,
uris: List[str],
metadatas: Optional[List[dict]] = None,
ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""Embed images and add to the table."""
return self._engine._run_as_sync(
self.__vs.aadd_images(uris, metadatas, ids, **kwargs)
)

async def adelete(
self,
ids: Optional[List] = None,
@@ -589,6 +613,18 @@ def similarity_search(
self.__vs.asimilarity_search(query, k, filter, **kwargs)
)

def similarity_search_image(
self,
image_uri: str,
k: Optional[int] = None,
filter: Optional[str] = None,
**kwargs: Any,
) -> List[Document]:
"""Return docs selected by similarity search on image."""
return self._engine._run_as_sync(
self.__vs.asimilarity_search_image(image_uri, k, filter, **kwargs)
)

async def asimilarity_search(
self,
query: str,
@@ -601,6 +637,18 @@ async def asimilarity_search(
self.__vs.asimilarity_search(query, k, filter, **kwargs)
)

async def asimilarity_search_image(
self,
image_uri: str,
k: Optional[int] = None,
filter: Optional[str] = None,
**kwargs: Any,
) -> List[Document]:
"""Return docs selected by similarity search on image."""
return await self._engine._run_as_async(
self.__vs.asimilarity_search_image(image_uri, k, filter, **kwargs)
)

# Required for (a)similarity_search_with_relevance_scores
def _select_relevance_score_fn(self) -> Callable[[float], float]:
"""Select a relevance function based on distance strategy."""
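In `vectorstore.py`, each public method is a thin facade: the sync variant passes a coroutine from the wrapped async store to `self._engine._run_as_sync`, and the async variant passes it to `self._engine._run_as_async`. The engine's internals are not part of this diff, so `BackgroundLoopRunner` below is a hypothetical sketch that runs coroutines on a dedicated background event loop, not the library's implementation. It illustrates why both facades must hand over the coroutine with the matching name (`asimilarity_search_image`, not `asimilarity_search`): whichever coroutine is passed is exactly what runs.

```python
import asyncio
import threading
from typing import Any, Coroutine, List, TypeVar

T = TypeVar("T")


class BackgroundLoopRunner:
    """Hypothetical stand-in for the engine's _run_as_sync/_run_as_async helpers."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run_as_sync(self, coro: Coroutine[Any, Any, T]) -> T:
        # Block the calling thread until the background loop finishes the coroutine.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    async def run_as_async(self, coro: Coroutine[Any, Any, T]) -> T:
        # Execute on the background loop; await the result from the caller's loop.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return await asyncio.wrap_future(future)

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


class ToyAsyncStore:
    """Hypothetical async-only store with one image search coroutine."""

    async def asimilarity_search_image(self, image_uri: str, k: int = 2) -> List[str]:
        await asyncio.sleep(0)
        return [f"{image_uri}#{i}" for i in range(k)]


class ToyStore:
    """Sync and async facades that delegate to the same coroutine."""

    def __init__(self, runner: BackgroundLoopRunner) -> None:
        self._runner = runner
        self._vs = ToyAsyncStore()

    def similarity_search_image(self, image_uri: str, k: int = 2) -> List[str]:
        return self._runner.run_as_sync(self._vs.asimilarity_search_image(image_uri, k))

    async def asimilarity_search_image(self, image_uri: str, k: int = 2) -> List[str]:
        return await self._runner.run_as_async(
            self._vs.asimilarity_search_image(image_uri, k)
        )
```

Because both facades resolve to the same coroutine, `store.similarity_search_image("red.jpg")` and `await store.asimilarity_search_image("red.jpg")` return identical results; keeping one async source of truth is what lets the sync API exist without duplicating query logic.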
2 changes: 1 addition & 1 deletion src/langchain_google_alloydb_pg/version.py
@@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.

__version__ = "0.6.0"
__version__ = "0.7.0"
57 changes: 56 additions & 1 deletion tests/test_async_vectorstore.py
@@ -14,12 +14,13 @@

import os
import uuid
from typing import Sequence
from typing import List, Sequence

import pytest
import pytest_asyncio
from langchain_core.documents import Document
from langchain_core.embeddings import DeterministicFakeEmbedding
from PIL import Image
from sqlalchemy import text
from sqlalchemy.engine.row import RowMapping

@@ -29,6 +30,7 @@
DEFAULT_TABLE = "test_table" + str(uuid.uuid4())
DEFAULT_TABLE_SYNC = "test_table_sync" + str(uuid.uuid4())
CUSTOM_TABLE = "test-table-custom" + str(uuid.uuid4())
IMAGE_TABLE = "test_image_table" + str(uuid.uuid4())
VECTOR_SIZE = 768

embeddings_service = DeterministicFakeEmbedding(size=VECTOR_SIZE)
@@ -42,6 +44,15 @@
embeddings = [embeddings_service.embed_query(texts[i]) for i in range(len(texts))]


class FakeImageEmbedding(DeterministicFakeEmbedding):

def embed_image(self, image_paths: List[str]) -> List[List[float]]:
return [self.embed_query(path) for path in image_paths]


image_embedding_service = FakeImageEmbedding(size=VECTOR_SIZE)


def get_env_var(key: str, desc: str) -> str:
v = os.environ.get(key)
if v is None:
@@ -133,6 +144,38 @@ async def vs_custom(self, engine):
)
yield vs

@pytest_asyncio.fixture(scope="class")
async def image_vs(self, engine):
await engine._ainit_vectorstore_table(
IMAGE_TABLE,
VECTOR_SIZE,
metadata_columns=[Column("image_id", "TEXT"), Column("source", "TEXT")],
)
vs = await AsyncAlloyDBVectorStore.create(
engine,
embedding_service=image_embedding_service,
table_name=IMAGE_TABLE,
metadata_columns=["image_id", "source"],
metadata_json_column="mymeta",
)
yield vs

@pytest_asyncio.fixture(scope="class")
async def image_uris(self):
red_uri = str(uuid.uuid4()).replace("-", "_") + "test_image_red.jpg"
green_uri = str(uuid.uuid4()).replace("-", "_") + "test_image_green.jpg"
blue_uri = str(uuid.uuid4()).replace("-", "_") + "test_image_blue.jpg"
image = Image.new("RGB", (100, 100), color="red")
image.save(red_uri)
image = Image.new("RGB", (100, 100), color="green")
image.save(green_uri)
image = Image.new("RGB", (100, 100), color="blue")
image.save(blue_uri)
image_uris = [red_uri, green_uri, blue_uri]
yield image_uris
for uri in image_uris:
os.remove(uri)

async def test_init_with_constructor(self, engine):
with pytest.raises(Exception):
AsyncAlloyDBVectorStore(
@@ -192,6 +235,18 @@ async def test_aadd_docs_no_ids(self, engine, vs):
assert len(results) == 3
await aexecute(engine, f'TRUNCATE TABLE "{DEFAULT_TABLE}"')

async def test_aadd_images(self, engine, image_vs, image_uris):
ids = [str(uuid.uuid4()) for i in range(len(image_uris))]
metadatas = [
{"image_id": str(i), "source": "google.com"} for i in range(len(image_uris))
]
await image_vs.aadd_images(image_uris, metadatas, ids)
results = await afetch(engine, (f'SELECT * FROM "{IMAGE_TABLE}"'))
assert len(results) == 3
assert results[0]["image_id"] == "0"
assert results[0]["source"] == "google.com"
await aexecute(engine, (f'TRUNCATE TABLE "{IMAGE_TABLE}"'))

async def test_adelete(self, engine, vs):
ids = [str(uuid.uuid4()) for i in range(len(texts))]
await vs.aadd_texts(texts, ids=ids)
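The test's `FakeImageEmbedding` embeds each image by calling `embed_query` on its path string, so vectors are stable for a given path but carry no information about pixel content: the red, green, and blue fixtures differ only because their file names differ, and since the fixture prefixes each name with a fresh `uuid4`, the vectors also change from run to run. The image test therefore checks storage plumbing (row count, metadata columns), not visual similarity. Below is a hypothetical standard-library analogue that seeds a private RNG from a SHA-256 hash of the input; it mirrors the idea but is not `langchain_core`'s `DeterministicFakeEmbedding` implementation.

```python
import hashlib
import random
from typing import List


class PathSeededImageEmbedding:
    """Hypothetical stdlib analogue of the test's FakeImageEmbedding."""

    def __init__(self, size: int) -> None:
        self.size = size

    def embed_query(self, text: str) -> List[float]:
        # Seed a private RNG from a stable hash so the same text always maps
        # to the same vector (unlike Python's salted built-in hash()).
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
        return [rng.gauss(0.0, 1.0) for _ in range(self.size)]

    def embed_image(self, image_paths: List[str]) -> List[List[float]]:
        # Embeds the path string, not the pixels: identical images stored at
        # different paths get different vectors.
        return [self.embed_query(path) for path in image_paths]
```

A test that needed to assert on search ranking (for example, that querying with the red image returns the red record first) would need an embedding derived from file contents rather than paths.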