End-to-end visual similarity solution using Vertex Matching Engine
- 01-prepare-vpc-network.ipynb - steps for creating the required VPC network configuration
- 02-retail-dataset-prep.ipynb - instructions for downloading and preparing the Retail Product Dataset
- 03-pipeline-workflows.ipynb - orchestrate the e2e workflow with Vertex Pipelines
- 04-vector-filtering.ipynb - examples of filtering and crowding when querying the index
- (1) Multi-modal features, e.g., image and text (product title, description), for embeddings
Neural deep retrieval (NDR) is a popular technique for representing the relationships between multiple entities and indexing those entities for efficient retrieval. With applications across a variety of use cases such as multimodal search, item matching, ad targeting, customer segmentation, recommendations, and more, it's a valuable capability many organizations have prioritized. See `why-ann-index.md` for a refresher on NDR and ANN indexes.
1. Use a pretrained deep learning model to extract feature vectors (embeddings) from each image in a retail product catalog
2. Store the embedding vectors in a scalable approximate nearest neighbor (ANN) index, e.g., Vertex Matching Engine, where each image's embedding vector is indexed by product ID
3. For a given query image, call `model.predict(x)` with the same pretrained model used in (1) to extract the feature vector (embedding) from the query image
4. Using the computed feature vector from (3), query the Matching Engine index to find the `k` nearest neighbors
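Conceptually, the final step retrieves the `k` catalog embeddings closest to the query embedding. A minimal NumPy sketch of that nearest-neighbor lookup (brute force rather than ANN, with made-up vectors; Matching Engine approximates the same result without scanning every vector):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy catalog: 1000 product embeddings of dimension 128, indexed by position
catalog = rng.normal(size=(1000, 128)).astype(np.float32)
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)  # unit-normalize

# Query: a slightly perturbed copy of product 42's embedding
query = catalog[42] + 0.01 * rng.normal(size=128).astype(np.float32)
query /= np.linalg.norm(query)

k = 5
# Dot-product similarity against every catalog vector; an ANN index
# approximates this ranking without the exhaustive scan
scores = catalog @ query
neighbors = np.argsort(-scores)[:k]
print(neighbors)  # product 42 ranks first
```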
Load / pre-process images
- decode
- reshape per model specs
- Convert tensor to float & add an axis for the expected model input (e.g., `1 x 224 x 224 x 3`)
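The pre-processing steps above can be sketched with TensorFlow ops. This is a minimal sketch: the 224x224 size and [0, 1] scaling are assumptions matching a MobileNet-style TF Hub model, and the sample input is a synthetic JPEG rather than a real catalog image:

```python
import numpy as np
import tensorflow as tf

def preprocess(image_bytes: bytes) -> tf.Tensor:
    """Decode an encoded image into a 1 x 224 x 224 x 3 float batch."""
    img = tf.io.decode_image(image_bytes, channels=3, expand_animations=False)
    img = tf.image.resize(img, (224, 224))  # reshape per model specs
    img = tf.cast(img, tf.float32) / 255.0  # convert to float in [0, 1]
    return tf.expand_dims(img, axis=0)      # add batch axis -> (1, 224, 224, 3)

# Example: encode a random RGB image as JPEG, then run it through the pipeline
raw_bytes = tf.io.encode_jpeg(
    np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)).numpy()
batch = preprocess(raw_bytes)
print(batch.shape)  # (1, 224, 224, 3)
```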
Extract Feature Vectors
- Load pre-trained image model (TF Hub)
- Loop through images & calculate feature vectors (embeddings)
- Save vectors to file in Cloud Storage
Build Matching Engine Index
- Setup VPC Network Peering Connection
- NearestNeighborSearchConfig, e.g., `dimensions`, `approximateNeighborsCount`, `distanceMeasureType`, etc.
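For reference, a sketch of index metadata with those fields; the bucket path and tuning values are placeholders, and `dimensions` must match the embedding model's output (e.g., 1280 for MobileNet V2 feature vectors):

```json
{
  "contentsDeltaUri": "gs://YOUR_BUCKET/embeddings/",
  "config": {
    "dimensions": 1280,
    "approximateNeighborsCount": 50,
    "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
    "algorithmConfig": {
      "treeAhConfig": {
        "leafNodeEmbeddingCount": 500,
        "leafNodesToSearchPercent": 7
      }
    }
  }
}
```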
Feature Extraction
- Pre-trained models trained on larger datasets can be a good starting point.
- If the original dataset is large and general enough, the spatial hierarchy of features learned by the pretrained network can effectively act as a generic model of the visual world
- They can be useful even if the image classes are completely different between the original and target dataset
- Feature Extraction consists of taking the convolutional base of a previously trained network, running new data through it, and training a new classifier on top of the output
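The last point can be sketched in Keras: freeze the convolutional base of a previously trained network and train only a new classifier head on top. MobileNet V2 and a 10-class head are assumptions for illustration, and `weights=None` keeps the sketch self-contained (use the pretrained `"imagenet"` weights in practice):

```python
import tensorflow as tf

NUM_CLASSES = 10  # assumption: target dataset has 10 product classes

# Convolutional base of a previously trained network, without its classifier
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False,
    weights=None)  # use weights="imagenet" in practice
base.trainable = False  # freeze the base: only the new head is trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # new classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```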