The primary objective of this recommendation engine is to demonstrate semantic search using the aidb extension, showing how easy the extension is to implement and how it abstracts complexity without compromising functionality.
This experiment uses the CLIP model together with PostgreSQL, applying the aidb extension to execute transformation and semantic search automatically within the database. The setup uses a dataset of catalog images, which also allows reverse image search.
aidb is the EDB Postgres AI database extension and must be installed. If it is not installed, follow the step-by-step installation guide: https://www.enterprisedb.com/docs/edb-postgres-ai/ai-ml/install-tech-preview/
Instead of storing the images directly in the database, we store them in a public S3 Bucket. The actual data stored in the database consists of image embeddings, which are generated by the CLIP model and encapsulated in 512-dimensional vectors as required by the model. This approach enables rapid search capabilities on a standard laptop.
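To illustrate why searching over embeddings is cheap, here is a minimal sketch of a nearest-neighbour lookup. It assumes cosine similarity as the ranking metric, which is common for CLIP embeddings; the metric aidb actually uses is not stated here. The image keys and the toy 4-dimensional vectors are made up for illustration and stand in for real 512-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query, catalog, k=3):
    """Return the k (image_key, score) pairs most similar to the query."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical catalog: S3 object key -> embedding (toy values, not CLIP output).
catalog = {
    "images/red_shoe.jpg": [0.9, 0.1, 0.0, 0.1],
    "images/black_dress.jpg": [0.0, 0.8, 0.6, 0.0],
    "images/red_heel.jpg": [0.8, 0.2, 0.1, 0.0],
}
query = [1.0, 0.0, 0.0, 0.0]  # would be the embedding of a query image or text

results = top_k(query, catalog, k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

Only the vectors and the object keys are needed at query time, so the images themselves can stay in S3.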
The demo performs not only reverse image search but also text-to-image search, which searches the catalog using text as input.
Download the dataset from https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small/download?datasetVersionNumber=1
and unzip it into a folder such as dataset/images
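As a quick sanity check after unzipping, a small script along these lines can confirm that the images landed in the expected folder. This is only a sketch: the list of file extensions is an assumption, and `list_images` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the dataset you downloaded.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(image_dir="dataset/images"):
    """Return the sorted image files directly inside image_dir (empty if missing)."""
    root = Path(image_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


images = list_images()
print(f"Found {len(images)} image files in dataset/images")
```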
Python environment: the Python environment accessible to PostgreSQL must have the necessary libraries installed. Install them from the EDB Python directory with: pip install -r requirements.txt
You can run the aidb recommendation app either by running the Python script below, which initializes the database and loads the data, or by running the aidb queries in the Postgres terminal.
The images must be stored in the S3 bucket before you run the Python script. The S3 endpoint is optional; leave it blank if the S3 bucket is not public. Pass the S3 bucket name as an argument, as shown below:
% python code/connect_encode.py retriever_name s3_bucket_name s3_endpoint
% python code/connect_encode.py recommendation_engine public-ai-team http://s3.eu-central-1.amazonaws.com
postgres=# SELECT aidb.create_s3_retriever(
'recommendation_engine',
'public',
'clip-vit-base-patch32',
'img',
'public-ai-team',
'',
'http://s3.eu-central-1.amazonaws.com'
);
postgres=# SELECT aidb.refresh_retriever('recommendation_engine');
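The same two statements can also be issued from Python through any DB-API cursor (for example one from psycopg2, which uses `%s` placeholders). The sketch below is not the repository's code: `setup_retriever` and `RecordingCursor` are hypothetical names, and the second, third, fourth and sixth arguments are copied verbatim from the psql example above because their meaning is not documented here.

```python
CREATE_RETRIEVER_SQL = "SELECT aidb.create_s3_retriever(%s, %s, %s, %s, %s, %s, %s);"
REFRESH_RETRIEVER_SQL = "SELECT aidb.refresh_retriever(%s);"


def setup_retriever(cursor, retriever_name, bucket, endpoint):
    """Create the S3 retriever, then refresh it, using a DB-API cursor."""
    params = (
        retriever_name,
        "public",                 # copied from the psql example
        "clip-vit-base-patch32",  # copied from the psql example
        "img",                    # copied from the psql example
        bucket,
        "",                       # copied from the psql example
        endpoint,
    )
    cursor.execute(CREATE_RETRIEVER_SQL, params)
    cursor.execute(REFRESH_RETRIEVER_SQL, (retriever_name,))


class RecordingCursor:
    """Stand-in cursor that records calls, so the sketch runs without a database."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


cursor = RecordingCursor()
setup_retriever(
    cursor,
    "recommendation_engine",
    "public-ai-team",
    "http://s3.eu-central-1.amazonaws.com",
)
for sql, params in cursor.calls:
    print(sql, params)
```

With a real connection, replace `RecordingCursor()` with the cursor from your driver and commit afterwards if autocommit is off.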
Update the database connection settings (port, username, password) in the create_db_connection function and the DATABASE_URL variable.
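One way to avoid hard-coding credentials is to assemble DATABASE_URL from environment variables. The sketch below is an assumption about how this could be done, not the code in this repository: `build_database_url` is a hypothetical helper, and the defaults (localhost, 5432, postgres) are placeholders. The PG* variable names follow the usual libpq conventions.

```python
import os
from urllib.parse import quote


def build_database_url(user, password, host="localhost", port=5432, dbname="postgres"):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


# Placeholder defaults; set the PG* variables to match your own instance.
DATABASE_URL = build_database_url(
    user=os.environ.get("PGUSER", "postgres"),
    password=os.environ.get("PGPASSWORD", ""),
    host=os.environ.get("PGHOST", "localhost"),
    port=int(os.environ.get("PGPORT", "5432")),
    dbname=os.environ.get("PGDATABASE", "postgres"),
)
```

Percent-encoding matters when the password contains characters such as `@` or `/`, which would otherwise break the URL.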
To run the app with aidb, use the command below. Streamlit does not natively support command-line arguments in the same way as typical Python scripts, so although s3_endpoint is optional, enter a single-quoted empty string '' for it if the S3 bucket is not public. Otherwise, run the script as in the example below:
% streamlit run code/app_search_aidb.py retriever_name s3_bucket_name s3_endpoint
Example for a public S3 bucket:
% streamlit run code/app_search_aidb.py recommendation_engine public-ai-team http://s3.eu-central-1.amazonaws.com
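Inside the app, the positional arguments after the script path are typically available through `sys.argv`. The following is a sketch of how they might be read, treating the empty string as "no endpoint"; `parse_app_args` is a hypothetical helper, and the actual parsing in code/app_search_aidb.py may differ.

```python
def parse_app_args(argv):
    """Parse [retriever_name, s3_bucket_name, optional s3_endpoint].

    argv is expected to exclude the script name, e.g. sys.argv[1:].
    An empty or missing endpoint is returned as None.
    """
    if len(argv) < 2:
        raise SystemExit(
            "usage: streamlit run code/app_search_aidb.py "
            "retriever_name s3_bucket_name [s3_endpoint]"
        )
    retriever_name, bucket = argv[0], argv[1]
    endpoint = argv[2].strip() if len(argv) > 2 else ""
    return retriever_name, bucket, endpoint or None


# In the app this would be parse_app_args(sys.argv[1:]); a literal list is
# used here so the sketch runs on its own.
print(parse_app_args(["recommendation_engine", "public-ai-team", ""]))
```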
Example search texts: red shoes, red women shoes, black dress, ...