diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/gettingstarted/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/gettingstarted/index.mdx index abe67250290..98a0656a699 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/gettingstarted/index.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/gettingstarted/index.mdx @@ -58,14 +58,14 @@ INSERT 0 9 So now we have a table with some data in it, food products and some very personal opinions about them. -## Registering a Retriever +## Creating a Retriever -The first step to using Pipelines with this data is to register a retriever. A retriever is a way to access the data in the table and use it in AI workflows. +The first step to using Pipelines with this data is to create a retriever. A retriever is a way to access the data in the table and use it in AI workflows. ```sql -select aidb.register_retriever_for_table('products_retriever', 't5', 'products', 'description', 'Text'); +select aidb.create_retriever_for_table('products_retriever', 't5', 'products', 'description', 'Text'); __OUTPUT__ - register_retriever_for_table + create_retriever_for_table ------------------------------ products_retriever (1 row) @@ -73,7 +73,7 @@ __OUTPUT__ ## Querying the retriever -Now that we have a retriever registered, we can query it to get similar results based on the data in the table. +Now that we have created a retriever, we can query it to get similar results based on the data in the table. 
```sql select * from aidb.retrieve_key('products_retriever','I like it',5); diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/bert.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/bert.mdx index f483e7f3a5d..6471449872b 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/bert.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/bert.mdx @@ -31,17 +31,18 @@ Read more about [BERT on Wikipedia](https://en.wikipedia.org/wiki/BERT_(language * sentence-transformers/paraphrase-multilingual-mpnet-base-v2 * sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 -## Register the default implementation - +## Creating the default model ```sql -SELECT aidb.register_model('my_bert_model', 'bert_local'); +SELECT aidb.create_model('my_bert_model', 'bert_local'); ``` -## Register another model +## Creating a specific model + +You can specify a model and revision in the options JSONB object. In this example, we are creating a `sentence-transformers/all-distilroberta-v1` model with the name `another_bert_model`: ```sql -select aidb.register_model( +select aidb.create_model( 'another_bert_model', 'bert_local', '{"model": "sentence-transformers/all-distilroberta-v1", "revision": "main"}'::JSONB diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/clip.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/clip.mdx index 124e26e51fa..4881e3bf0a3 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/clip.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/clip.mdx @@ -24,15 +24,15 @@ Read more about [CLIP on OpenAI's website](https://openai.com/research/clip/). 
* openai/clip-vit-base-patch32 (default) -## Register the default implementation +## Creating the default model ```sql -SELECT aidb.register_model('my_clip_model', 'clip_local'); +SELECT aidb.create_model('my_clip_model', 'clip_local'); ``` There is only one model, the default `openai/clip-vit-base-patch32`, so we do not need to specify the model in the configuration. No credentials are required for the CLIP model. -## Register another model +## Creating a specific model There are no other model configurations available for the CLIP model. diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/openai-completions.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/openai-completions.mdx index b4172289534..068d99cf513 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/openai-completions.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/openai-completions.mdx @@ -21,18 +21,18 @@ See a list of supported OpenAI models [here](https://platform.openai.com/docs/mo * Any text generation model that is supported by OpenAI. This includes models such as GPT-4o, GPT-4o mini, GPT-4 and GPT-3.5. -## Registering the default model +## Creating the default model -There is no default model for OpenAI Completions. You can register any supported OpenAI model using the `aidb.register_model` function. See [Registering a model](#registering-a-model). +There is no default model for OpenAI Completions. You can create any supported OpenAI model using the `aidb.create_model` function. See [Creating a specific model](#creating-a-specific-model). -## Registering a model +## Creating a specific model -You can register any supported OpenAI model using the `aidb.register_model` function. +You can create any supported OpenAI model using the `aidb.create_model` function.
In this example, we are registering a GPT-4o model with the name `my_openai_model`: ```sql -SELECT aidb.register_model( +SELECT aidb.create_model( 'my_openai_model', 'openai_completions', '{"model": "gpt-4o"}::JSONB, diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/openai-embeddings.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/openai-embeddings.mdx index 50a0369b8ef..4a18cdd522d 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/openai-embeddings.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/openai-embeddings.mdx @@ -22,22 +22,22 @@ See a list of supported OpenAI models [here](https://platform.openai.com/docs/gu * Any text embedding model that is supported by OpenAI. This includes `text-embedding-3-small`, `text-embedding-3-large`, and `text-embedding-ada-002`. * Defaults to `text-embedding-3-small`. -## Registering the default model +## Creating the default model ```sql -SELECT aidb.register_model('my_openai_embeddings', +SELECT aidb.create_model('my_openai_embeddings', 'openai_embeddings', credentials=>'{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"'::JSONB); ``` As we are defaulting the model to `text-embedding-3-small`, we do not need to specify the model in the configuration. But we do need to pass an OpenAI API key in the credentials, and for that we have to pass credentials as a named parameter. -## Registering a model +## Creating a specific model -You can register any supported OpenAI embedding model using the `aidb.register_model` function. In this example, we are registering a `text-embedding-3-large` model with the name `my_openai_model`: +You can create any supported OpenAI embedding model using the `aidb.create_model` function. 
In this example, we are creating a `text-embedding-3-large` model with the name `my_openai_model`: ```sql -SELECT aidb.register_model( +SELECT aidb.create_model( 'my_openai_model', 'openai_embeddings', '{"model": "text-embedding-3-large"}'::JSONB, @@ -55,6 +55,12 @@ The following configuration settings are available for OpenAI models: * `url` - The URL of the OpenAI model to use. This is optional and can be used to specify a custom model URL. Defaults to `https://api.openai.com/v1/chat/completions`. * `max_concurrent_requests` - The maximum number of concurrent requests to make to the OpenAI model. Defaults to `25`. +## Available OpenAI Embeddings models + +* text-embedding-3-small (default) +* text-embedding-3-large +* text-embedding-ada-002 + ## Model credentials The following credentials are required for OpenAI models: diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/t5.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/t5.mdx index 5c6bbf20900..9a65671a977 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/t5.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/t5.mdx @@ -27,16 +27,16 @@ Read more about [T5 on Wikipedia](https://en.wikipedia.org/wiki/T5_(language_mod * t5-3b * t5-11b -## Registering the default model +## Creating the default model ```sql -SELECT aidb.register_model('my_t5_model', 't5_local'); +SELECT aidb.create_model('my_t5_model', 't5_local'); ``` -## Registering a specific
model +## Creating a specific model ```sql -SELECT aidb.register_model( +SELECT aidb.create_model( 'another_t5_model', 't5_local', '{"model": "t5-large", "revision": "main"}'::JSONB diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/using-models.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/using-models.mdx index 188a0584147..0e3d1940d5d 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/using-models.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/using-models.mdx @@ -1,19 +1,19 @@ --- title: Using Models in AI Accelerator Pipelines navTitle: Using Models -description: How to register and use models in AI Accelerator Pipelines. +description: How to create and use models in AI Accelerator Pipelines. --- -Pipelines has a model registry that manages configured instances of models. Any Pipelines functions that use models, such as embedding and retrieving, must reference a registered model. +Pipelines has a model registry that manages configured instances of models. Any Pipelines functions that use models, such as embedding and retrieving, must reference a model in this registry. ## Discover the preloaded models -Pipelines comes with a set of pre-registerd models that you can use out of the box. +Pipelines comes with a set of pre-created models that you can use out of the box. To find them, you can run the following query: ```sql -SELECT * FROM aidb.list_registered_models(); +SELECT * FROM aidb.list_models(); ``` This will return a list of all the models that are currently registered in the system. If you have not registered any models, you'll see the default models that come with Pipelines. @@ -29,36 +29,56 @@ This will return a list of all the models that are currently registered in the s The `bert`, `clip`, and `t5` models are all registered and ready to use. The `dummy` model is a placeholder model that can be used for testing purposes. 
-## Registering a Model +## Creating a Model -You can also register your own models. To do this, you can use the `aidb.register_model` function. Here is an example of how to register a model: +You can also create your own models. To do this, you can use the `aidb.create_model` function. Here is an example of how to create a model: ```sql -SELECT aidb.register_model('my_model', 'bert_local'); +SELECT aidb.create_model('my_model', 'bert_local'); ``` -This will register a model named `my_model` that uses the default `bert_local` model provider. But, this is essentially the same as using the bert model thats already registered. +This will create a model named `my_model` that uses the default `bert_local` model provider. However, this is essentially the same as using the `bert` model that's already registered. -## Registering a Model with a Configuration +## Discovering the Model Providers + +You can also find out what model providers are available by running the following query: + +```sql +SELECT * FROM aidb.model_providers; +__OUTPUT__ + server_name | server_options +--------------------+---------------- + t5_local | + openai_embeddings | + openai_completions | + bert_local | + clip_local | + dummy | +``` + +This will return a list of all the model providers that are currently available in the system. You can find out more about these providers and their capabilities in the [Supported Models](./supported-models) section. + +## Creating a Model with a Configuration You can also pass options to the model when registering it.
For example, you can specify the model configuration: ```sql -SELECT aidb.register_model('my_model', +SELECT aidb.create_model('my_model', 'bert_local', - '{"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", "revision": "main"}'::JSONB); + '{"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", + "revision": "main"}'::JSONB); ``` -This will register a model named `my_model` that uses the `bert_local` model provider and the `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` model from HuggingFace. The `revision` option specifies the version of the model to use. The options are passed as a JSONB object, with a single quoted string that is then cast to JSONB. Within the string are the key-value pairs that define the model configuration in a single JSON object. +This will create a model named `my_model` that uses the `bert_local` model provider and the `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` model from HuggingFace. The `revision` option specifies the version of the model to use. The options are passed as a JSONB object, with a single quoted string that is then cast to JSONB. Within the string are the key-value pairs that define the model configuration in a single JSON object. ## Registering a Model with Configuration and Credentials -This is where the other [supported models](./supported-models) come in. You can register a different model by specifying the model name in the configuration. The `OpenAI Completions` and `OpenAI Embeddings` models are both models which you can register to make use of OpenAI's completions and embeddings APIs. +This is where the other [supported models](./supported-models) come in. You can create a different model by specifying the model name in the configuration. The `OpenAI Completions` and `OpenAI Embeddings` models are both models which you can create to make use of OpenAI's completions and embeddings APIs. 
-You need to provide more information to the `aidb.register_model` function when registering a model like these. Completions has a number of options, including selecting which model it will use on OpenAI. Both Completions and Embeddings requires API credentials. Here is an example of how to register the OpenAI Completions model: +You need to provide more information to the `aidb.create_model` function when creating models like these. Completions has a number of options, including selecting which model it will use on OpenAI. Both Completions and Embeddings require API credentials. Here is an example of how to create the OpenAI Completions model: ```sql -SELECT aidb.register_model( +SELECT aidb.create_model( 'my_openai_model', 'openai_completions', '{"model": "gpt-4o"}'::JSONB, @@ -68,10 +88,10 @@ SELECT aidb.register_model( You should replace the `api_key` value with your own OpenAI API key. Now you can use the `my_openai_model` model in your Pipelines functions and, in this example, leverage the GPT-4o model from OpenAI. -You can also register the OpenAI Embeddings model in a similar way. +You can also create the OpenAI Embeddings model in a similar way. ```sql -SELECT aidb.register_model( +SELECT aidb.create_model( 'my_openai_embeddings', 'openai_embeddings', '{"model": "text-embedding-3-large"}'::JSONB, @@ -79,11 +99,11 @@ SELECT aidb.register_model( }; ``` -This will register the `text-embedding-3-large` model with the name `my_openai_embeddings`. You can now use this model in your Pipelines functions to generate embeddings for text data. +This will create the `text-embedding-3-large` model with the name `my_openai_embeddings`. You can now use this model in your Pipelines functions to generate embeddings for text data. ## Using models with OpenAI compatible APIs -These OpenAI models work with any OpenAI compatible API.
This allows you to connect and use an even wider range of models, just by passing the appropriate API endpoint to the `url` option in the `aidb.register_model` function's options. +These OpenAI models work with any OpenAI compatible API. This allows you to connect and use an even wider range of models, just by passing the appropriate API endpoint to the `url` option in the `aidb.create_model` function's options. For more information about the OpenAI models, see the [OpenAI Completions](./supported-models/openai-completions) and [OpenAI Embeddings](./supported-models/openai-embeddings) pages. diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/pipelines-overview.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/pipelines-overview.mdx index 91916ed7ca6..b9d70341aeb 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/pipelines-overview.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/pipelines-overview.mdx @@ -25,7 +25,7 @@ Pipelines delivers its functionality through the Pipelines aidb extension, embed Pipelines' aidb extension introduces the concept of a “retriever” that you can create for a given type and location of AI data. Currently, Pipelines supports unstructured plain text documents as well as a set of image formats. This data can either reside in regular columns of a Postgres table or it can reside in an S3-compatible object storage bucket. -A retriever encapsulates all processing that is needed to make the AI data in the provided source location searchable and retrievable through similarity. The application just needs to create a retriever via the `aidb.register_retriever_for_table()` function for Postgres tables or `aidb.register_retriever_for_volume` for externally stored data on S3 or local filesystems. +A retriever encapsulates all processing that is needed to make the AI data in the provided source location searchable and retrievable through similarity. 
The application just needs to create a retriever via the `aidb.create_retriever_for_table()` function for Postgres tables or `aidb.create_retriever_for_volume` for externally stored data on S3 or local filesystems. ### Auto embedding diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/index.mdx index b2aef087a57..6c6ebfc947e 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/index.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/index.mdx @@ -16,10 +16,10 @@ navigation: * [aidb.model_providers](models#aidbmodel_providers) ### Functions -* [aidb.register_model](models#aidbregister_model) -* [aidb.list_registered_models](models#aidblist_registered_models) -* [aidb.get_registered_model](models#aidbget_registered_model) -* [aidb.delete_registered_model](models#aidbdelete_registered_model) +* [aidb.create_model](models#aidbcreate_model) +* [aidb.list_models](models#aidblist_models) +* [aidb.get_model](models#aidbget_model) +* [aidb.delete_model](models#aidbdelete_model) ## Retrievers @@ -29,8 +29,8 @@ navigation: ### Functions -* [aidb.register_retriever_for_table](retrievers#aidbregister_retriever_for_table) -* [aidb.register_retriever_for_volume](retrievers#aidbregister_retriever_for_volume) +* [aidb.create_retriever_for_table](retrievers#aidbcreate_retriever_for_table) +* [aidb.create_retriever_for_volume](retrievers#aidbcreate_retriever_for_volume) * [aidb.enable_auto_embedding_for_table](retrievers#aidbenable_auto_embedding_for_table) * [aidb.disable_auto_embedding_for_table](retrievers#aidbdisable_auto_embedding_for_table) * [aidb.bulk_embedding](retrievers#aidbbulk_embedding) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/models.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/models.mdx index b3211983ab6..e54aff79f3e 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/models.mdx +++ 
b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/models.mdx @@ -21,9 +21,9 @@ The `aidb.model_providers` table stores information about the model providers th ## Functions -### `aidb.register_model` +### `aidb.create_model` -Registers a a new model in the system by saving its name, provider and optional configuration. +Creates a new model in the system by saving its name, provider and optional configuration. #### Parameters @@ -38,7 +38,7 @@ Registers a a new model in the system by saving its name, provider and optional #### Example ```sql -SELECT aidb.register_model( +SELECT aidb.create_model( name => 'my_t5'::text, provider => 't5_local'::character varying, config => '{"param1": "value1", "param2": "value2"}'::jsonb, @@ -49,12 +49,12 @@ SELECT aidb.register_model( or equivalently, using default values: ```sql -SELECT aidb.register_model('my_t5', 't5_local'); +SELECT aidb.create_model('my_t5', 't5_local'); ``` -### `aidb.list_registered_models` +### `aidb.list_models` -Returns a list of all registered models and their configured options. +Returns a list of all models in the registry and their configured options. #### Parameters @@ -71,7 +71,7 @@ None #### Example ```sql -SELECT * FROM aidb.list_registered_models(); +SELECT * FROM aidb.list_models(); __OUTPUT__ name | provider | options -------+------------+--------------- @@ -80,9 +80,9 @@ __OUTPUT__ t5 | t5_local | {"config={}"} ``` -### `aidb.get_registered_model` +### `aidb.get_model` -Returns the configuration for a registered model. +Returns the configuration for a model in the registry. #### Parameters @@ -101,7 +101,7 @@ Returns the configuration for a registered model. #### Example ```sql -SELECT * FROM aidb.get_registered_model('t5'); +SELECT * FROM aidb.get_model('t5'); __OUTPUT__ name | provider | options ------+----------+--------------- @@ -109,9 +109,9 @@ __OUTPUT__ (1 row) ``` -### `aidb.delete_registered_model` +### `aidb.delete_model` -Deletes a registered model.
+Deletes a model from the registry. #### Parameters @@ -122,9 +122,9 @@ Deletes a registered model. #### Example ```sql -SELECT aidb.delete_registered_model('t5'); +SELECT aidb.delete_model('t5'); __OUTPUT__ - delete_registered_model + delete_model --------------------------------- (t5,t5_local,"{""config={}""}") (1 row) @@ -134,7 +134,7 @@ __OUTPUT__ | Column | Type | Description | |---------------------------|-------|----------------------------------------------------------| -| `delete_registered_model` | jsonb | The name, provider and options of the now deleted model. | +| `delete_model` | jsonb | The name, provider and options of the now deleted model. | ### `aidb.encode_text` diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/retrievers.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/retrievers.mdx index e40f047a628..4e6b82c5c65 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/retrievers.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/retrievers.mdx @@ -25,11 +25,11 @@ The `aidb.retrievers` view shows information about the retrievers that have been | distance_operator | [aidb.DistanceOperator](#aidbdistanceoperator) | During retrieval, what vector operation should be used to compare the vectors. | | options | jsonb | Currently unused. | | source_type | text | Type of source data the retriever is working with. Can be either 'Table' or 'Volume'. | -| source_table_name | regclass | name of the table that has the source data we compute embeddings for, and that we retrieve from. Only applicable to retrievers configured with aidb.register_retriever_for_volume(). | +| source_table_name | regclass | name of the table that has the source data we compute embeddings for, and that we retrieve from. Only applicable to retrievers configured with aidb.create_retriever_for_table(). | | source_table_data_column | text | column name in the source table that we compute embeddings for. 
This is also the column that will be returned in retrieve operations. | | source_table_data_column_type | [aidb.RetrieverSourceDataFormat](#aidbretrieversourcedataformat) | Type of data the retriever working with. Uses type [`aidb.RetrieverSourceDataFormat`](#aidbretrieversourcedataformat). Only relevant for table based retrievers. In the case of a volume based retriever, the format/type information is discovered from the volume. | | source_table_key_column | text | column to use as key for storing the embedding in the vector table. This provides a reference from the embedding to the source data | -| source_volume_name | text | Name of the volume to use as a data source. Only applicable to retrievers configured with aidb.register_retriever_for_volume(). | +| source_volume_name | text | Name of the volume to use as a data source. Only applicable to retrievers configured with aidb.create_retriever_for_volume(). | ## Types @@ -81,64 +81,64 @@ CREATE TYPE RetrieverSourceDataFormat AS ENUM ( ## Functions -### `aidb.register_retriever_for_table` +### `aidb.create_retriever_for_table` -Registers a retriever for a given table. +Creates a retriever for a given table. 
#### Parameters -| Parameter | Type | Default | Description | -|---------------------------------|------------------------------------------------------------------|--------------|----------------------------------------------------| -| p_name | TEXT | Required | Name of the retriever | -| p_model_name | TEXT | Required | Name of the registered model to use | -| p_source_table_name | regclass | Required | Name of the table to use as source | -| p_source_table_data_column | TEXT | Required | Column name in source table to use | -| p_source_table_data_column_type | [aidb.RetrieverSourceDataFormat](#aidbretrieversourcedataformat) | Required | Type of data in that column ("Text"."Image","PDF") | -| p_source_table_key_column | TEXT | 'id' | Column to use as key to reference the rows | -| p_vector_table_name | TEXT | NULL | | -| p_vector_table_vector_column | TEXT | 'embeddings' | | -| p_vector_table_key_column | TEXT | 'id' | | -| p_topk | INTEGER | 1 | | -| p_distance_operator | [aidb.distanceoperator](#aidbdistanceoperator) | 'L2' | | -| p_options | JSONB | '{}'::JSONB | Options | +| Parameter | Type | Default | Description | +|--------------------|------------------------------------------------------------------|--------------|----------------------------------------------------| +| name | TEXT | Required | Name of the retriever | +| model_name | TEXT | Required | Name of the registered model to use | +| source_table | regclass | Required | Name of the table to use as source | +| source_data_column | TEXT | Required | Column name in source table to use | +| source_data_type | [aidb.RetrieverSourceDataFormat](#aidbretrieversourcedataformat) | Required | Type of data in that column ("Text","Image","PDF") | +| source_key_column | TEXT | 'id' | Column to use as key to reference the rows | +| vector_table | TEXT | NULL | | +| vector_data_column | TEXT | 'embeddings' | | +| vector_key_column | TEXT | 'id' | | +| topk | INTEGER | 1 | | +| distance_operator |
[aidb.distanceoperator](#aidbdistanceoperator) | 'L2' | | +| options | JSONB | '{}'::JSONB | Options | #### Example ```sql -SELECT aidb.register_retriever_for_table( - p_name => 'test_retriever', - p_model_name => 'simple_model', - p_source_table_name => 'test_source_table', - p_source_table_data_column => 'content', - p_source_table_data_column_type => 'Text', +SELECT aidb.create_retriever_for_table( + name => 'test_retriever', + model_name => 'simple_model', + source_table => 'test_source_table', + source_data_column => 'content', + source_data_type => 'Text' ); ``` -### `aidb.register_retriever_for_volume` +### `aidb.create_retriever_for_volume` -Registers a retriever for a given PGFS volume. +Creates a retriever for a given PGFS volume. #### Parameters -| Parameter | Type | Default | Description | -|------------------------------|-----------------------|--------------|------------------------------| -| p_name | TEXT | Required | Name of the retriever. | -| p_model_name | TEXT | Required | Name of the model. | -| p_source_volume_name | TEXT | Required | Name of the volume. | -| p_vector_table_name | TEXT | NULL | Name of the vector table. | -| p_vector_table_vector_column | TEXT | 'embeddings' | Name of the vector column. | -| p_vector_table_key_column | TEXT | 'id' | Name of the key column. | -| p_topk | INTEGER | 1 | Number of results to return. | -| p_distance_operator | aidb.distanceoperator | 'L2' | Distance operator. | -| p_options | JSONB | '{}'::JSONB | Options. | +| Parameter | Type | Default | Description | +|--------------------|-----------------------|--------------|------------------------------| +| name | TEXT | Required | Name of the retriever. | +| model_name | TEXT | Required | Name of the model. | +| source_volume_name | TEXT | Required | Name of the volume. | +| vector_table | TEXT | NULL | Name of the vector table. | +| vector_data_column | TEXT | 'embeddings' | Name of the vector column.
| +| vector_key_column | TEXT | 'id' | Name of the key column. | +| topk | INTEGER | 1 | Number of results to return. | +| distance_operator | aidb.distanceoperator | 'L2' | Distance operator. | +| options | JSONB | '{}'::JSONB | Options. | #### Example ```sql -SELECT aidb.register_retriever_for_volume( - p_name => 'demo_vol_retriever', - p_model_name => 'simple_model', - p_source_volume_name => 'demo_bucket_vol' +SELECT aidb.create_retriever_for_volume( + name => 'demo_vol_retriever', + model_name => 'simple_model', + source_volume_name => 'demo_bucket_vol' ); ``` @@ -150,7 +150,7 @@ Enables automatic embedding generation for a given table. | Parameter | Type | Default | Description | |---------------------------------|--------------------------------|--------------|-----------------------------------------------| -| p_name | TEXT | | Name of registered table which should have auto-embedding enabled.| +| retriever_name | TEXT | | Name of the retriever whose source table should have auto-embedding enabled.|
| Parameter | Type | Default | Description | |---------------------------------|--------------------------------|--------------|-----------------------------------------------| -| p_name | TEXT | | Name of registered table which should have auto_embedding disabled.| +| retriever_name | TEXT | | Name of the retriever whose source table should have auto-embedding disabled.| #### Example diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/retrievers/example.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/retrievers/example.mdx index 5c56ba23bb5..86c9df4844b 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/retrievers/example.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/retrievers/example.mdx @@ -11,82 +11,74 @@ This is a full end-to-end example of using retrievers in EDB Postgres AI - AI Ac DROP EXTENSION aidb CASCADE; CREATE EXTENSION aidb CASCADE; -drop table if exists test_source_table cascade; -drop table if exists test_retriever_vector cascade; +drop table if exists test_source_table_ajz72eb cascade; +drop table if exists test_retriever_ajz72eb_vector cascade; --- Create source test table +-- Create source test table +CREATE TABLE test_source_table_ajz72eb ( id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, content TEXT NOT NULL, unrelated_column TEXT ); -INSERT INTO test_source_table +INSERT INTO test_source_table_ajz72eb VALUES (43941, 'Catwalk Women Brown Heels'), (55018, 'Lakme 3 in 1 Orchid Aqua Shine Lip Color'), (19337, 'United Colors of Benetton Men Stripes Black Jacket'); -- Register model -SELECT aidb.register_model('simple_model', 'bert_local'); - -SELECT aidb.register_retriever_for_table( - p_name => 'test_retriever', - p_model_name => 'simple_model', - p_source_table_name => 'test_source_table', - p_source_table_data_column => 'content', - p_source_table_data_column_type => 'Text', - p_source_table_key_column => 'id', -- Default - p_vector_table_name =>
'test_source_table_vector', -- Defaults to `source_table_name + '_vector'` - p_vector_table_vector_column => 'embeddings', -- Default - p_vector_table_key_column => 'id', -- Default - p_topk => 1, -- Default - p_distance_operator => 'L2', -- Default - p_options => '{}'::JSONB -- Default +SELECT aidb.create_model('simple_model_ajz72eb', 'bert_local'); + +SELECT aidb.create_retriever_for_table( + name => 'test_retriever_ajz72eb', + model_name => 'simple_model_ajz72eb', + source_table => 'test_source_table_ajz72eb', + source_data_column => 'content', + source_data_type => 'Text' ); -- expect "Table" -SELECT aidb.get_retriever_data_source('test_retriever'); +SELECT aidb.get_retriever_data_source('test_retriever_ajz72eb'); SELECT * FROM aidb.retrievers; -SELECT aidb.bulk_embedding('test_retriever'); +SELECT aidb.bulk_embedding('test_retriever_ajz72eb'); -- Perform retrieval similarity search for the closest `key` -SELECT * FROM aidb.retrieve_key('test_retriever', 'orchid'); -SELECT * FROM aidb.retrieve_key('test_retriever', 'orchid', 2); -- Limit to top 2 results +SELECT * FROM aidb.retrieve_key('test_retriever_ajz72eb', 'orchid'); +SELECT * FROM aidb.retrieve_key('test_retriever_ajz72eb', 'orchid', 2); -- Limit to top 2 results -SELECT * FROM aidb.retrieve_text('test_retriever', 'orchid'); -SELECT * FROM aidb.retrieve_text('test_retriever', 'orchid', 2); -- Limit to top 2 results +SELECT * FROM aidb.retrieve_text('test_retriever_ajz72eb', 'orchid'); +SELECT * FROM aidb.retrieve_text('test_retriever_ajz72eb', 'orchid', 2); -- Limit to top 2 results -- enable the auto embedding -SELECT aidb.enable_auto_embedding_for_table('test_retriever'); +SELECT aidb.enable_auto_embedding_for_table('test_retriever_ajz72eb'); -- add additional data to test auto-embedding -INSERT INTO test_source_table +INSERT INTO test_source_table_ajz72eb VALUES (11211, 'Bicycle'), (11311, 'What is this?'), (11411, 'Elephants'); -- check embeddings -SELECT id FROM test_retriever_vector; +SELECT id 
FROM test_retriever_ajz72eb_vector; -- delete one of the source rows -DELETE FROM test_source_table WHERE id = 11211; +DELETE FROM test_source_table_ajz72eb WHERE id = 11211; -- check embeddings -SELECT id FROM test_retriever_vector; +SELECT id FROM test_retriever_ajz72eb_vector; -- enable the auto embedding -SELECT aidb.disable_auto_embedding_for_table('test_retriever'); -INSERT INTO test_source_table VALUES (212121, 'new value'); +SELECT aidb.disable_auto_embedding_for_table('test_retriever_ajz72eb'); +INSERT INTO test_source_table_ajz72eb VALUES (212121, 'new value'); - -select aidb.delete_retriever('test_retriever'); +select aidb.delete_retriever('test_retriever_ajz72eb'); ``` diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/retrievers/usage.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/retrievers/usage.mdx index cd72fc1e5ae..3ae42ad5806 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/retrievers/usage.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/retrievers/usage.mdx @@ -11,35 +11,35 @@ There are two types of retrievers: table and volume. Given the different nature ## Retriever for a table data source -The [aidb.register_retriever_for_table](../reference/retrievers#aidbregister_retriever_for_table) function is used to create a retriever for a table data source. This is the function signature, you can see many of those are optional and have defaults. +The [aidb.create_retriever_for_table](../reference/retrievers#aidbcreate_retriever_for_table) function is used to create a retriever for a table data source. This is the function signature; many of the parameters are optional and have defaults.
``` -register_retriever_for_table( +create_retriever_for_table( ------------------------------------------------------------------------------- - p_name TEXT, - p_model_name, TEXT, - p_source_table_name regclass, - p_source_table_data_column TEXT, - p_source_table_data_column_type aidb.RetrieverSourceDataFormat, - p_source_table_key_column TEXT DEFAULT 'id', - p_vector_table_name TEXT DEFAULT NULL, - p_vector_table_vector_column TEXT DEFAULT 'embeddings', - p_vector_table_key_column TEXT DEFAULT 'id', - p_topk INTEGER DEFAULT 1, - p_distance_operator aidb.distanceoperator DEFAULT 'L2', - p_options JSONB DEFAULT '{}'::JSONB + name TEXT, + model_name TEXT, + source_table regclass, + source_data_column TEXT, + source_data_type aidb.RetrieverSourceDataFormat, + source_key_column TEXT DEFAULT 'id', + vector_table TEXT DEFAULT NULL, + vector_data_column TEXT DEFAULT 'embeddings', + vector_key_column TEXT DEFAULT 'id', + topk INTEGER DEFAULT 1, + distance_operator aidb.distanceoperator DEFAULT 'L2', + options JSONB DEFAULT '{}'::JSONB ) ``` ### Example: Registering a retriever ``` sql -SELECT aidb.register_retriever_for_table( - p_name => 'test_retriever', - p_model_name => 'simple_model', - p_source_table_name => 'test_source_table', - p_source_table_data_column => 'content', - p_source_table_data_column_type => 'Text' +SELECT aidb.create_retriever_for_table( + name => 'test_retriever', + model_name => 'simple_model', + source_table => 'test_source_table', + source_data_column => 'content', + source_data_type => 'Text' ); ``` @@ -55,7 +55,7 @@ If you are using external data sources, you need to create a volume and register Before we can register a retriever for a volume, we need to create a volume. The [aidb.create_volume](../reference/retrievers#aidbcreate_volume) function is used to create a volume. This is the function signature, you can see many of those are optional and have defaults.
-``` +```text aidb.create_volume( ------------------------------------------------------------------------------- name TEXT, @@ -82,30 +82,30 @@ The `server_name` comes from calling PGFS functions to create a storage location ### Registering a retriever for a volume -The [aidb.register_retriever_for_volume](../reference/retrievers#aidbregister_retriever_for_volume) function is used to create a retriever for a volume data source. This is the function signature, you can see many of those are optional and have defaults. +The [aidb.create_retriever_for_volume](../reference/retrievers#aidbcreate_retriever_for_volume) function is used to create a retriever for a volume data source. This is the function signature; many of the parameters are optional and have defaults. ``` -aidb.register_retriever_for_volume( +aidb.create_retriever_for_volume( ------------------------------------------------------------------------------- - p_name TEXT, - p_source_volume_name TEXT, - p_vector_table_name TEXT DEFAULT NULL, - p_vector_table_vector_column TEXT DEFAULT 'embeddings', - p_vector_table_key_column TEXT DEFAULT 'id', - p_model_name, TEXT, - p_topk INTEGER DEFAULT 1, - p_distance_operator aidb.distanceoperator DEFAULT 'L2', - p_options JSONB DEFAULT '{}'::JSONB + name TEXT, + model_name TEXT, + source_volume_name TEXT, + vector_table TEXT DEFAULT NULL, + vector_data_column TEXT DEFAULT 'embeddings', + vector_key_column TEXT DEFAULT 'id', + topk INTEGER DEFAULT 1, + distance_operator aidb.distanceoperator DEFAULT 'L2', + options JSONB DEFAULT '{}'::JSONB ) ``` ### Example: Registering a retriever for a volume ``` sql -SELECT aidb.register_retriever_for_volume( - p_name => 'test_retriever_volume', - p_source_volume_name => 'test_volume', - p_model_name => 'simple_model' +SELECT aidb.create_retriever_for_volume( + name => 'test_retriever_volume', + model_name => 'simple_model', + source_volume_name => 'test_volume' ); ```
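This change renames the `register_*` functions to `create_*` and drops the `p_` prefix from named arguments, so SQL written against the old names needs updating. As a rough migration aid, the Python sketch below rewrites old calls using only the renames visible in this diff. `migrate_aidb_sql` is a hypothetical helper, not part of aidb. It assumes `p_source_table_name` becomes `source_table` (as in the `example.mdx` call) and maps `p_name` to `name` everywhere, so named calls to the auto-embedding functions, which now take `retriever_name`, are not handled. It uses plain regular expressions, so matching text inside string literals or comments is rewritten too.

```python
import re

# Function renames taken from this change (old name -> new name).
FUNCTION_RENAMES = {
    "register_retriever_for_table": "create_retriever_for_table",
    "register_retriever_for_volume": "create_retriever_for_volume",
    "register_model": "create_model",
}

# Named-argument renames taken from this change.
# Assumption: p_source_table_name -> source_table, as in the example.mdx call.
# p_name always maps to name; per the reference table in this change, the
# auto-embedding functions take retriever_name instead, which is not handled.
PARAM_RENAMES = {
    "p_name": "name",
    "p_model_name": "model_name",
    "p_source_table_name": "source_table",
    "p_source_table_data_column": "source_data_column",
    "p_source_table_data_column_type": "source_data_type",
    "p_source_table_key_column": "source_key_column",
    "p_source_volume_name": "source_volume_name",
    "p_vector_table_name": "vector_table",
    "p_vector_table_vector_column": "vector_data_column",
    "p_vector_table_key_column": "vector_key_column",
    "p_topk": "topk",
    "p_distance_operator": "distance_operator",
    "p_options": "options",
}

# Whole-identifier match on any of the old function names.
_FUNC_RE = re.compile(r"\b(" + "|".join(map(re.escape, FUNCTION_RENAMES)) + r")\b")
# A p_* identifier immediately followed by the named-argument operator `=>`.
_PARAM_RE = re.compile(r"\b(p_\w+)(\s*=>)")


def migrate_aidb_sql(sql: str) -> str:
    """Hypothetical helper: rewrite old aidb names in SQL text to the new ones.

    Naive regex rewrite: it does not parse SQL, so matching text inside string
    literals or comments is rewritten too. Unknown p_* names are left as-is.
    """
    sql = _FUNC_RE.sub(lambda m: FUNCTION_RENAMES[m.group(1)], sql)
    return _PARAM_RE.sub(
        lambda m: PARAM_RENAMES.get(m.group(1), m.group(1)) + m.group(2), sql
    )


if __name__ == "__main__":
    print(migrate_aidb_sql("SELECT aidb.register_model('my_bert_model', 'bert_local');"))
    # SELECT aidb.create_model('my_bert_model', 'bert_local');
```

Review the rewritten SQL by hand. Positional calls such as `aidb.register_model('my_bert_model', 'bert_local')` only need the function rename, while named-argument calls also depend on the parameter mapping above.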