[Cosmos] Adds full text policy and full text indexes #37891

Merged 8 commits, Nov 15, 2024
Changes from 6 commits
21 changes: 11 additions & 10 deletions sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -1,8 +1,9 @@
## Release History

### 4.8.1 (Unreleased)
### 4.9.0 (Unreleased)

#### Features Added
* Added full text policy and full text indexing policy. See [PR 37891](https://github.com/Azure/azure-sdk-for-python/pull/37891).

#### Breaking Changes

@@ -16,12 +17,12 @@ This version and all future versions will support Python 3.13.
#### Features Added
* Added response headers directly to SDK item point operation responses. See [PR 35791](https://github.com/Azure/azure-sdk-for-python/pull/35791).
* SDK will now retry all ServiceRequestErrors (failing outgoing requests) before failing. Default number of retries is 3. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).
* Added Retry Policy for Container Recreate in the Python SDK. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
* Added option to disable write payload on writes. See [PR 37365](https://github.com/Azure/azure-sdk-for-python/pull/37365)
* Added get feed ranges API. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687)
* Added feed range support in `query_items_change_feed`. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687)
* Added **provisional** helper APIs for managing session tokens. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971)
* Added ability to get feed range for a partition key. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971)
* Added Retry Policy for Container Recreate in the Python SDK. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
* Added option to disable write payload on writes. See [PR 37365](https://github.com/Azure/azure-sdk-for-python/pull/37365).
* Added get feed ranges API. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687).
* Added feed range support in `query_items_change_feed`. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687).
* Added **provisional** helper APIs for managing session tokens. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971).
* Added ability to get feed range for a partition key. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971).

#### Breaking Changes
* Item-level point operations will now return `CosmosDict` and `CosmosList` response types.
@@ -34,12 +35,12 @@ For more information on this, see our README section [here](https://github.com/A
* Added retry handling logic for DatabaseAccountNotFound exceptions. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).
* Fixed SDK regex validation that would not allow for item ids to be longer than 255 characters. See [PR 36569](https://github.com/Azure/azure-sdk-for-python/pull/36569).
* Fixed issue where 'NoneType' object has no attribute error was raised when a session retry happened during a query. See [PR 37578](https://github.com/Azure/azure-sdk-for-python/pull/37578).
* Fixed issue where passing subpartition partition key values as a tuple in a query would raise an error. See [PR 38136](https://github.com/Azure/azure-sdk-for-python/pull/38136)
* Fixed issue where passing subpartition partition key values as a tuple in a query would raise an error. See [PR 38136](https://github.com/Azure/azure-sdk-for-python/pull/38136).
* Batch requests will now be properly considered as Write operation. See [PR 38365](https://github.com/Azure/azure-sdk-for-python/pull/38365).

#### Other Changes
* Getting offer throughput when it has not been defined in a container will now give a 404/10004 instead of just a 404. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
* Incomplete Partition Key Extractions in documents for Subpartitioning now gives 400/1001 instead of just a 400. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
* Getting offer throughput when it has not been defined in a container will now give a 404/10004 instead of just a 404. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
* Incomplete Partition Key Extractions in documents for Subpartitioning now gives 400/1001 instead of just a 400. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
* SDK will now make database account calls every 5 minutes to refresh location cache. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).

### 4.7.0 (2024-05-15)
46 changes: 46 additions & 0 deletions sdk/cosmos/azure-cosmos/README.md
@@ -750,6 +750,52 @@ not being able to recognize the new NonStreamingOrderBy capability that makes ve
If this happens, you can set the `AZURE_COSMOS_DISABLE_NON_STREAMING_ORDER_BY` environment variable to `"True"` to opt out of this
functionality and continue operating as usual.*

### Public Preview - Full Text Policy and Full Text Indexes
The SDK now supports full text policies and full text indexes, which let users run full text search
through the Cosmos SDK. Both are container-level configurations, and the feature must be enabled at the
account level before you can use them.

A full text policy allows the user to define the default language to be used for all full text paths, or to set
a language for each path individually in case the user would like to use full text search on data containing different
languages in different fields.

A sample full text policy would look like this:
```python
full_text_policy = {
    "defaultLanguage": "en-US",
    "fullTextPaths": [
        {
            "path": "/text1",
            "language": "en-US"
        },
        {
            "path": "/text2",
            "language": "en-US"
        }
    ]
}
```
Currently, the only supported language is `en-US`.
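
The policy shape above can also be assembled programmatically. The sketch below is only an illustration and not part of the SDK: `build_full_text_policy` is a hypothetical helper name, and it produces the same dictionary structure as the sample, falling back to the default language for any path without an explicit override.

```python
from typing import Any, Dict, Iterable, Optional


def build_full_text_policy(
    paths: Iterable[str],
    default_language: str = "en-US",
    languages: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not an SDK API): build a full text policy dict.

    Each path takes its language from `languages` when listed there,
    otherwise it uses `default_language`.
    """
    overrides = languages or {}
    return {
        "defaultLanguage": default_language,
        "fullTextPaths": [
            {"path": path, "language": overrides.get(path, default_language)}
            for path in paths
        ],
    }


# Reproduces the sample policy shown above.
full_text_policy = build_full_text_policy(["/text1", "/text2"])
```

The resulting dictionary can then be passed as the `full_text_policy` keyword that this PR adds to `create_container`.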

Full text indexes are declared in the existing `indexing_policy` under the `fullTextIndexes` key, and each entry
only requires the path of the field to index.
A sample indexing policy with full text indexes would look like this:
```python
indexing_policy = {
    "automatic": True,
    "indexingMode": "consistent",
    "compositeIndexes": [
        [
            {"path": "/numberField", "order": "ascending"},
            {"path": "/stringField", "order": "descending"}
        ]
    ],
    "fullTextIndexes": [
        {"path": "/abstract"}
    ]
}
```
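
When the two configurations are written separately, an indexed path can drift out of sync with the policy. The sketch below assumes (this PR does not state it) that every `fullTextIndexes` path is expected to also be listed under `fullTextPaths`. `unlisted_full_text_index_paths` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, List


def unlisted_full_text_index_paths(
    indexing_policy: Dict[str, Any], full_text_policy: Dict[str, Any]
) -> List[str]:
    """Hypothetical check: return index paths that the policy does not list."""
    policy_paths = {
        entry["path"] for entry in full_text_policy.get("fullTextPaths", [])
    }
    return [
        index["path"]
        for index in indexing_policy.get("fullTextIndexes", [])
        if index["path"] not in policy_paths
    ]


indexing_policy = {"automatic": True, "fullTextIndexes": [{"path": "/abstract"}]}
full_text_policy = {
    "defaultLanguage": "en-US",
    "fullTextPaths": [{"path": "/text1", "language": "en-US"}],
}
# "/abstract" is indexed but absent from the policy, so it is reported.
unlisted = unlisted_full_text_index_paths(indexing_policy, full_text_policy)
```

Running a check like this before calling `create_container` is a client-side convenience only; the service remains the authority on what it accepts.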

## Troubleshooting

### General
Expand Down
2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos/azure/cosmos/_version.py
@@ -19,4 +19,4 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

VERSION = "4.8.1"
VERSION = "4.9.0"
16 changes: 16 additions & 0 deletions sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py
@@ -173,6 +173,7 @@ async def create_container(
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a new container with the given ID (name).
@@ -206,6 +207,9 @@ async def create_container(
:keyword Dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:returns: A `ContainerProxy` instance representing the new container.
:rtype: ~azure.cosmos.aio.ContainerProxy
@@ -251,6 +255,8 @@ async def create_container(
definition["computedProperties"] = computed_properties
if vector_embedding_policy is not None:
definition["vectorEmbeddingPolicy"] = vector_embedding_policy
if full_text_policy is not None:
definition["fullTextPolicy"] = full_text_policy

if session_token is not None:
kwargs['session_token'] = session_token
@@ -285,6 +291,7 @@ async def create_container_if_not_exists(
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a container if it does not exist already.
@@ -320,6 +327,9 @@ async def create_container_if_not_exists(
:keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:returns: A `ContainerProxy` instance representing the new container.
:rtype: ~azure.cosmos.aio.ContainerProxy
@@ -349,6 +359,7 @@ async def create_container_if_not_exists(
session_token=session_token,
initial_headers=initial_headers,
vector_embedding_policy=vector_embedding_policy,
full_text_policy=full_text_policy,
**kwargs
)

@@ -482,6 +493,7 @@ async def replace_container(
etag: Optional[str] = None,
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Reset the properties of the container.
@@ -509,6 +521,9 @@ async def replace_container(
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword response_hook: A callable invoked with the response metadata.
:paramtype response_hook: Callable[[Dict[str, str], Dict[str, Any]], None]
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:returns: A `ContainerProxy` instance representing the container after replace completed.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: Raised if the container couldn't be replaced.
This includes if the container with given id does not exist.
@@ -545,6 +560,7 @@ async def replace_container(
"defaultTtl": default_ttl,
"conflictResolutionPolicy": conflict_resolution_policy,
"analyticalStorageTtl": analytical_storage_ttl,
"fullTextPolicy": full_text_policy
}.items()
if value is not None
}
16 changes: 16 additions & 0 deletions sdk/cosmos/azure-cosmos/azure/cosmos/database.py
@@ -174,6 +174,7 @@ def create_container( # pylint:disable=docstring-missing-param
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a new container with the given ID (name).
@@ -203,6 +204,9 @@ def create_container( # pylint:disable=docstring-missing-param
:keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:returns: A `ContainerProxy` instance representing the new container.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:rtype: ~azure.cosmos.ContainerProxy
@@ -246,6 +250,8 @@ def create_container( # pylint:disable=docstring-missing-param
definition["computedProperties"] = computed_properties
if vector_embedding_policy is not None:
definition["vectorEmbeddingPolicy"] = vector_embedding_policy
if full_text_policy is not None:
definition["fullTextPolicy"] = full_text_policy

if session_token is not None:
kwargs['session_token'] = session_token
@@ -287,6 +293,7 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a container if it does not exist already.
@@ -318,6 +325,9 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
:keyword Dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:returns: A `ContainerProxy` instance representing the container.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container read or creation failed.
:rtype: ~azure.cosmos.ContainerProxy
@@ -349,6 +359,7 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
session_token=session_token,
initial_headers=initial_headers,
vector_embedding_policy=vector_embedding_policy,
full_text_policy=full_text_policy,
**kwargs
)

@@ -538,6 +549,7 @@ def replace_container( # pylint:disable=docstring-missing-param
etag: Optional[str] = None,
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Reset the properties of the container.
@@ -562,6 +574,9 @@ def replace_container( # pylint:disable=docstring-missing-param
None leaves analytical storage off and a value of -1 turns analytical storage on with no TTL. Please
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword Callable response_hook: A callable invoked with the response metadata.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:returns: A `ContainerProxy` instance representing the container after replace completed.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: Raised if the container couldn't be replaced.
This includes if the container with given id does not exist.
@@ -603,6 +618,7 @@ def replace_container( # pylint:disable=docstring-missing-param
"defaultTtl": default_ttl,
"conflictResolutionPolicy": conflict_resolution_policy,
"analyticalStorageTtl": analytical_storage_ttl,
"fullTextPolicy": full_text_policy,
}.items()
if value is not None
}
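
The `replace_container` hunk relies on the surrounding dict comprehension to drop unset options, so `fullTextPolicy` only reaches the request body when the caller actually passes it. A standalone sketch of that filtering pattern follows; `build_replace_parameters` is an illustrative name rather than SDK code, and only a few of the real keys are reproduced.

```python
from typing import Any, Dict, Optional


def build_replace_parameters(
    container_id: str,
    default_ttl: Optional[int] = None,
    analytical_storage_ttl: Optional[int] = None,
    full_text_policy: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Illustrative only: keep just the options the caller actually set."""
    candidates = {
        "id": container_id,
        "defaultTtl": default_ttl,
        "analyticalStorageTtl": analytical_storage_ttl,
        "fullTextPolicy": full_text_policy,
    }
    # Same idea as the hunk above: entries whose value is None are omitted.
    return {key: value for key, value in candidates.items() if value is not None}


without_policy = build_replace_parameters("my-container")
with_policy = build_replace_parameters(
    "my-container",
    full_text_policy={"defaultLanguage": "en-US", "fullTextPaths": []},
)
```

With this pattern, leaving `full_text_policy` as `None` omits the key entirely instead of sending an explicit null.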
60 changes: 59 additions & 1 deletion sdk/cosmos/azure-cosmos/samples/index_management.py
@@ -637,7 +637,7 @@ def use_geospatial_indexing_policy(db):
try:
delete_container_if_exists(db, CONTAINER_ID)

# Create a container with vector embedding policy and vector indexes
# Create a container with geospatial indexes
indexing_policy = {
'includedPaths': [
{'path': '/"Location"/?',
@@ -746,6 +746,61 @@ def get_embeddings(num):
print("Entity doesn't exist")


def use_full_text_policy(db):
    try:
        delete_container_if_exists(db, CONTAINER_ID)

        # Create a container with full text policy and full text indexes
        indexing_policy = {
            "automatic": True,
            "fullTextIndexes": [
                {"path": "/text1"}
            ]
        }
        full_text_policy = {
            "defaultLanguage": "en-US",
            "fullTextPaths": [
                {
                    "path": "/text1",
                    "language": "en-US"
                },
                {
                    "path": "/text2",
                    "language": "en-US"
                }
            ]
        }

        created_container = db.create_container(
            id=CONTAINER_ID,
            partition_key=PARTITION_KEY,
            indexing_policy=indexing_policy,
            full_text_policy=full_text_policy
        )
        properties = created_container.read()
        print(created_container)

        print("\n" + "-" * 25 + "\n11. Container created with full text policy and full text indexes")
        print_dictionary_items(properties["indexingPolicy"])
        print_dictionary_items(properties["fullTextPolicy"])

        # Create some items to use with full text search
        for i in range(10):
            created_container.create_item({"id": "full_text_item" + str(i), "text1": "some-text"})

        # Read the items back with a plain query; no full text ranking function is used here
        query = "select * from c"
        query_documents_with_custom_query(created_container, query)

        # Cleanup
        db.delete_container(created_container)
        print("\n")
    except exceptions.CosmosResourceExistsError:
        print("Entity already exists")
    except exceptions.CosmosResourceNotFoundError:
        print("Entity doesn't exist")


def run_sample():
try:
client = obtain_client()
@@ -782,6 +837,9 @@ def run_sample():
# 10. Create and use a vector embedding policy
use_vector_embedding_policy(created_db)

# 11. Create and use a full text policy
use_full_text_policy(created_db)

except exceptions.AzureError as e:
raise e
