[Cosmos] Adds full text policy and full text indexes #37891

Merged
merged 8 commits into from
Nov 15, 2024
21 changes: 11 additions & 10 deletions sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -1,8 +1,9 @@
## Release History

### 4.8.1 (Unreleased)
### 4.9.0 (Unreleased)

#### Features Added
* Added full text policy and full text indexing policy. See [PR 37891](https://github.com/Azure/azure-sdk-for-python/pull/37891).

#### Breaking Changes

Expand All @@ -16,12 +17,12 @@ This version and all future versions will support Python 3.13.
#### Features Added
* Added response headers directly to SDK item point operation responses. See [PR 35791](https://github.com/Azure/azure-sdk-for-python/pull/35791).
* SDK will now retry all ServiceRequestErrors (failing outgoing requests) before failing. Default number of retries is 3. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).
* Added Retry Policy for Container Recreate in the Python SDK. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
* Added option to disable write payload on writes. See [PR 37365](https://github.com/Azure/azure-sdk-for-python/pull/37365)
* Added get feed ranges API. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687)
* Added feed range support in `query_items_change_feed`. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687)
* Added **provisional** helper APIs for managing session tokens. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971)
* Added ability to get feed range for a partition key. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971)
* Added Retry Policy for Container Recreate in the Python SDK. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
* Added option to disable write payload on writes. See [PR 37365](https://github.com/Azure/azure-sdk-for-python/pull/37365).
* Added get feed ranges API. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687).
* Added feed range support in `query_items_change_feed`. See [PR 37687](https://github.com/Azure/azure-sdk-for-python/pull/37687).
* Added **provisional** helper APIs for managing session tokens. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971).
* Added ability to get feed range for a partition key. See [PR 36971](https://github.com/Azure/azure-sdk-for-python/pull/36971).

#### Breaking Changes
* Item-level point operations will now return `CosmosDict` and `CosmosList` response types.
Expand All @@ -34,12 +35,12 @@ For more information on this, see our README section [here](https://github.com/A
* Added retry handling logic for DatabaseAccountNotFound exceptions. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).
* Fixed SDK regex validation that would not allow for item ids to be longer than 255 characters. See [PR 36569](https://github.com/Azure/azure-sdk-for-python/pull/36569).
* Fixed issue where 'NoneType' object has no attribute error was raised when a session retry happened during a query. See [PR 37578](https://github.com/Azure/azure-sdk-for-python/pull/37578).
* Fixed issue where passing subpartition partition key values as a tuple in a query would raise an error. See [PR 38136](https://github.com/Azure/azure-sdk-for-python/pull/38136)
* Fixed issue where passing subpartition partition key values as a tuple in a query would raise an error. See [PR 38136](https://github.com/Azure/azure-sdk-for-python/pull/38136).
* Batch requests will now be properly considered as a Write operation. See [PR 38365](https://github.com/Azure/azure-sdk-for-python/pull/38365).

#### Other Changes
* Getting offer throughput when it has not been defined in a container will now give a 404/10004 instead of just a 404. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
* Incomplete Partition Key Extractions in documents for Subpartitioning now gives 400/1001 instead of just a 400. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043)
* Getting offer throughput when it has not been defined in a container will now give a 404/10004 instead of just a 404. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
* Incomplete Partition Key Extractions in documents for Subpartitioning now gives 400/1001 instead of just a 400. See [PR 36043](https://github.com/Azure/azure-sdk-for-python/pull/36043).
* SDK will now make database account calls every 5 minutes to refresh location cache. See [PR 36514](https://github.com/Azure/azure-sdk-for-python/pull/36514).

### 4.7.0 (2024-05-15)
52 changes: 52 additions & 0 deletions sdk/cosmos/azure-cosmos/README.md
Expand Up @@ -750,6 +750,56 @@ not being able to recognize the new NonStreamingOrderBy capability that makes ve
If this happens, you can set the `AZURE_COSMOS_DISABLE_NON_STREAMING_ORDER_BY` environment variable to `"True"` to opt out of this
functionality and continue operating as usual.*

### Public Preview - Full Text Policy and Full Text Indexes
We have added support for full text policies and full text indexes, which let users run full text search
through the Cosmos SDK. These two container-level configurations must be enabled at the account level
before you can use them.

A full text policy allows the user to define the default language to be used for all full text paths, or to set
a language for each path individually in case the user would like to use full text search on data containing different
languages in different fields.

A sample full text policy would look like this:
```python
full_text_policy = {
    "defaultLanguage": "en-US",
    "fullTextPaths": [
        {
            "path": "/text1",
            "language": "en-US"
        },
        {
            "path": "/text2",
            "language": "en-US"
        }
    ]
}
```
Currently, the only supported language is `en-US`, written as an ISO-639 language code followed by an ISO-3166 country code.
Using an unsupported language or code raises an exception, which also includes the list of supported languages.
More languages will be added in the future; for more information on supported languages, see [here][cosmos_fts].
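
As a minimal sketch of how the default language and the per-path languages relate, the snippet below resolves the language that would apply to each path. The helper name `effective_languages` is hypothetical and not part of the SDK, and the fallback to `defaultLanguage` for a path without its own `language` is an assumption drawn from the description above, not confirmed service behavior.

```python
from typing import Any, Dict


def effective_languages(full_text_policy: Dict[str, Any]) -> Dict[str, str]:
    """Map each full text path to the language that would apply to it.

    Assumption: a path without its own "language" uses "defaultLanguage".
    This helper is illustrative only; it is not part of azure-cosmos.
    """
    default_language = full_text_policy.get("defaultLanguage")
    resolved = {}
    for entry in full_text_policy.get("fullTextPaths", []):
        resolved[entry["path"]] = entry.get("language", default_language)
    return resolved


full_text_policy = {
    "defaultLanguage": "en-US",
    "fullTextPaths": [
        {"path": "/text1", "language": "en-US"},
        {"path": "/text2"},  # no explicit language: assumed to use the default
    ],
}

languages = effective_languages(full_text_policy)
```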

Full text indexes are defined in the existing `indexing_policy` and only require the path to the
relevant field.
A sample indexing policy with full text search indexes would look like this:
```python
indexing_policy = {
    "automatic": True,
    "indexingMode": "consistent",
    "compositeIndexes": [
        [
            {"path": "/numberField", "order": "ascending"},
            {"path": "/stringField", "order": "descending"}
        ]
    ],
    "fullTextIndexes": [
        {"path": "/abstract"}
    ]
}
```
Modifying the index in a container is an asynchronous operation that can take a long time to finish. See [here][cosmos_index_policy_change] for more information.
For more information on using full text policies and full text indexes, see [here][cosmos_fts].
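
The two configurations are passed together when creating a container. The sketch below shows one way that call might look, using the `full_text_policy` keyword added in this PR alongside the existing `indexing_policy` keyword. The function name, the container id, and the `_RecordingDatabase` stand-in are hypothetical; the stand-in only exists so the snippet runs without a Cosmos account.

```python
from typing import Any, Dict


def create_full_text_container(database: Any, container_id: str, partition_key: Any) -> Any:
    """Create a container whose /abstract field is set up for full text search.

    `database` is expected to behave like a DatabaseProxy; this wrapper is
    illustrative and not part of azure-cosmos.
    """
    full_text_policy = {
        "defaultLanguage": "en-US",
        "fullTextPaths": [{"path": "/abstract", "language": "en-US"}],
    }
    indexing_policy = {
        "automatic": True,
        "indexingMode": "consistent",
        "fullTextIndexes": [{"path": "/abstract"}],
    }
    return database.create_container(
        id=container_id,
        partition_key=partition_key,
        indexing_policy=indexing_policy,
        full_text_policy=full_text_policy,
    )


class _RecordingDatabase:
    """Hypothetical stand-in that records the keyword arguments it receives."""

    def create_container(self, **kwargs: Any) -> Dict[str, Any]:
        return kwargs


# With a real DatabaseProxy, partition_key would be a PartitionKey instance.
recorded = create_full_text_container(_RecordingDatabase(), "articles", "/id")
```

Against a real account, `database` would come from `CosmosClient.get_database_client(...)` and `partition_key` would be `PartitionKey(path="/id")` from `azure.cosmos`.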

## Troubleshooting

### General
Expand Down Expand Up @@ -886,6 +936,8 @@ For more extensive documentation on the Cosmos DB service, see the [Azure Cosmos
[cosmos_concurrency_sample]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/cosmos/azure-cosmos/samples/concurrency_sample.py
[cosmos_index_sample]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/cosmos/azure-cosmos/samples/index_management.py
[cosmos_index_sample_async]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/cosmos/azure-cosmos/samples/index_management_async.py
[cosmos_fts]: https://aka.ms/cosmosfulltextsearch
[cosmos_index_policy_change]: https://learn.microsoft.com/azure/cosmos-db/index-policy#modifying-the-indexing-policy

## Contributing

2 changes: 1 addition & 1 deletion sdk/cosmos/azure-cosmos/azure/cosmos/_version.py
Expand Up @@ -19,4 +19,4 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

VERSION = "4.8.1"
VERSION = "4.9.0"
16 changes: 16 additions & 0 deletions sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py
Expand Up @@ -173,6 +173,7 @@ async def create_container(
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a new container with the given ID (name).
Expand Down Expand Up @@ -206,6 +207,9 @@ async def create_container(
:keyword Dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:returns: A `ContainerProxy` instance representing the new container.
:rtype: ~azure.cosmos.aio.ContainerProxy
Expand Down Expand Up @@ -251,6 +255,8 @@ async def create_container(
definition["computedProperties"] = computed_properties
if vector_embedding_policy is not None:
definition["vectorEmbeddingPolicy"] = vector_embedding_policy
if full_text_policy is not None:
definition["fullTextPolicy"] = full_text_policy

if session_token is not None:
kwargs['session_token'] = session_token
Expand Down Expand Up @@ -285,6 +291,7 @@ async def create_container_if_not_exists(
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a container if it does not exist already.
Expand Down Expand Up @@ -320,6 +327,9 @@ async def create_container_if_not_exists(
:keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:returns: A `ContainerProxy` instance representing the new container.
:rtype: ~azure.cosmos.aio.ContainerProxy
Expand Down Expand Up @@ -349,6 +359,7 @@ async def create_container_if_not_exists(
session_token=session_token,
initial_headers=initial_headers,
vector_embedding_policy=vector_embedding_policy,
full_text_policy=full_text_policy,
**kwargs
)

Expand Down Expand Up @@ -482,6 +493,7 @@ async def replace_container(
etag: Optional[str] = None,
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Reset the properties of the container.
Expand Down Expand Up @@ -509,6 +521,9 @@ async def replace_container(
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword response_hook: A callable invoked with the response metadata.
:paramtype response_hook: Callable[[Dict[str, str], Dict[str, Any]], None]
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:returns: A `ContainerProxy` instance representing the container after replace completed.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: Raised if the container couldn't be replaced.
This includes if the container with given id does not exist.
Expand Down Expand Up @@ -545,6 +560,7 @@ async def replace_container(
"defaultTtl": default_ttl,
"conflictResolutionPolicy": conflict_resolution_policy,
"analyticalStorageTtl": analytical_storage_ttl,
"fullTextPolicy": full_text_policy
}.items()
if value is not None
}
16 changes: 16 additions & 0 deletions sdk/cosmos/azure-cosmos/azure/cosmos/database.py
Expand Up @@ -174,6 +174,7 @@ def create_container( # pylint:disable=docstring-missing-param
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a new container with the given ID (name).
Expand Down Expand Up @@ -203,6 +204,9 @@ def create_container( # pylint:disable=docstring-missing-param
:keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:returns: A `ContainerProxy` instance representing the new container.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:rtype: ~azure.cosmos.ContainerProxy
Expand Down Expand Up @@ -246,6 +250,8 @@ def create_container( # pylint:disable=docstring-missing-param
definition["computedProperties"] = computed_properties
if vector_embedding_policy is not None:
definition["vectorEmbeddingPolicy"] = vector_embedding_policy
if full_text_policy is not None:
definition["fullTextPolicy"] = full_text_policy

if session_token is not None:
kwargs['session_token'] = session_token
Expand Down Expand Up @@ -287,6 +293,7 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a container if it does not exist already.
Expand Down Expand Up @@ -318,6 +325,9 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
:keyword Dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:returns: A `ContainerProxy` instance representing the container.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container read or creation failed.
:rtype: ~azure.cosmos.ContainerProxy
Expand Down Expand Up @@ -349,6 +359,7 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
session_token=session_token,
initial_headers=initial_headers,
vector_embedding_policy=vector_embedding_policy,
full_text_policy=full_text_policy,
**kwargs
)

Expand Down Expand Up @@ -538,6 +549,7 @@ def replace_container( # pylint:disable=docstring-missing-param
etag: Optional[str] = None,
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
full_text_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Reset the properties of the container.
Expand All @@ -562,6 +574,9 @@ def replace_container( # pylint:disable=docstring-missing-param
None leaves analytical storage off and a value of -1 turns analytical storage on with no TTL. Please
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword Callable response_hook: A callable invoked with the response metadata.
:keyword Dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
Used to denote the default language to be used for all full text indexes, or to individually
assign a language to each full text index path.
:returns: A `ContainerProxy` instance representing the container after replace completed.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: Raised if the container couldn't be replaced.
This includes if the container with given id does not exist.
Expand Down Expand Up @@ -603,6 +618,7 @@ def replace_container( # pylint:disable=docstring-missing-param
"defaultTtl": default_ttl,
"conflictResolutionPolicy": conflict_resolution_policy,
"analyticalStorageTtl": analytical_storage_ttl,
"fullTextPolicy": full_text_policy,
}.items()
if value is not None
}
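
The `replace_container` hunks above add `"fullTextPolicy"` to a dictionary that is then filtered with `if value is not None`, so the key is only sent when the caller supplies a policy. A minimal standalone sketch of that filtering pattern follows; the function name `build_replace_parameters` and the reduced set of keys are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Optional


def build_replace_parameters(
    container_id: str,
    default_ttl: Optional[int] = None,
    full_text_policy: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a request body that omits every option left as None.

    Illustrative only: the real method handles more properties than shown here.
    """
    candidates = {
        "id": container_id,
        "defaultTtl": default_ttl,
        "fullTextPolicy": full_text_policy,
    }
    # Keep only the entries the caller actually provided.
    return {key: value for key, value in candidates.items() if value is not None}


parameters = build_replace_parameters(
    "products",
    full_text_policy={"defaultLanguage": "en-US", "fullTextPaths": []},
)
```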