Add Blocks API and add additional Get document descriptions. #7836

Merged
merged 10 commits into from
Aug 8, 2024
98 changes: 87 additions & 11 deletions _api-reference/document-apis/get-documents.md
@@ -11,29 +11,28 @@
**Introduced 1.0**
{: .label .label-purple }

After adding a JSON document to your index, you can use the Get Document API operation to retrieve the document's information and data.

## Example

```json
GET sample-index1/_doc/1
```
{% include copy-curl.html %}

## Path and HTTP methods

Use the GET method to retrieve a document and its source or stored fields from a particular index. Use the HEAD method to verify that a document exists:

```json
GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
```

Use `_source` to retrieve the document source or to verify that it exists:

```json
GET <index>/_source/<_id>
HEAD <index>/_source/<_id>
```

## Query parameters

All query parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
@@ -48,6 +47,83 @@
version | Integer | The version of the document to return, which must match the current version of the document.
version_type | Enum | Specifies how the `version` value is compared with the document's current version. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to retrieve version 3 of a document, use `/_doc/1?version=3&version_type=external`.
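
To see how these parameters compose into a request, the following Python sketch builds a Get Document API path from keyword arguments using only the standard library. The `get_document_path` helper is hypothetical and is not part of any OpenSearch client; it renders Booleans as lowercase `true`/`false` to match the REST examples on this page and skips parameters set to `None`.

```python
from urllib.parse import quote, urlencode


def get_document_path(index: str, doc_id: str, **params) -> str:
    """Build a path such as `<index>/_doc/<_id>?...` (hypothetical helper)."""
    path = f"{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    # Render Booleans the way the REST examples do and drop unset parameters.
    query = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
        if value is not None
    }
    return f"{path}?{urlencode(query)}" if query else path


print(get_document_path("test-index", "1", version=3, version_type="external"))
# test-index/_doc/1?version=3&version_type=external
```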

### Real time

The OpenSearch Get Document API operates in real time by default, which means that it retrieves the latest version of the document regardless of the index's refresh rate or the rate at which new data becomes searchable. However, if you request stored fields (using the `stored_fields` parameter) for a document that has been updated but not yet refreshed, then the Get Document API parses and analyzes the document's source to extract those stored fields.

To disable the real-time behavior and retrieve the document based on the last refreshed state of the index, set the `realtime` parameter to `false`.
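
The difference between real-time and refreshed reads can be modeled with a toy shard that keeps a refreshed view alongside writes that have not yet been refreshed. The `ToyShard` class below is a hypothetical simplification for illustration, not how OpenSearch stores data internally.

```python
class ToyShard:
    """Toy model of one shard: a refreshed view plus unrefreshed writes."""

    def __init__(self):
        self.refreshed = {}  # state visible after the last refresh
        self.pending = {}    # writes that have not been refreshed yet

    def index(self, doc_id, source):
        self.pending[doc_id] = source

    def refresh(self):
        # Make all pending writes part of the refreshed view.
        self.refreshed.update(self.pending)
        self.pending.clear()

    def get(self, doc_id, realtime=True):
        # realtime=True (the default): return the latest write, refreshed or not.
        if realtime and doc_id in self.pending:
            return self.pending[doc_id]
        # realtime=False: consult only the last refreshed state.
        return self.refreshed.get(doc_id)


shard = ToyShard()
shard.index("1", {"title": "v1"})
shard.refresh()
shard.index("1", {"title": "v2"})      # updated but not yet refreshed
print(shard.get("1"))                  # {'title': 'v2'}
print(shard.get("1", realtime=False))  # {'title': 'v1'}
```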

### Source filtering

By default, the Get Document API returns the entire contents of the `_source` field for the requested document. However, you can exclude the `_source` field from the response by setting the `_source` query parameter to `false`, as shown in the following example:

```json
GET test-index/_doc/0?_source=false
```

#### Source includes and excludes

If you only want to retrieve specific fields from the source, use the `_source_includes` or `_source_excludes` parameters to include or exclude particular fields, respectively. This can be beneficial for large documents because retrieving only the required fields can reduce network overhead.

Both parameters accept a comma-separated list of fields and wildcard expressions. The following example includes only the source fields that match `*.play` and excludes the `entities` field:

```json
GET test-index/_doc/0?_source_includes=*.play&_source_excludes=entities
```
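
The following Python sketch models how include and exclude patterns combine: includes are applied first, and then any field matching an exclude pattern is removed. It is a simplified, hypothetical model (`flatten`, `matches`, and `filter_source` are illustrative names) that uses `fnmatch` wildcards and returns flattened dotted paths rather than the nested `_source` object that OpenSearch returns; OpenSearch's own wildcard matching may differ in edge cases.

```python
from fnmatch import fnmatchcase


def flatten(source, prefix=""):
    """Flatten nested objects into dotted paths: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in source.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def matches(path, patterns):
    """True if any pattern matches the field path or one of its parent objects."""
    parts = path.split(".")
    candidates = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(c, p) for c in candidates for p in patterns)


def filter_source(source, includes=(), excludes=()):
    """Apply include patterns first, then drop fields matching exclude patterns."""
    flat = flatten(source)
    if includes:
        flat = {f: v for f, v in flat.items() if matches(f, includes)}
    if excludes:
        flat = {f: v for f, v in flat.items() if not matches(f, excludes)}
    return flat


doc = {"speech": {"play": "Hamlet", "line": 12}, "entities": {"play": "Hamlet"}, "id": 7}
print(filter_source(doc, includes=["*.play"], excludes=["entities"]))
# {'speech.play': 'Hamlet'}
```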

#### Shorter notation

If you only want to include certain fields and don't need to exclude any, you can use a shorter notation by specifying the desired fields directly in the `_source` parameter:

```json
GET test-index/_doc/0?_source=*.id
```
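
In other words, `_source=*.id` behaves like `_source_includes=*.id`. The hypothetical helper below sketches how the forms of the `_source` parameter could be interpreted: `false` omits the source, `true` returns the whole source (the default behavior), and anything else is treated as a comma-separated list of include patterns. It is an illustration, not OpenSearch's parser.

```python
def parse_source_param(value):
    """Return (return_source, include_patterns) for a `_source` value (hypothetical)."""
    if value == "false":
        return False, []  # omit `_source` from the response
    if value == "true":
        return True, []   # no include patterns: return the whole source
    # Shorter notation: the value is a list of include patterns.
    return True, [p.strip() for p in value.split(",") if p.strip()]


print(parse_source_param("*.id"))  # (True, ['*.id'])
```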

### Routing

When indexing documents in OpenSearch, you can specify a `routing` value to control the shard assignments for documents. If routing was used during indexing, you must provide the same routing value when retrieving the document using the Get Document API, as shown in the following example:

```json
GET test-index/_doc/1?routing=user1
```

This request retrieves the document with the ID `1` and uses the routing value `user1` to determine which shard stores the document. If you don't specify the correct routing value, the Get Document API cannot locate and fetch the requested document.
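
The following Python sketch illustrates why the routing value matters. It uses a hypothetical in-memory model with five primary shards and CRC32 as a stand-in hash (OpenSearch actually uses a Murmur3-based formula), so a document indexed with routing `user1` is reliably found only when the same routing value is supplied.

```python
import zlib

NUM_PRIMARY_SHARDS = 5
shards = [{} for _ in range(NUM_PRIMARY_SHARDS)]  # one dict per primary shard


def shard_for(routing):
    # Stand-in hash: OpenSearch uses a Murmur3-based formula, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % NUM_PRIMARY_SHARDS


def index_doc(doc_id, source, routing=None):
    # Without an explicit routing value, the document ID is used for routing.
    shards[shard_for(routing or doc_id)][doc_id] = source


def get_doc(doc_id, routing=None):
    # A GET looks only in the single shard that the routing value points to.
    return shards[shard_for(routing or doc_id)].get(doc_id)


index_doc("1", {"user": "user1"}, routing="user1")
print(get_doc("1", routing="user1"))  # {'user': 'user1'}
print(get_doc("1"))  # None, unless "1" and "user1" happen to hash to the same shard
```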

### Preference

The Get Document API allows you to control which shard replica handles the request. By default, the operation is randomly distributed across the available shard replicas.

However, you can specify a preference to influence the replica selection. The preference can be set to one of the following values:

- `_local`: The operation attempts to execute on a locally allocated shard replica, if possible. This can improve performance by reducing network overhead.
- Custom (string) value: Specifying a custom string value ensures that requests with the same value are routed to the same set of shards. This consistency can be beneficial when managing shards in different refresh states because it prevents "jumping values" that may occur when hitting shards with varying data visibility. A common practice is to use a web session ID or a user name as the custom value.
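
The consistency that a custom preference provides can be sketched by hashing the preference string to select a shard copy. This is a hypothetical model with three named copies and CRC32 as the hash; OpenSearch's actual replica selection logic is more involved, and the `_local` option is not modeled.

```python
import random
import zlib

COPIES = ["primary", "replica-1", "replica-2"]  # hypothetical copies of one shard


def pick_copy(preference=None):
    """Pick the shard copy that serves a GET request (simplified model)."""
    if preference is None:
        # No preference: any copy may serve the request.
        return random.choice(COPIES)
    # Custom string: the same value always maps to the same copy.
    return COPIES[zlib.crc32(preference.encode("utf-8")) % len(COPIES)]


session_id = "session-42"  # hypothetical web session ID used as the preference
print(pick_copy(session_id) == pick_copy(session_id))  # True
```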


### Refresh

Set the `refresh` parameter to `true` to force a refresh of the relevant shard before running the Get Document API operation. This ensures that the most recent data changes are searchable and visible to the API. However, use refresh judiciously: it can impose a heavy load on the system and slow down indexing. Weigh the trade-off between data freshness and system load before enabling the `refresh` parameter.

### Distributed

When running the Get Document API, OpenSearch first calculates a hash value based on the document ID (or on the `routing` value, if one is provided), which determines the shard that holds the document. The operation is then redirected to one of the copies in that shard's replication group (the primary shard or any of its replica shards), and the result is returned from that copy.

A higher number of shard replicas improves the scalability and performance of GET operations because the load can be distributed across more shard copies, increasing throughput for Get Document API requests.
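
The two routing steps described above can be sketched in Python: the document ID selects the shard, and any copy in that shard's group can serve the request. The model is hypothetical (CRC32 stands in for OpenSearch's Murmur3-based hash, and copies are chosen uniformly at random); running 3,000 simulated GET requests for one document shows the load spreading across the primary and two replicas.

```python
import random
import zlib
from collections import Counter


def route_get(doc_id, num_primary_shards, num_replicas, rng):
    # Step 1: hash the document ID to select the shard (stand-in for Murmur3).
    shard_id = zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards
    # Step 2: pick any copy in that group: 0 is the primary, 1..N are replicas.
    copy = rng.randrange(num_replicas + 1)
    return shard_id, copy


rng = random.Random(0)  # fixed seed so the simulation is repeatable
load = Counter(route_get("1", 3, 2, rng) for _ in range(3000))
print(load)  # same shard ID every time, roughly 1,000 requests per copy
```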

### Versioning support

Use the `version` parameter to retrieve a document only if its current version matches the specified version number. This can be useful for ensuring data consistency and preventing conflicts when working with versioned documents.
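
A minimal sketch of the version check, assuming a hypothetical in-memory store that maps document IDs to `(version, source)` pairs. A mismatch raises an exception in this sketch; OpenSearch instead returns a version conflict error to the client.

```python
class VersionConflict(Exception):
    """Raised when the requested version does not match the current one."""


def get_with_version(store, doc_id, version=None):
    """Return the source only if `version` (when given) equals the current version."""
    current_version, source = store[doc_id]
    if version is not None and version != current_version:
        raise VersionConflict(f"current version is {current_version}, requested {version}")
    return source


store = {"1": (3, {"title": "v3"})}
print(get_with_version(store, "1", version=3))  # {'title': 'v3'}
try:
    get_with_version(store, "1", version=2)
except VersionConflict as err:
    print("conflict:", err)  # conflict: current version is 3, requested 2
```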

Internally, when a document is updated in OpenSearch, the original version is marked as deleted, and a new version of the document is added. However, the original version doesn't immediately disappear from the system. While you won't be able to access it through the Get Document API, OpenSearch manages the cleanup of deleted document versions in the background as you continue indexing new data.

## Example request

The following example request retrieves information about the document with the ID `1`:

```json
GET sample-index1/_doc/1
```
{% include copy-curl.html %}


## Example response
```json
```
59 changes: 59 additions & 0 deletions _api-reference/index-apis/blocks.md
@@ -0,0 +1,59 @@
---
layout: default
title: Blocks
parent: Index APIs
nav_order: 6
---

# Blocks
**Introduced 1.0**
{: .label .label-purple }

Use the Blocks API to limit certain operations on a specified index. Different types of blocks allow you to restrict index write, read, or metadata operations.
For example, adding a `write` block through the API ensures that all index shards have properly accounted for the block before returning a successful response. Any in-flight write operations to the index must be complete before the `write` block takes effect.

## Path and HTTP methods

```json
PUT /<index>/_block/<block>
```

## Path parameters

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `<index>` | String | A comma-delimited list of index names. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
| `<block>` | String | Specifies the type of block to apply to the index. Valid values are: <br> `metadata`: Disables all metadata changes, such as closing the index. <br> `read`: Disables any read operations. <br> `read_only`: Disables any write operations and metadata changes. <br> `write`: Disables write operations. However, metadata changes are still allowed. |
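
The following Python sketch encodes the block types from the preceding table as sets of disabled operation categories. The `BLOCKED_OPERATIONS` mapping and `is_allowed` helper are hypothetical and only illustrate the table; they are not how OpenSearch enforces blocks.

```python
# Operation categories that each block type disables, per the table above.
BLOCKED_OPERATIONS = {
    "metadata": {"metadata"},
    "read": {"read"},
    "read_only": {"write", "metadata"},
    "write": {"write"},
}


def is_allowed(operation, blocks):
    """Return True if `operation` ("read", "write", or "metadata") is permitted
    on an index that has the given set of blocks applied."""
    return not any(operation in BLOCKED_OPERATIONS[block] for block in blocks)


print(is_allowed("metadata", {"write"}))      # True: a write block allows metadata changes
print(is_allowed("metadata", {"read_only"}))  # False
print(is_allowed("read", {"read_only"}))      # True
```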

## Query parameters

The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `ignore_unavailable` | Boolean | When `false`, the request returns an error when it targets a missing or closed index. Default is `false`. |
| `allow_no_indices` | Boolean | When `false`, the request returns an error when a wildcard expression, index alias, or `_all` targets only closed or missing indexes, even if the request also targets open indexes. Default is `true`. |
| `expand_wildcards` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. |
| `cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. |
| `timeout` | Time | The amount of time to wait for the request to return. Default is `30s`. |
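
To show how the path and query parameters fit together, the following hypothetical Python helper builds the method and path for a Blocks API request. It validates the block type against the values listed under Path parameters, joins multiple index names into a comma-delimited list, and is not part of any OpenSearch client.

```python
from urllib.parse import quote, urlencode

VALID_BLOCKS = {"metadata", "read", "read_only", "write"}


def block_request(indexes, block, **params):
    """Return (method, path) for a Blocks API call (hypothetical helper)."""
    if block not in VALID_BLOCKS:
        raise ValueError(f"unknown block type: {block!r}")
    # URL-quote each index name (keeping `*` literal) and join with commas.
    index_part = ",".join(quote(name, safe="*") for name in indexes)
    path = f"/{index_part}/_block/{block}"
    # Render Booleans as lowercase `true`/`false`, as in the REST examples.
    query = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in params.items()}
    if query:
        path += "?" + urlencode(query)
    return "PUT", path


print(block_request(["test-index"], "write"))
# ('PUT', '/test-index/_block/write')
print(block_request(["logs-*", "test-index"], "read_only", timeout="1m", ignore_unavailable=True))
# ('PUT', '/logs-*,test-index/_block/read_only?timeout=1m&ignore_unavailable=true')
```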

## Example request

The following example request adds a `write` block to `test-index`, disabling write operations on that index:

```json
PUT /test-index/_block/write
```

## Example response

```json
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : [ {
    "name" : "test-index",
    "blocked" : true
  } ]
}
```