Add Blocks API and add additional Get document descriptions. #7836

Merged: 10 commits, Aug 8, 2024
88 changes: 80 additions & 8 deletions _api-reference/document-apis/get-documents.md
@@ -11,27 +11,26 @@
**Introduced 1.0**
{: .label .label-purple }

After adding a JSON document to your index, you can use the Get Document API operation to retrieve the document's information and data.

## Example

```json
GET sample-index1/_doc/1
```
{% include copy-curl.html %}

## Path and HTTP methods

Use the GET method to retrieve a document and its source or stored fields from a particular index. Use the HEAD method to verify that a document exists.

```
GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
```

Use `_source` to retrieve only the document source or to verify that the source exists.

```
GET <index>/_source/<_id>
HEAD <index>/_source/<_id>
```
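
The four endpoint forms above differ only in the HTTP method (GET or HEAD) and in whether the path uses `_doc` or `_source`. The following Python sketch builds the method and path for each case. The helper name `doc_endpoint` and its keyword arguments are hypothetical and are not part of any OpenSearch client.

```python
from urllib.parse import quote
from typing import Tuple


def doc_endpoint(index: str, doc_id: str, *, source_only: bool = False,
                 exists_check: bool = False) -> Tuple[str, str]:
    """Return (HTTP method, path) for a Get Document API request.

    Hypothetical helper: source_only selects the _source endpoint, and
    exists_check switches GET to HEAD to test for existence only.
    """
    method = "HEAD" if exists_check else "GET"
    endpoint = "_source" if source_only else "_doc"
    # Percent-encode the index name and ID so that reserved characters
    # such as "/" cannot change the structure of the path.
    return method, f"{quote(index, safe='')}/{endpoint}/{quote(doc_id, safe='')}"


print(doc_endpoint("sample-index1", "1"))
# ('GET', 'sample-index1/_doc/1')
print(doc_endpoint("sample-index1", "1", source_only=True, exists_check=True))
# ('HEAD', 'sample-index1/_source/1')
```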

## Query parameters

All Get Document API query parameters are optional.

@@ -48,6 +47,79 @@
version | Integer | The version of the document to return, which must match the current version of the document.
version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to retrieve version 3 of a document, use `/_doc/1?version=3&version_type=external`.

### Realtime

By default, the Get Document API operates in real time: it retrieves the latest version of the document regardless of the index's refresh interval (which determines when new data becomes searchable). However, if you request stored fields (using the `stored_fields` parameter) for a document that has been updated but not yet refreshed, the API must parse and analyze the document's source to extract those stored fields. To disable real-time behavior and retrieve the document as of the index's last refresh, set the `realtime` parameter to `false`.
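
To make the difference concrete, the following Python sketch models an index as a set of refreshed (searchable) documents plus a buffer of writes made since the last refresh. This is a toy model under that simplifying assumption, not a description of OpenSearch internals: a real-time read consults the latest write first, while a read with `realtime=false` sees only the state from the last refresh. The class and method names are hypothetical.

```python
from typing import Dict, Optional


class ToyIndex:
    """Toy model of real-time versus refreshed reads; not OpenSearch internals."""

    def __init__(self) -> None:
        self.refreshed: Dict[str, dict] = {}  # documents visible as of the last refresh
        self.pending: Dict[str, dict] = {}    # documents written since the last refresh

    def index(self, doc_id: str, source: dict) -> None:
        # New writes stay invisible to non-real-time reads until the next refresh.
        self.pending[doc_id] = source

    def refresh(self) -> None:
        # A refresh makes every pending write visible.
        self.refreshed.update(self.pending)
        self.pending.clear()

    def get(self, doc_id: str, realtime: bool = True) -> Optional[dict]:
        # realtime=True checks the latest writes first; realtime=False
        # returns only what the last refresh made visible.
        if realtime and doc_id in self.pending:
            return self.pending[doc_id]
        return self.refreshed.get(doc_id)


idx = ToyIndex()
idx.index("1", {"title": "v1"})
idx.refresh()
idx.index("1", {"title": "v2"})       # updated, not yet refreshed
print(idx.get("1"))                   # {'title': 'v2'}
print(idx.get("1", realtime=False))   # {'title': 'v1'}
```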

### Source filtering

By default, the Get Document API returns the entire contents of the `_source` field for the requested document. To exclude `_source` from the response, set the `_source` query parameter to `false`, as shown in the following example:

```json
GET test-index/_doc/0?_source=false
```

#### `_source` includes and excludes

If you only need specific fields from the `_source`, use the `_source_includes` or `_source_excludes` parameters to include or exclude particular fields, respectively. This can be beneficial for large documents, as retrieving only the required fields can reduce network overhead. Both parameters accept a comma-separated list of fields or wildcard expressions, as shown in the following example:

```json
GET test-index/_doc/0?_source_includes=*.id&_source_excludes=entities
```
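
The following Python sketch approximates how include and exclude patterns select fields. It flattens the source into dotted field paths, keeps a path if it (or a parent object) matches any include pattern, or if no includes are given, and drops it if it (or a parent object) matches an exclude pattern. Matching uses `fnmatch` glob semantics as a stand-in, so OpenSearch's own pattern matching may differ in edge cases; the sketch also returns flattened paths for readability, whereas the API returns nested objects. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def flatten(source: dict, prefix: str = "") -> dict:
    """Flatten nested objects into {"dotted.path": leaf_value}."""
    flat = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def _matches(path: str, patterns: List[str]) -> bool:
    # A field matches if the field itself or any parent object matches a pattern.
    parts = path.split(".")
    candidates = [".".join(parts[: i + 1]) for i in range(len(parts))]
    return any(fnmatchcase(c, p) for c in candidates for p in patterns)


def filter_source(source: dict, includes: Optional[List[str]] = None,
                  excludes: Optional[List[str]] = None) -> dict:
    """Return the flattened fields that survive include/exclude filtering."""
    result = {}
    for path, value in flatten(source).items():
        if includes and not _matches(path, includes):
            continue
        if excludes and _matches(path, excludes):
            continue
        result[path] = value
    return result


doc = {"user": {"id": "u1", "name": "Ann"}, "entities": {"id": "e1"}, "title": "Hello"}
print(filter_source(doc, includes=["*.id"], excludes=["entities"]))  # {'user.id': 'u1'}
```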

#### Shorter notation

If you only want to include certain fields and don't need to exclude any, you can use a shorter notation by specifying the fields directly in the `_source` parameter:

```json
GET test-index/_doc/0?_source=*.id
```
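
The following Python sketch assembles the source-filtering query strings used in this section with the standard library's `urlencode`. The function name `source_query` is hypothetical. Passing `safe="*,"` keeps wildcards and comma-separated field lists readable; percent-encoding them instead would also be a valid URL.

```python
from urllib.parse import urlencode
from typing import List, Optional, Union


def source_query(source: Union[bool, List[str], None] = None,
                 includes: Optional[List[str]] = None,
                 excludes: Optional[List[str]] = None) -> str:
    """Build a query string for the _source filtering parameters.

    source=False omits _source; a list of fields uses the shorter notation.
    includes/excludes take field names or wildcard patterns.
    """
    params = {}
    if source is False:
        params["_source"] = "false"
    elif source:
        params["_source"] = ",".join(source)
    if includes:
        params["_source_includes"] = ",".join(includes)
    if excludes:
        params["_source_excludes"] = ",".join(excludes)
    # Leave "*" and "," unescaped so the output matches the documented examples.
    return urlencode(params, safe="*,")


print(source_query(source=False))                                # _source=false
print(source_query(includes=["*.id"], excludes=["entities"]))   # _source_includes=*.id&_source_excludes=entities
print(source_query(source=["*.id"]))                            # _source=*.id
```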

### Routing

When indexing documents, you can specify a `routing` value to control which shard the documents are assigned to. If you specified a routing value when indexing a document, you must provide the same routing value when retrieving it using the Get Document API, as shown in the following example:

```json
GET test-index/_doc/1?routing=user1
```

This request retrieves the document with the ID `1`, using the routing value `user1` to determine the shard on which the document is stored. If you don't specify the correct routing value, the Get Document API cannot locate the requested document.
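
The following Python sketch illustrates why the routing value matters: the target shard is derived from a hash of the routing value, which defaults to the document ID. It is a simplified model. It uses `zlib.crc32` as a stand-in hash and a plain modulo over the number of primary shards, whereas OpenSearch uses its own Murmur3-based routing with additional index settings, so the shard numbers this sketch computes are not the ones a real cluster would choose. The function name is hypothetical.

```python
import zlib
from typing import Optional


def shard_for(doc_id: str, num_primary_shards: int,
              routing: Optional[str] = None) -> int:
    """Simplified shard selection: hash(routing value or document ID) mod shard count.

    zlib.crc32 is a stand-in; OpenSearch uses its own Murmur3-based routing.
    """
    effective_routing = routing if routing is not None else doc_id
    return zlib.crc32(effective_routing.encode("utf-8")) % num_primary_shards


# A document indexed with routing "user1" is stored on the shard chosen by "user1".
indexed_on = shard_for("1", 5, routing="user1")
lookup_with_routing = shard_for("1", 5, routing="user1")   # same shard as indexed_on
lookup_without_routing = shard_for("1", 5)                  # hashes the ID "1"; may differ
print(indexed_on, lookup_with_routing, lookup_without_routing)
```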

### Preference

The Get Document API allows you to control which shard replica should handle the request. By default, the operation is randomly distributed across the available shard replicas.

However, you can specify a preference to influence the replica selection. The preference can be set to one of the following values:

- `_local`: The operation will try to execute on a locally allocated shard replica, if possible. This can improve performance by reducing network overhead.
- Custom (string) value: Specifying a custom string value ensures that requests with the same value will be routed to the same set of shards. This consistency can be beneficial when dealing with shards in different refresh states, as it prevents "jumping values" that may occur when hitting shards with varying data visibility. A common practice is to use a web session ID or a user name as the custom value.
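
The following Python sketch models the behaviors described above for the copies of a single shard: with no preference, a copy is chosen at random; `_local` favors a copy on the node that received the request; and a custom string is hashed so that the same string always selects the same copy. This is a simplified illustration with hypothetical node and function names; OpenSearch's actual selection logic may consider additional factors.

```python
import random
import zlib
from typing import List, Optional


def pick_copy(copies: List[str], local_node: str,
              preference: Optional[str] = None,
              rng: Optional[random.Random] = None) -> str:
    """Choose which node's copy of one shard serves a GET (simplified model).

    copies lists the nodes holding the primary or a replica of the shard.
    """
    if preference == "_local":
        # Prefer a copy on the node that received the request, if there is one.
        if local_node in copies:
            return local_node
        preference = None  # no local copy: fall back to the default behavior
    if preference:
        # A custom string always maps to the same copy, so repeated reads
        # with the same session ID see the same refresh state.
        return copies[zlib.crc32(preference.encode("utf-8")) % len(copies)]
    # Default: pick any available copy at random.
    return (rng or random).choice(copies)


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
print(pick_copy(nodes, local_node="node-b", preference="_local"))  # node-b
print(pick_copy(nodes, "node-a", "session-42") == pick_copy(nodes, "node-c", "session-42"))  # True
```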

### Refresh

Set the `refresh` parameter to `true` to refresh the relevant shard before running the Get Document API operation so that the latest changes are searchable and visible to the API. Because a refresh can impose a heavy load on the system and slow down indexing, weigh the trade-off between data freshness and system load before enabling the `refresh` parameter.

### Distributed

When you run the Get Document API, OpenSearch first hashes the document ID (or the routing value, if one is provided) to determine the shard that contains the document. The request is then redirected to one of that shard's copies (the primary shard or one of its replicas), and the result is returned from that copy.

Adding replicas improves the scalability and throughput of Get Document API requests because the load can be distributed across more shard copies.

### Versioning support

Use the `version` parameter to retrieve a document only if its current version matches the specified version number. This can be useful for ensuring data consistency and preventing conflicts when working with versioned documents.

Internally, when a document is updated in OpenSearch, the old version is marked as deleted, and a new version of the document is added. However, the old version doesn't immediately disappear from the system. While you won't be able to access the old version through the Get Document API, OpenSearch handles the cleanup of deleted document versions in the background as you continue indexing new data.
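
The following Python sketch shows the check that the `version` parameter performs, as described above: the document is returned only when its current version equals the requested one. The function and exception names are hypothetical; OpenSearch itself reports a mismatch as a version conflict error in the response rather than raising a Python exception.

```python
from typing import Optional


class VersionConflict(Exception):
    """Hypothetical stand-in for the version conflict error OpenSearch returns."""


def get_with_version(store: dict, doc_id: str, version: Optional[int] = None) -> dict:
    """Return store[doc_id] if no version is requested or if the versions match.

    store maps document IDs to {"_version": int, "_source": dict}.
    A missing ID raises KeyError in this sketch.
    """
    doc = store[doc_id]
    if version is not None and doc["_version"] != version:
        raise VersionConflict(
            f"document {doc_id} is at version {doc['_version']}, not {version}"
        )
    return doc


store = {"1": {"_version": 3, "_source": {"title": "Hello"}}}
print(get_with_version(store, "1", version=3)["_source"])  # {'title': 'Hello'}
```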

## Example request

The following example request retrieves information about the document with the ID `1`:

```json
GET sample-index1/_doc/1
```
{% include copy-curl.html %}


## Example response
```json
```
58 changes: 58 additions & 0 deletions _api-reference/index-apis/blocks.md
@@ -0,0 +1,58 @@
---
layout: default
title: Blocks
parent: Index APIs
nav_order: 6
---

# Blocks
**Introduced 1.0**
{: .label .label-purple }

The Blocks API limits which operations are available on the specified index. OpenSearch provides different types of blocks that allow you to restrict write, read, or metadata operations on an index. When you add a write block using this API, OpenSearch waits until all shards of the index have properly accounted for the block before returning a successful response. As a result, any write operations that were in flight when the block was added have completed by the time the response is returned.

## Path and HTTP methods

```json
PUT /<index>/_block/<block>
```

## Path parameters

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `<index>` | String | A comma-delimited list of index names. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
| `<block>` | String | The type of block to apply to the index. Required. Valid values are: <br> `metadata`: Disables all metadata changes, such as closing the index. <br> `read`: Disables all read operations. <br> `read_only`: Disables all write operations and metadata changes. <br> `write`: Disables write operations. Metadata changes are still allowed. |
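
The following Python sketch encodes the block types from the table above as the operation categories each one disables, which makes the difference between `read_only` and `write` easy to check. The names `BLOCKED_OPERATIONS` and `is_allowed` are hypothetical, and the three categories are a simplification of the operations OpenSearch actually checks.

```python
from typing import Dict, FrozenSet, Iterable

# Operation categories that each block type disables, following the path parameters table.
BLOCKED_OPERATIONS: Dict[str, FrozenSet[str]] = {
    "metadata": frozenset({"metadata"}),
    "read": frozenset({"read"}),
    "read_only": frozenset({"write", "metadata"}),
    "write": frozenset({"write"}),
}


def is_allowed(operation: str, blocks: Iterable[str]) -> bool:
    """Return True if `operation` ("read", "write", or "metadata") is permitted
    on an index that carries the given block types."""
    for block in blocks:
        if block not in BLOCKED_OPERATIONS:
            raise ValueError(f"unknown block type: {block}")
        if operation in BLOCKED_OPERATIONS[block]:
            return False
    return True


print(is_allowed("metadata", {"write"}))      # True: a write block still allows metadata changes
print(is_allowed("metadata", {"read_only"}))  # False
```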

## Query parameters

The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `ignore_unavailable` | Boolean | When `false`, the request returns an error when it targets a missing or closed index. Default is `false`. |
| `allow_no_indices` | Boolean | When `false`, the request returns an error if a wildcard expression, index alias, or `_all` targets only closed or missing indexes, even if the request also targets open indexes. Default is `true`. |
| `expand_wildcards` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. |
| `cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. |
| `timeout` | Time | The amount of time to wait for the request to return. Default is `30s`. |

## Example request

The following example request adds a `write` block to `test-index`, disabling write operations on that index:

```json
PUT /test-index/_block/write
```

## Example response

```json
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : [ {
    "name" : "test-index",
    "blocked" : true
  } ]
}
```
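
To confirm programmatically that a block was applied, you can check the acknowledgment flags and the `blocked` value for each index in the response. The following Python sketch does this with the standard `json` module; the function name `all_blocked` is hypothetical.

```python
import json


def all_blocked(response_body: str) -> bool:
    """Return True if a Blocks API response acknowledges the block on every index."""
    response = json.loads(response_body)
    # Both the cluster-level and the shard-level acknowledgments must be true.
    if not (response.get("acknowledged") and response.get("shards_acknowledged")):
        return False
    indices = response.get("indices", [])
    return bool(indices) and all(entry.get("blocked") for entry in indices)


example = """
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : [ { "name" : "test-index", "blocked" : true } ]
}
"""
print(all_blocked(example))  # True
```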