Merge branch '8.17' into release-notes-8-17-0
breskeby authored Dec 12, 2024
2 parents a75acea + cdff361 commit 2763bfd
Showing 24 changed files with 3,098 additions and 76 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/118380.yaml
@@ -0,0 +1,5 @@
pr: 118380
summary: Restore original "is within leaf" value in `SparseVectorFieldMapper`
area: Mapping
type: bug
issues: []
11 changes: 11 additions & 0 deletions docs/changelog/118559.yaml
@@ -0,0 +1,11 @@
pr: 118559
summary: Make logsdb generally available
area: Logs
type: feature
issues: []
highlight:
title: Make logsdb generally available
body: >-
Logsdb is now generally available. Logsdb is a feature that allows Elasticsearch to store logs more efficiently.
Logsdb can reduce storage usage by up to ~2.5 times compared to storing logs in Elasticsearch without logsdb.
notable: true
28 changes: 20 additions & 8 deletions docs/reference/connector/docs/connectors-salesforce.asciidoc
@@ -200,7 +200,7 @@ Once the permissions are set, assign the Profiles, Permission Set or Permission
Follow these steps in Salesforce:
1. Navigate to `Administration` under the `Users` section.
2. Select `Users` and choose the user to set the permissions to.
3. Set the `Profile`, `Permission Set` or `Permission Set Groups` created in the earlier steps.
[discrete#es-connectors-salesforce-sync-rules]
@@ -249,7 +249,7 @@ Allowed values are *SOQL* and *SOSL*.
[
{
"query": "FIND {Salesforce} IN ALL FIELDS",
"language": "SOSL"
"language": "SOSL"
}
]
----
@@ -381,7 +381,13 @@ See <<es-connectors-content-extraction,content extraction>> for more specifics o
[discrete#es-connectors-salesforce-known-issues]
===== Known issues
There are currently no known issues for this connector.
* *DLS feature is "type-level" not "document-level"*
+
Salesforce DLS, added in 8.13.0, does not accommodate access controls for specific Salesforce Objects.
Instead, if a given user/group has access to _any_ Objects of a given type (`Case`, `Lead`, `Opportunity`, etc.), that user/group will appear in the `\_allow_access_control` list for _all_ of the Objects of that type.
See https://github.com/elastic/connectors/issues/3028 for more details.
+
Refer to <<es-connectors-known-issues,connector known issues>> for a list of known issues for all connectors.
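
As an illustrative, hedged sketch of the impact (the index name and user identity below are hypothetical): a DLS-style filter on `_allow_access_control` matches _all_ `Case` documents as soon as the user can access _any_ `Case` record in Salesforce, rather than only the records that user can actually see.

[source,console]
----
GET my-salesforce-connector-index/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "_allow_access_control": "jane.doe@example.com"
        }
      }
    }
  }
}
----
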
[discrete#es-connectors-salesforce-security]
@@ -396,7 +402,7 @@ This connector is built with the {connectors-python}[Elastic connector framework
View the {connectors-python}/connectors/sources/salesforce.py[source code for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_).
// Closing the collapsible section
===============


@@ -598,7 +604,7 @@ Once the permissions are set, assign the Profiles, Permission Set or Permission
Follow these steps in Salesforce:
1. Navigate to `Administration` under the `Users` section.
2. Select `Users` and choose the user to set the permissions to.
3. Set the `Profile`, `Permission Set` or `Permission Set Groups` created in the earlier steps.
[discrete#es-connectors-salesforce-client-sync-rules]
@@ -648,7 +654,7 @@ Allowed values are *SOQL* and *SOSL*.
[
{
"query": "FIND {Salesforce} IN ALL FIELDS",
"language": "SOSL"
"language": "SOSL"
}
]
----
@@ -781,7 +787,13 @@ See <<es-connectors-content-extraction,content extraction>> for more specifics o
[discrete#es-connectors-salesforce-client-known-issues]
===== Known issues
There are currently no known issues for this connector.
* *DLS feature is "type-level" not "document-level"*
+
Salesforce DLS, added in 8.13.0, does not accommodate access controls for specific Salesforce Objects.
Instead, if a given user/group has access to _any_ Objects of a given type (`Case`, `Lead`, `Opportunity`, etc.), that user/group will appear in the `\_allow_access_control` list for _all_ of the Objects of that type.
See https://github.com/elastic/connectors/issues/3028 for more details.
+
Refer to <<es-connectors-known-issues,connector known issues>> for a list of known issues for all connectors.
[discrete#es-connectors-salesforce-client-security]
@@ -797,5 +809,5 @@ This connector is built with the {connectors-python}[Elastic connector framework
View the {connectors-python}/connectors/sources/salesforce.py[source code for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_).
// Closing the collapsible section
===============
197 changes: 185 additions & 12 deletions docs/reference/data-streams/logs.asciidoc
@@ -1,26 +1,20 @@
[[logs-data-stream]]
== Logs data stream

preview::[Logs data streams and the logsdb index mode are in tech preview and may be changed or removed in the future. Don't use logs data streams or logsdb index mode in production.]
IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
and self-managed Elasticsearch as of version 8.17, and is enabled by default for
logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].

A logs data stream is a data stream type that stores log data more efficiently.

In benchmarks, log data stored in a logs data stream used ~2.5 times less disk space than a regular data
stream. The exact impact will vary depending on your data set.

The following features are enabled in a logs data stream:

* <<synthetic-source,Synthetic source>>, which omits storing the `_source` field. When the document source is requested, it is synthesized from document fields upon retrieval.

* Index sorting. This yields a lower storage footprint. By default indices are sorted by `host.name` and `@timestamp` fields at index time.

* More space efficient compression for fields with <<doc-values,`doc_values`>> enabled.
stream. The exact impact varies by data set.

[discrete]
[[how-to-use-logsds]]
=== Create a logs data stream

To create a logs data stream, set your index template `index.mode` to `logsdb`:
To create a logs data stream, set your <<index-templates,template>> `index.mode` to `logsdb`:

[source,console]
----
@@ -39,14 +33,193 @@ PUT _index_template/my-index-template
// TEST

<1> The index mode setting.
<2> The index template priority. By default, Elasticsearch ships with an index template with a `logs-*-*` pattern with a priority of 100. You need to define a priority higher than 100 to ensure that this index template gets selected over the default index template for the `logs-*-*` pattern. See the <<avoid-index-pattern-collisions,avoid index pattern collision section>> for more information.
<2> The index template priority. By default, Elasticsearch ships with a `logs-*-*` index template with a priority of 100. To make sure your index template takes priority over the default `logs-*-*` template, set its `priority` to a number higher than 100. For more information, see <<avoid-index-pattern-collisions,Avoid index pattern collisions>>.

After the index template is created, new indices that use the template will be configured as a logs data stream. You can start indexing data and <<use-a-data-stream,using the data stream>>.
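
A minimal sketch of that first indexing request, assuming the template above declares a `data_stream` object and an `index_patterns` entry that matches the hypothetical name used here:

[source,console]
----
POST logs-myapp-default/_doc
{
  "@timestamp": "2024-12-12T10:15:30.000Z",
  "host": {
    "name": "host-01"
  },
  "message": "Connection accepted from 10.0.0.1"
}
----

The data stream is created automatically by the first indexing request that matches the template, and its backing index uses `index.mode: logsdb`.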

You can also set the index mode and adjust other template settings in <<index-mgmt,the Elastic UI>>.

////
[source,console]
----
DELETE _index_template/my-index-template
----
// TEST[continued]
////

[[logsdb-default-settings]]

[discrete]
[[logsdb-synthetic-source]]
=== Synthetic source

If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.

If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.

Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.

When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values
are preserved for <<synthetic-source,synthetic source>> reconstruction. In `logsdb`, the default value is `arrays`,
which retains both duplicate values and the order of entries. However, the exact structure of
array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
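
If a specific field needs stricter fidelity, the `synthetic_source_keep` mapping parameter can be set per field. A hedged sketch (template name, pattern, priority, and field name are hypothetical):

[source,console]
----
PUT _index_template/my-index-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "index.mode": "logsdb"
    },
    "mappings": {
      "properties": {
        "http.request.headers": {
          "type": "keyword",
          "synthetic_source_keep": "all" <1>
        }
      }
    }
  }
}
----
<1> Preserves this field's values exactly as provided in the original document, at the cost of extra storage; other fields keep the index-wide `arrays` default.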

[discrete]
[[logsdb-sort-settings]]
=== Index sort settings

In `logsdb` index mode, the following sort settings are applied by default:

`index.sort.field`: `["host.name", "@timestamp"]`::
Indices are sorted by `host.name` and `@timestamp` by default. The `@timestamp` field is automatically injected if it is not present.

`index.sort.order`: `["desc", "desc"]`::
Both `host.name` and `@timestamp` are sorted in descending (`desc`) order, prioritizing the latest data.

`index.sort.mode`: `["min", "min"]`::
The `min` mode sorts indices by the minimum value of multi-value fields.

`index.sort.missing`: `["_first", "_first"]`::
Missing values are sorted to appear `_first`.

You can override these default sort settings. For example, to sort on different fields
and change the order, manually configure `index.sort.field` and `index.sort.order`. For more details, see
<<index-modules-index-sorting>>.
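
A hedged sketch of such an override (template name, pattern, priority, and field names are hypothetical). Custom sort fields generally need to be mapped at index creation, so `service.name` is mapped explicitly here:

[source,console]
----
PUT _index_template/my-index-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.sort.field": ["service.name", "@timestamp"],
      "index.sort.order": ["asc", "desc"]
    },
    "mappings": {
      "properties": {
        "service.name": {
          "type": "keyword"
        }
      }
    }
  }
}
----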

When using the default sort settings, the `host.name` field is automatically injected into the index mappings as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and retrieved based on the `host.name` and `@timestamp` fields.

NOTE: If `subobjects` is set to `true` (default), the `host` field is mapped as an object field
named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
a single `host.name` field is mapped as a `keyword` field.

To apply different sort settings to an existing data stream, update the data stream's component templates, and then
perform or wait for a <<data-streams-rollover,rollover>>.

NOTE: In `logsdb` mode, the `@timestamp` field is automatically injected if it's not already present. If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
automatically added to the list of sort fields.

[discrete]
[[logsdb-host-name]]
==== Existing data streams

If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.

To avoid mapping conflicts, consider these options:

* **Adjust mappings:** Check your existing mappings to ensure that `host.name` is mapped as a keyword.

* **Change sorting:** If needed, you can remove `host.name` from the sort settings and use a different set of fields. Sorting by `@timestamp` can be a good fallback.

* **Switch to a different <<index-mode-setting,index mode>>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.

IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).
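
For example, a manual rollover (data stream name hypothetical) creates a new backing index that picks up the updated template, including `index.mode: logsdb`:

[source,console]
----
POST my-existing-data-stream/_rollover
----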

[discrete]
[[logsdb-specialized-codecs]]
=== Specialized codecs

By default, `logsdb` index mode uses the `best_compression` <<index-codec,codec>>, which applies {wikipedia}/Zstd[ZSTD]
compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint.
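
A minimal sketch of that switch in a hypothetical template (pattern and priority are assumptions):

[source,console]
----
PUT _index_template/my-index-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.codec": "default" <1>
    }
  }
}
----
<1> Trades some storage footprint for faster compression; `best_compression` remains the `logsdb` default.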

The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
encoded using the following sequence of codecs:

* **Delta encoding**:
Stores the difference between consecutive values instead of the actual values.

* **Offset encoding**:
Stores the difference from a base value rather than between consecutive values.

* **Greatest Common Divisor (GCD) encoding**:
Finds the greatest common divisor of a set of values and stores the differences as multiples of the GCD.

* **Frame Of Reference (FOR) encoding**:
Determines the smallest number of bits required to encode a block of values and uses
bit-packing to fit such values into larger 64-bit blocks.

Each encoding is evaluated according to heuristics determined by the data distribution.
For example, the algorithm checks whether the data is monotonically non-decreasing or
non-increasing. If so, delta encoding is applied; otherwise, the process
continues with the next encoding method (offset).

Encoding is specific to each Lucene segment and is reapplied when segments are merged. The merged Lucene segment
might use a different encoding than the original segments, depending on the characteristics of the merged data.

For keyword fields, **Run Length Encoding (RLE)** is applied to the ordinals, which represent positions in the Lucene
segment-level keyword dictionary. This compression is used when multiple consecutive documents share the same keyword.

[discrete]
[[logsdb-ignored-settings]]
=== `ignore` settings

The `logsdb` index mode uses the following `ignore` settings. You can override these settings as needed.

[discrete]
[[logsdb-ignore-malformed]]
==== `ignore_malformed`

By default, `logsdb` index mode sets `ignore_malformed` to `true`. With this setting, documents with malformed fields
can be indexed without causing ingestion failures.
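
For instance, assuming a hypothetical data stream in which `http.response.status_code` is mapped as a numeric field, the following document is still indexed; the malformed field is skipped and recorded in the document's `_ignored` metadata field:

[source,console]
----
POST logs-myapp-default/_doc
{
  "@timestamp": "2024-12-12T10:15:30.000Z",
  "host": {
    "name": "host-01"
  },
  "http": {
    "response": {
      "status_code": "forbidden"
    }
  }
}
----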

[discrete]
[[logs-db-ignore-above]]
==== `ignore_above`

In `logsdb` index mode, the `index.mapping.ignore_above` setting is applied by default at the index level to ensure
efficient storage and indexing of large keyword fields. The index-level default for `ignore_above` is 8191
_characters._ With UTF-8 encoding, this corresponds to a limit of up to 32764 bytes, since a single character can occupy up to four bytes.

The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value
defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default
behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.

If you need to customize the limit, you can override it at the mapping level or change the index level default.
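
A hedged sketch of both override levels (template name, pattern, priority, and field name are hypothetical):

[source,console]
----
PUT _index_template/my-index-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.ignore_above": 1024 <1>
    },
    "mappings": {
      "properties": {
        "event.original": {
          "type": "keyword",
          "ignore_above": 16384 <2>
        }
      }
    }
  }
}
----
<1> New index-level default for keyword fields in this data stream.
<2> Field-level value that takes precedence over the index-level default for `event.original`.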

[discrete]
[[logs-db-ignore-limit]]
==== `ignore_dynamic_beyond_limit`

In `logsdb` index mode, the setting `index.mapping.total_fields.ignore_dynamic_beyond_limit` is set to `true` by
default. This setting allows dynamically mapped fields to be added on top of statically defined fields, even when the total number of fields exceeds the `index.mapping.total_fields.limit`. Instead of triggering an index failure, additional dynamically mapped fields are ignored so that ingestion can continue.
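
Both settings can be overridden in the template if you need different behavior; a minimal sketch with hypothetical values:

[source,console]
----
PUT _index_template/my-index-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.total_fields.limit": 2000,
      "index.mapping.total_fields.ignore_dynamic_beyond_limit": false <1>
    }
  }
}
----
<1> With `false`, exceeding the field limit rejects documents instead of silently ignoring the extra dynamically mapped fields.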

NOTE: When automatically injected, `host.name` and `@timestamp` count toward the limit of mapped fields. If `host.name` is mapped with `subobjects: true` (the default), it counts as two fields: the `host` object and its `name` child. When mapped with `subobjects: false`, `host.name` counts as a single field.

[discrete]
[[logsdb-nodocvalue-fields]]
=== Fields without `doc_values`

When the `logsdb` index mode uses synthetic `_source` and `doc_values` are disabled for a field in the mapping,
{es} might set the `store` setting to `true` for that field. This ensures that the field's
data remains accessible for reconstructing the document's source when using
<<synthetic-source,synthetic source>>.

For example, this adjustment occurs with text fields when `store` is `false` and no suitable multi-field is available for
reconstructing the original value.

[discrete]
[[logsdb-settings-summary]]
=== Settings reference

The `logsdb` index mode uses the following settings:

* **`index.mode`**: `"logsdb"`

* **`index.mapping.synthetic_source_keep`**: `"arrays"`

* **`index.sort.field`**: `["host.name", "@timestamp"]`

* **`index.sort.order`**: `["desc", "desc"]`

* **`index.sort.mode`**: `["min", "min"]`

* **`index.sort.missing`**: `["_first", "_first"]`

* **`index.codec`**: `"best_compression"`

* **`index.mapping.ignore_malformed`**: `true`

* **`index.mapping.ignore_above`**: `8191`

* **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: `true`
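
To check the effective values on an existing logs data stream (name hypothetical), you can retrieve the settings of its backing indices, including defaults:

[source,console]
----
GET logs-myapp-default/_settings?include_defaults=true&flat_settings=true
----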
2 changes: 1 addition & 1 deletion docs/reference/data-streams/tsds.asciidoc
@@ -17,7 +17,7 @@ metrics data. Only use a TSDS if you typically add metrics data to {es} in near
real-time and `@timestamp` order.

A TSDS is only intended for metrics data. For other timestamped data, such as
logs or traces, use a regular data stream.
logs or traces, use a <<logs-data-stream,logs data stream>> or regular data stream.

[discrete]
[[differences-from-regular-data-stream]]
Binary file not shown.
Binary file modified docs/reference/images/index-mgmt/management-index-templates.png
5 changes: 2 additions & 3 deletions docs/reference/index-modules.asciidoc
@@ -113,10 +113,9 @@ Index mode supports the following values:

`standard`::: Standard indexing with default settings.

`time_series`::: Index mode optimized for storage of metrics documented in <<tsds-index-settings,TSDS Settings>>.
`time_series`::: _(data streams only)_ Index mode optimized for storage of metrics. For more information, see <<tsds-index-settings>>.

`logsdb`::: Index mode optimized for storage of logs. It applies default sort settings on the `hostname` and `timestamp` fields and uses <<synthetic-source,synthetic `_source`>>. <<index-modules-index-sorting,Index sorting>> on different fields is still allowed.
preview:[]
`logsdb`::: _(data streams only)_ Index mode optimized for <<logs-data-stream,logs>>.

[[routing-partition-size]] `index.routing_partition_size`::

2 changes: 1 addition & 1 deletion docs/reference/indices/index-mgmt.asciidoc
@@ -67,7 +67,7 @@ This value is the time period for which your data is guaranteed to be stored. Da
Elasticsearch at a later time.

[role="screenshot"]
image::images/index-mgmt/management-data-stream.png[Data stream details]
image::images/index-mgmt/management-data-stream-fields.png[Data stream details]

* To view more information about a data stream, such as its generation or its
current index lifecycle policy, click the stream's name. From this view, you can navigate to *Discover* to
6 changes: 3 additions & 3 deletions docs/reference/indices/put-index-template.asciidoc
@@ -115,10 +115,10 @@ See <<create-index-template,create an index template>>.
`index_mode`::
(Optional, string) Type of data stream to create. Valid values are `null`
(regular data stream) and `time_series` (<<tsds,time series data stream>>).
(standard data stream), `time_series` (<<tsds,time series data stream>>) and `logsdb`
(<<logs-data-stream,logs data stream>>).
+
If `time_series`, each backing index has an `index.mode` index setting of
`time_series`.
The template's `index_mode` sets the `index.mode` of the backing index.
=====

`index_patterns`::
