Merge branch 'main' into rag
leemthompo committed Jan 7, 2025
2 parents ca54e5b + 56becc5 commit 2042bd2
Showing 320 changed files with 7,031 additions and 5,509 deletions.

This file was deleted.

@@ -31,6 +31,7 @@
import org.gradle.api.artifacts.Configuration;
import org.gradle.api.artifacts.dsl.DependencyHandler;
import org.gradle.api.artifacts.type.ArtifactTypeDefinition;
+import org.gradle.api.file.FileCollection;
import org.gradle.api.plugins.JavaPluginExtension;
import org.gradle.api.provider.Provider;
import org.gradle.api.specs.Specs;
@@ -88,8 +89,8 @@ public void apply(Project project) {
Map<String, TaskProvider<?>> versionTasks = versionTasks(project, "destructiveDistroUpgradeTest", buildParams.getBwcVersions());
TaskProvider<Task> destructiveDistroTest = project.getTasks().register("destructiveDistroTest");

-Configuration examplePlugin = configureExamplePlugin(project);
+Configuration examplePluginConfiguration = configureExamplePlugin(project);
+FileCollection examplePluginFileCollection = examplePluginConfiguration;
List<TaskProvider<Test>> windowsTestTasks = new ArrayList<>();
Map<ElasticsearchDistributionType, List<TaskProvider<Test>>> linuxTestTasks = new HashMap<>();

@@ -102,9 +103,9 @@ public void apply(Project project) {
t2 -> distribution.isDocker() == false || dockerSupport.get().getDockerAvailability().isAvailable()
);
addDistributionSysprop(t, DISTRIBUTION_SYSPROP, distribution::getFilepath);
-addDistributionSysprop(t, EXAMPLE_PLUGIN_SYSPROP, () -> examplePlugin.getSingleFile().toString());
+addDistributionSysprop(t, EXAMPLE_PLUGIN_SYSPROP, () -> examplePluginFileCollection.getSingleFile().toString());
t.exclude("**/PackageUpgradeTests.class");
-}, distribution, examplePlugin.getDependencies());
+}, distribution, examplePluginConfiguration.getDependencies());

if (distribution.getPlatform() == Platform.WINDOWS) {
windowsTestTasks.add(destructiveTask);
2 changes: 1 addition & 1 deletion build.gradle
@@ -365,7 +365,7 @@ tasks.register("verifyBwcTestsEnabled") {

tasks.register("branchConsistency") {
description = 'Ensures this branch is internally consistent. For example, that versions constants match released versions.'
-group 'Verification'
+group = 'Verification'
dependsOn ":verifyVersions", ":verifyBwcTestsEnabled"
}

5 changes: 3 additions & 2 deletions distribution/docker/build.gradle
@@ -45,7 +45,7 @@ if (useDra == false) {
ivy {
name = 'beats'
if (useLocalArtifacts) {
-url getLayout().getBuildDirectory().dir("artifacts").get().asFile
+url = getLayout().getBuildDirectory().dir("artifacts").get().asFile
patternLayout {
artifact '/[organisation]/[module]-[revision]-[classifier].[ext]'
}
@@ -127,7 +127,7 @@ ext.expansions = { Architecture architecture, DockerBase base ->
'bin_dir' : base == DockerBase.IRON_BANK ? 'scripts' : 'bin',
'build_date' : buildDate,
'config_dir' : base == DockerBase.IRON_BANK ? 'scripts' : 'config',
-'git_revision'      : buildParams.gitRevision,
+'git_revision'      : buildParams.gitRevision.get(),
'license' : base == DockerBase.IRON_BANK ? 'Elastic License 2.0' : 'Elastic-License-2.0',
'package_manager' : base.packageManager,
'docker_base' : base.name().toLowerCase(),
@@ -551,6 +551,7 @@ subprojects { Project subProject ->
inputs.file("${parent.projectDir}/build/markers/${buildTaskName}.marker")
executable = 'docker'
outputs.file(tarFile)
+outputs.doNotCacheIf("Build cache is disabled for export tasks") { true }
args "save",
"-o",
tarFile,
20 changes: 20 additions & 0 deletions docs/changelog/117519.yaml
@@ -0,0 +1,20 @@
pr: 117519
summary: Remove `data_frame_transforms` roles
area: Transform
type: breaking
issues: []
breaking:
title: Remove `data_frame_transforms` roles
area: Transform
details: >-
`data_frame_transforms_admin` and `data_frame_transforms_user` were deprecated in
Elasticsearch 7 and are being removed in Elasticsearch 9.
`data_frame_transforms_admin` is now `transform_admin`.
`data_frame_transforms_user` is now `transform_user`.
Users must call the `_update` API to replace the permissions on the Transform before the
Transform can be started.
impact: >-
Transforms created with either the `data_frame_transforms_admin` or the
`data_frame_transforms_user` role will fail to start. The Transform will remain
in a `stopped` state, and its health will be red while displaying permission failures.
notable: false
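The `details` above imply a concrete migration step. A hedged sketch of that call (the transform id `my-transform` is a placeholder; the request should be issued by a user holding the replacement `transform_admin` or `transform_user` role, so the transform's stored credentials are refreshed):

```
POST _transform/my-transform/_update
{}
```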
5 changes: 5 additions & 0 deletions docs/changelog/117949.yaml
@@ -0,0 +1,5 @@
pr: 117949
summary: Move `SlowLogFieldProvider` instantiation to node construction
area: Infra/Logging
type: bug
issues: []
15 changes: 15 additions & 0 deletions docs/changelog/118804.yaml
@@ -0,0 +1,15 @@
pr: 118804
summary: Add new experimental `rank_vectors` mapping for late-interaction second order
ranking
area: Vector Search
type: feature
issues: []
highlight:
title: Add new experimental `rank_vectors` mapping for late-interaction second order
ranking
body:
Late-interaction models are powerful rerankers. While their size and overall
cost don't lend themselves to HNSW indexing, utilizing them for second-order reranking
can provide excellent boosts in relevance. The new `rank_vectors` mapping allows for rescoring
over new and novel multi-vector late-interaction models like ColBERT or ColPali.
notable: true
6 changes: 6 additions & 0 deletions docs/changelog/119054.yaml
@@ -0,0 +1,6 @@
pr: 119054
summary: "[Security Solution] allows `kibana_system` user to manage .reindexed-v8-*\
\ Security Solution indices"
area: Authorization
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/119233.yaml
@@ -0,0 +1,5 @@
pr: 119233
summary: Fixing `GetDatabaseConfigurationAction` response serialization
area: Ingest Node
type: bug
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/119474.yaml
@@ -0,0 +1,5 @@
pr: 119474
summary: "Add ES|QL cross-cluster query telemetry collection"
area: ES|QL
type: enhancement
issues: []
6 changes: 6 additions & 0 deletions docs/changelog/119476.yaml
@@ -0,0 +1,6 @@
pr: 119476
summary: Fix TopN row size estimate
area: ES|QL
type: bug
issues:
- 106956
5 changes: 5 additions & 0 deletions docs/changelog/119495.yaml
@@ -0,0 +1,5 @@
pr: 119495
summary: Add mapping for `event_name` for OTel logs
area: Data streams
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/119516.yaml
@@ -0,0 +1,5 @@
pr: 119516
summary: "Fix: do not let `_resolve/cluster` hang if remote is unresponsive"
area: Search
type: bug
issues: []
4 changes: 2 additions & 2 deletions docs/plugins/discovery-ec2.asciidoc
@@ -241,7 +241,7 @@ The `discovery-ec2` plugin can automatically set the `aws_availability_zone`
node attribute to the availability zone of each node. This node attribute
allows you to ensure that each shard has copies allocated redundantly across
multiple availability zones by using the
-{ref}/modules-cluster.html#shard-allocation-awareness[Allocation Awareness]
+{ref}/shard-allocation-awareness.html[Allocation Awareness]
feature.

In order to enable the automatic definition of the `aws_availability_zone`
@@ -333,7 +333,7 @@ labelled as `Moderate` or `Low`.

* It is a good idea to distribute your nodes across multiple
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html[availability
-zones] and use {ref}/modules-cluster.html#shard-allocation-awareness[shard
+zones] and use {ref}/shard-allocation-awareness.html[shard
allocation awareness] to ensure that each shard has copies in more than one
availability zone.

3 changes: 1 addition & 2 deletions docs/reference/analysis.asciidoc
@@ -9,8 +9,7 @@
--

_Text analysis_ is the process of converting unstructured text, like
-the body of an email or a product description, into a structured format that's
-optimized for search.
+the body of an email or a product description, into a structured format that's <<full-text-search,optimized for search>>.

[discrete]
[[when-to-configure-analysis]]
8 changes: 8 additions & 0 deletions docs/reference/analysis/tokenizers.asciidoc
@@ -1,6 +1,14 @@
[[analysis-tokenizers]]
== Tokenizer reference

.Difference between {es} tokenization and neural tokenization
[NOTE]
====
{es}'s tokenization process produces linguistic tokens, optimized for search and retrieval.
This differs from neural tokenization in the context of machine learning and natural language processing. Neural tokenizers translate strings into smaller, subword tokens, which are encoded into vectors for consumption by neural networks.
{es} does not have built-in neural tokenizers.
====
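A toy contrast of the two styles of tokenization (the subword vocabulary and greedy splitter below are invented stand-ins for illustration; real neural tokenizers are learned models, and this is not {es} code):

```python
text = "unstructured text"

# Linguistic tokenization (what an {es} tokenizer produces): word-level tokens
linguistic_tokens = text.split()

# Neural-style subword tokenization: greedily split each word into smaller
# pieces drawn from a hypothetical fixed vocabulary of fragments
vocab = ["un", "structur", "ed", "text"]

def subword_split(word, vocab):
    pieces, rest = [], word
    while rest:
        # Take the longest vocabulary fragment that prefixes the remainder;
        # fall back to the whole remainder if nothing matches
        match = max((p for p in vocab if rest.startswith(p)), key=len, default=rest)
        pieces.append(match)
        rest = rest[len(match):]
    return pieces

subword_tokens = [p for w in linguistic_tokens for p in subword_split(w, vocab)]
print(linguistic_tokens)  # → ['unstructured', 'text']
print(subword_tokens)     # → ['un', 'structur', 'ed', 'text']
```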

A _tokenizer_ receives a stream of characters, breaks it up into individual
_tokens_ (usually individual words), and outputs a stream of _tokens_. For
instance, a <<analysis-whitespace-tokenizer,`whitespace`>> tokenizer breaks
2 changes: 1 addition & 1 deletion docs/reference/cat/nodeattrs.asciidoc
@@ -17,7 +17,7 @@ console. They are _not_ intended for use by applications. For application
consumption, use the <<cluster-nodes-info,nodes info API>>.
====

-Returns information about <<shard-allocation-filtering,custom node attributes>>.
+Returns information about <<custom-node-attributes,custom node attributes>>.

[[cat-nodeattrs-api-request]]
==== {api-request-title}
2 changes: 1 addition & 1 deletion docs/reference/cluster.asciidoc
@@ -35,7 +35,7 @@ one of the following:
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, all machine learning nodes, and all coordinating-only nodes.
* a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
-which adds to the subset all nodes with a custom node attribute whose name
+which adds to the subset all nodes with a <<custom-node-attributes,custom node attribute>> whose name
and value match the respective patterns. Custom node attributes are
configured by setting properties in the configuration file of the form
`node.attr.attrname: attrvalue`.
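A rough sketch of how such a pattern pair selects nodes (the node names and attributes are hypothetical, and Python's `fnmatch` stands in for the `*` wildcard matching; this is an illustration, not the actual {es} implementation):

```python
from fnmatch import fnmatch

# Hypothetical node attributes, as configured via node.attr.<name>: <value>
nodes = {
    "node-1": {"rack": "rack-a", "zone": "us-east-1a"},
    "node-2": {"rack": "rack-b", "zone": "us-east-1b"},
    "node-3": {"rack": "rack-a", "zone": "us-east-1c"},
}

def select_nodes(pattern, nodes):
    """Match an 'attrname:attrvalue' pattern; both sides may use * wildcards."""
    attr_pat, value_pat = pattern.split(":", 1)
    return sorted(
        name
        for name, attrs in nodes.items()
        if any(fnmatch(a, attr_pat) and fnmatch(v, value_pat)
               for a, v in attrs.items())
    )

print(select_nodes("rack:rack-a", nodes))     # → ['node-1', 'node-3']
print(select_nodes("zone:us-east-*", nodes))  # → ['node-1', 'node-2', 'node-3']
```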
7 changes: 5 additions & 2 deletions docs/reference/cluster/stats.asciidoc
@@ -25,7 +25,6 @@ Returns cluster statistics.

* If the {es} {security-features} are enabled, you must have the `monitor` or
`manage` <<privileges-list-cluster,cluster privilege>> to use this API.

[[cluster-stats-api-desc]]
==== {api-description-title}

@@ -1397,7 +1396,7 @@ as a human-readable string.
`_search`:::
-(object) Contains the information about the <<modules-cross-cluster-search, {ccs}>> usage in the cluster.
+(object) Contains information about <<modules-cross-cluster-search, {ccs}>> usage.
+
.Properties of `_search`
[%collapsible%open]
@@ -1528,7 +1527,11 @@ This may include requests where partial results were returned, but not requests
=======


======
`_esql`:::
(object) Contains information about <<esql-cross-clusters,{esql} {ccs}>> usage.
The structure of the object is the same as the `_search` object above.
=====

4 changes: 2 additions & 2 deletions docs/reference/commands/node-tool.asciidoc
@@ -23,8 +23,8 @@ bin/elasticsearch-node repurpose|unsafe-bootstrap|detach-cluster|override-versio
This tool has a number of modes:

* `elasticsearch-node repurpose` can be used to delete unwanted data from a
-node if it used to be a <<data-node,data node>> or a
-<<master-node,master-eligible node>> but has been repurposed not to have one
+node if it used to be a <<data-node-role,data node>> or a
+<<master-node-role,master-eligible node>> but has been repurposed not to have one
or other of these roles.

* `elasticsearch-node remove-settings` can be used to remove persistent settings
2 changes: 1 addition & 1 deletion docs/reference/data-management.asciidoc
@@ -43,7 +43,7 @@ Data older than this period can be deleted by {es} at a later time.

**Elastic Curator** is a tool that allows you to manage your indices and snapshots using user-defined filters and predefined actions. If ILM provides the functionality to manage your index lifecycle, and you have at least a Basic license, consider using ILM in place of Curator. Many stack components make use of ILM by default. {curator-ref-current}/ilm.html[Learn more].

-NOTE: <<xpack-rollup,Data rollup>> is a deprecated Elasticsearch feature that allows you to manage the amount of data that is stored in your cluster, similar to the downsampling functionality of {ilm-init} and data stream lifecycle. This feature should not be used for new deployments.
+NOTE: <<xpack-rollup,Data rollup>> is a deprecated {es} feature that allows you to manage the amount of data that is stored in your cluster, similar to the downsampling functionality of {ilm-init} and data stream lifecycle. This feature should not be used for new deployments.

[TIP]
====
@@ -2,7 +2,7 @@
[[migrate-index-allocation-filters]]
== Migrate index allocation filters to node roles

-If you currently use custom node attributes and
+If you currently use <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to
move indices through <<data-tiers, data tiers>> in a
https://www.elastic.co/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management[hot-warm-cold architecture],
8 changes: 7 additions & 1 deletion docs/reference/data-store-architecture.asciidoc
@@ -9,10 +9,16 @@ from any node.
The topics in this section provide information about the architecture of {es} and how it stores and retrieves data:

* <<nodes-shards,Nodes and shards>>: Learn about the basic building blocks of an {es} cluster, including nodes, shards, primaries, and replicas.
* <<node-roles-overview,Node roles>>: Learn about the different roles that nodes can have in an {es} cluster.
* <<docs-replication,Reading and writing documents>>: Learn how {es} replicates read and write operations across shards and shard copies.
* <<shard-allocation-relocation-recovery,Shard allocation, relocation, and recovery>>: Learn how {es} allocates and balances shards across nodes.
** <<shard-allocation-awareness,Shard allocation awareness>>: Learn how to use custom node attributes to distribute shards across different racks or availability zones.
* <<shard-request-cache,Shard request cache>>: Learn how {es} caches search requests to improve performance.
--
include::nodes-shards.asciidoc[]
include::node-roles.asciidoc[]
include::docs/data-replication.asciidoc[leveloffset=-1]
-include::modules/shard-ops.asciidoc[]
+include::modules/shard-ops.asciidoc[]
+include::modules/cluster/allocation_awareness.asciidoc[leveloffset=+1]
+include::shard-request-cache.asciidoc[leveloffset=-1]
39 changes: 39 additions & 0 deletions docs/reference/data-streams/downsampling.asciidoc
@@ -72,6 +72,45 @@ the granularity of `cold` archival data to monthly or less.
.Downsampled metrics series
image::images/data-streams/time-series-downsampled.png[align="center"]

[discrete]
[[downsample-api-process]]
==== The downsampling process

The downsampling operation traverses the source TSDS index and performs the
following steps:

. Creates a new document for each value of the `_tsid` field and each
`@timestamp` value, rounded to the `fixed_interval` defined in the downsample
configuration.
. For each new document, copies all <<time-series-dimension,time
series dimensions>> from the source index to the target index. Dimensions in a
TSDS are constant, so this is done only once per bucket.
. For each <<time-series-metric,time series metric>> field, computes aggregations
for all documents in the bucket. Depending on the metric type of each metric
field, a different set of pre-aggregated results is stored:

** `gauge`: The `min`, `max`, `sum`, and `value_count` are stored; `value_count`
is stored as type `aggregate_metric_double`.
** `counter`: The `last_value` is stored.
. For all other fields, the most recent value is copied to the target index.

[discrete]
[[downsample-api-mappings]]
==== Source and target index field mappings

Fields in the target, downsampled index are created based on fields in the
original source index, as follows:

. All fields mapped with the `time_series_dimension` parameter are created in
the target downsample index with the same mapping as in the source index.
. All fields mapped with the `time_series_metric` parameter are created
in the target downsample index with the same mapping as in the source
index. An exception is that for fields mapped as `time_series_metric: gauge`
the field type is changed to `aggregate_metric_double`.
. All other fields that are neither dimensions nor metrics (that is, label
fields), are created in the target downsample index with the same mapping
that they had in the source index.
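The two sections above can be sketched in miniature (an in-memory stand-in: field names, values, and the 15-minute `fixed_interval` are illustrative, and only a single `gauge` metric is shown):

```python
from collections import defaultdict

INTERVAL = 15 * 60  # fixed_interval of 15m, in seconds

# Hypothetical source documents: a _tsid stand-in, epoch seconds, one gauge metric
source_docs = [
    {"tsid": "host-a", "ts": 100, "cpu": 0.40},
    {"tsid": "host-a", "ts": 500, "cpu": 0.60},
    {"tsid": "host-b", "ts": 200, "cpu": 0.10},
]

# One bucket (and thus one target document) per (_tsid, rounded @timestamp)
buckets = defaultdict(list)
for doc in source_docs:
    key = (doc["tsid"], doc["ts"] - doc["ts"] % INTERVAL)
    buckets[key].append(doc["cpu"])

downsampled = [
    {
        "tsid": tsid,  # dimensions are constant, copied once per bucket
        "ts": ts,
        # gauge metrics keep min/max/sum/value_count (aggregate_metric_double)
        "cpu": {"min": min(vs), "max": max(vs),
                "sum": sum(vs), "value_count": len(vs)},
    }
    for (tsid, ts), vs in sorted(buckets.items())
]
print(downsampled)
```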

[discrete]
[[running-downsampling]]
=== Running downsampling on time series data
2 changes: 1 addition & 1 deletion docs/reference/datatiers.asciidoc
@@ -189,7 +189,7 @@ tier].
[[configure-data-tiers-on-premise]]
==== Self-managed deployments

-For self-managed deployments, each node's <<data-node,data role>> is configured
+For self-managed deployments, each node's <<data-node-role,data role>> is configured
in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
might be assigned to both the hot and content tiers:

10 changes: 5 additions & 5 deletions docs/reference/high-availability/cluster-design.asciidoc
@@ -87,7 +87,7 @@ the same thing, but it's not necessary to use this feature in such a small
cluster.

We recommend you set only one of your two nodes to be
-<<master-node,master-eligible>>. This means you can be certain which of your
+<<master-node-role,master-eligible>>. This means you can be certain which of your
nodes is the elected master of the cluster. The cluster can tolerate the loss of
the other master-ineligible node. If you set both nodes to master-eligible, two
nodes are required for a master election. Since the election will fail if either
@@ -164,12 +164,12 @@ cluster that is suitable for production deployments.
[[high-availability-cluster-design-three-nodes]]
==== Three-node clusters

-If you have three nodes, we recommend they all be <<data-node,data nodes>> and
+If you have three nodes, we recommend they all be <<data-node-role,data nodes>> and
every index that is not a <<searchable-snapshots,searchable snapshot index>>
should have at least one replica. Nodes are data nodes by default. You may
prefer for some indices to have two replicas so that each node has a copy of
each shard in those indices. You should also configure each node to be
-<<master-node,master-eligible>> so that any two of them can hold a master
+<<master-node-role,master-eligible>> so that any two of them can hold a master
election without needing to communicate with the third node. Nodes are
master-eligible by default. This cluster will be resilient to the loss of any
single node.
@@ -188,8 +188,8 @@ service provides such a load balancer.

Once your cluster grows to more than three nodes, you can start to specialise
these nodes according to their responsibilities, allowing you to scale their
-resources independently as needed. You can have as many <<data-node,data
-nodes>>, <<ingest,ingest nodes>>, <<ml-node,{ml} nodes>>, etc. as needed to
+resources independently as needed. You can have as many <<data-node-role,data
+nodes>>, <<ingest,ingest nodes>>, <<ml-node-role,{ml} nodes>>, etc. as needed to
support your workload. As your cluster grows larger, we recommend using
dedicated nodes for each role. This allows you to independently scale resources
for each task.
2 changes: 1 addition & 1 deletion docs/reference/ilm/apis/migrate-to-data-tiers.asciidoc
@@ -11,7 +11,7 @@
For the most up-to-date API details, refer to {api-es}/group/endpoint-ilm[{ilm-cap} APIs].
--

-Switches the indices, ILM policies, and legacy, composable and component templates from using custom node attributes and
+Switches the indices, ILM policies, and legacy, composable and component templates from using <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to using <<data-tiers, data tiers>>, and
optionally deletes one legacy index template.
Using node roles enables {ilm-init} to <<data-tier-migration, automatically move the indices>> between