Replace 'ent-search-generic' with 'search-default' pipeline #118899

Merged: 3 commits, Dec 18, 2024
1 change: 1 addition & 0 deletions .java-version
Contributor:
-1 on committing .java-version to the repo. Can you fix this please?

Member Author:
oof, sorry

Member Author:
fix here: #118971

@@ -0,0 +1 @@
+21
@@ -8,7 +8,7 @@ The logic for content extraction is defined in {connectors-python}/connectors/ut
While intended primarily for PDF and Microsoft Office formats, you can use any of the <<es-connectors-content-extraction-supported-file-types, supported formats>>.

Enterprise Search uses an {ref}/ingest.html[Elasticsearch ingest pipeline^] to power the web crawler's binary content extraction.
-The default pipeline, `ent-search-generic-ingestion`, is automatically created when Enterprise Search first starts.
+The default pipeline, `search-default-ingestion`, is automatically created when Enterprise Search first starts.

You can {ref}/ingest.html#create-manage-ingest-pipelines[view^] this pipeline in Kibana.
Customizing your pipeline usage is also an option.
@@ -13,7 +13,7 @@ The following diagram provides an overview of how content extraction, sync rules
[.screenshot]
image::images/pipelines-extraction-sync-rules.png[Architecture diagram of data pipeline with content extraction, sync rules, and ingest pipelines]

-By default, only the connector specific logic (2) and the default `ent-search-generic-ingestion` pipeline (6) extract and transform your data, as configured in your deployment.
+By default, only the connector specific logic (2) and the default `search-default-ingestion` pipeline (6) extract and transform your data, as configured in your deployment.

The following tools are available for more advanced use cases:

@@ -50,4 +50,4 @@ Use ingest pipelines for data enrichment, normalization, and more.

Elastic connectors use a default ingest pipeline, which you can copy and customize to meet your needs.

-Refer to {ref}/ingest-pipeline-search.html[ingest pipelines in Search] in the {es} documentation.
+Refer to {ref}/ingest-pipeline-search.html[ingest pipelines in Search] in the {es} documentation.
2 changes: 1 addition & 1 deletion docs/reference/ingest/search-inference-processing.asciidoc
@@ -88,7 +88,7 @@ The `monitor_ml` <<security-privileges, Elasticsearch cluster privilege>> is req

To create the index-specific ML inference pipeline, go to *Search -> Content -> Indices -> <your index> -> Pipelines* in the Kibana UI.

-If you only see the `ent-search-generic-ingestion` pipeline, you will need to click *Copy and customize* to create index-specific pipelines.
+If you only see the `search-default-ingestion` pipeline, you will need to click *Copy and customize* to create index-specific pipelines.
This will create the `{index_name}@ml-inference` pipeline.

Once your index-specific ML inference pipeline is ready, you can add inference processors that use your ML trained models.
33 changes: 18 additions & 15 deletions docs/reference/ingest/search-ingest-pipelines.asciidoc
@@ -40,7 +40,7 @@ Considerations such as error handling, conditional execution, sequencing, versio
To this end, when you create indices for search use cases, (including {enterprise-search-ref}/crawler.html[Elastic web crawler], <<es-connectors,connectors>>.
, and API indices), each index already has a pipeline set up with several processors that optimize your content for search.

-This pipeline is called `ent-search-generic-ingestion`.
+This pipeline is called `search-default-ingestion`.
While it is a "managed" pipeline (meaning it should not be tampered with), you can view its details via the Kibana UI or the Elasticsearch API.
You can also <<ingest-pipeline-search-details-generic-reference,read more about its contents below>>.

@@ -56,14 +56,14 @@ This will not effect existing indices.

Each index also provides the capability to easily create index-specific ingest pipelines with customizable processing.
If you need that extra flexibility, you can create a custom pipeline by going to your pipeline settings and choosing to "copy and customize".
-This will replace the index's use of `ent-search-generic-ingestion` with 3 newly generated pipelines:
+This will replace the index's use of `search-default-ingestion` with 3 newly generated pipelines:

1. `<index-name>`
2. `<index-name>@custom`
3. `<index-name>@ml-inference`

-Like `ent-search-generic-ingestion`, the first of these is "managed", but the other two can and should be modified to fit your needs.
-You can view these pipelines using the platform tools (Kibana UI, Elasticsearch API), and can also
+Like `search-default-ingestion`, the first of these is "managed", but the other two can and should be modified to fit your needs.
+You can view these pipelines using the platform tools (Kibana UI, Elasticsearch API), and can also
<<ingest-pipeline-search-details-specific,read more about their content below>>.
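
The naming scheme for the three generated pipelines is mechanical, so it can be sketched in a few lines of Java. This is an illustration only: the class `IndexPipelineNames` and its method are hypothetical helpers, not part of Elasticsearch or Kibana. Only the three name patterns come from the documentation text in this hunk.

```java
import java.util.List;

/** Hypothetical helper: derives the three pipeline names produced by "copy and customize". */
final class IndexPipelineNames {

    private IndexPipelineNames() {}

    /** Returns the managed, custom, and ML inference pipeline names for an index, in that order. */
    static List<String> forIndex(String indexName) {
        if (indexName == null || indexName.isBlank()) {
            throw new IllegalArgumentException("indexName must not be blank");
        }
        return List.of(
            indexName,                   // "<index-name>": managed, should not be edited
            indexName + "@custom",       // "<index-name>@custom": meant to be edited
            indexName + "@ml-inference"  // "<index-name>@ml-inference": meant to be edited
        );
    }

    public static void main(String[] args) {
        // Uses the index name from the NLP tutorial hunk as sample input.
        System.out.println(forIndex("search-photo-comments"));
    }
}
```

A helper like this could be handy in tooling that needs to look up the editable `@custom` pipeline for a given index without hard-coding the suffixes in several places.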

[discrete#ingest-pipeline-search-pipeline-settings]
@@ -123,7 +123,7 @@ If the pipeline is not specified, the underscore-prefixed fields will actually b
=== Details

[discrete#ingest-pipeline-search-details-generic-reference]
-==== `ent-search-generic-ingestion` Reference
+==== `search-default-ingestion` Reference

You can access this pipeline with the <<get-pipeline-api, Elasticsearch Ingest Pipelines API>> or via Kibana's <<create-manage-ingest-pipelines,Stack Management > Ingest Pipelines>> UI.

@@ -149,7 +149,7 @@ If you want to make customizations, we recommend you utilize index-specific pipe
[discrete#ingest-pipeline-search-details-generic-reference-params]
===== Control flow parameters

-The `ent-search-generic-ingestion` pipeline does not always run all processors.
+The `search-default-ingestion` pipeline does not always run all processors.
It utilizes a feature of ingest pipelines to <<conditionally-run-processor,conditionally run processors>> based on the contents of each individual document.

* `_extract_binary_content` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `attachment`, `set_body`, and `remove_replacement_chars` processors.
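
The conditional behavior described for `_extract_binary_content` can be modeled as a small predicate over the source document. The following Java sketch is an assumption-laden illustration, not the pipeline's actual implementation: the class `ControlFlowSketch` is hypothetical, and it assumes the flag is a JSON boolean. The three processor names are the ones listed in the bullet above.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of the `_extract_binary_content` control flow parameter. */
final class ControlFlowSketch {

    // Processor names as listed in the documentation text.
    static final List<String> BINARY_CONTENT_PROCESSORS =
        List.of("attachment", "set_body", "remove_replacement_chars");

    private ControlFlowSketch() {}

    /** Returns the binary-content processors that would be attempted for this source document. */
    static List<String> binaryContentProcessorsFor(Map<String, Object> source) {
        // Assumption: the field must be present and hold the boolean value true;
        // a missing field, false, or any other value skips these processors.
        boolean extract = Boolean.TRUE.equals(source.get("_extract_binary_content"));
        return extract ? BINARY_CONTENT_PROCESSORS : List.of();
    }

    public static void main(String[] args) {
        System.out.println(binaryContentProcessorsFor(Map.of("_extract_binary_content", true)));
        System.out.println(binaryContentProcessorsFor(Map.of("title", "no flag set")));
    }
}
```
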
@@ -167,8 +167,8 @@ See <<ingest-pipeline-search-pipeline-settings>>.
==== Index-specific ingest pipelines

In the Kibana UI for your index, by clicking on the Pipelines tab, then *Settings > Copy and customize*, you can quickly generate 3 pipelines which are specific to your index.
-These 3 pipelines replace `ent-search-generic-ingestion` for the index.
-There is nothing lost in this action, as the `<index-name>` pipeline is a superset of functionality over the `ent-search-generic-ingestion` pipeline.
+These 3 pipelines replace `search-default-ingestion` for the index.
+There is nothing lost in this action, as the `<index-name>` pipeline is a superset of functionality over the `search-default-ingestion` pipeline.

[IMPORTANT]
====
@@ -179,7 +179,7 @@ Refer to the Elastic subscriptions pages for https://www.elastic.co/subscription
[discrete#ingest-pipeline-search-details-specific-reference]
===== `<index-name>` Reference

-This pipeline looks and behaves a lot like the <<ingest-pipeline-search-details-generic-reference,`ent-search-generic-ingestion` pipeline>>, but with <<ingest-pipeline-search-details-specific-reference-processors,two additional processors>>.
+This pipeline looks and behaves a lot like the <<ingest-pipeline-search-details-generic-reference,`search-default-ingestion` pipeline>>, but with <<ingest-pipeline-search-details-specific-reference-processors,two additional processors>>.

[WARNING]
=========================
@@ -197,7 +197,7 @@ If you want to make customizations, we recommend you utilize <<ingest-pipeline-s
[discrete#ingest-pipeline-search-details-specific-reference-processors]
====== Processors

-In addition to the processors inherited from the <<ingest-pipeline-search-details-generic-reference,`ent-search-generic-ingestion` pipeline>>, the index-specific pipeline also defines:
+In addition to the processors inherited from the <<ingest-pipeline-search-details-generic-reference,`search-default-ingestion` pipeline>>, the index-specific pipeline also defines:

* `index_ml_inference_pipeline` - this uses the <<pipeline-processor, Pipeline>> processor to run the `<index-name>@ml-inference` pipeline.
This processor will only be run if the source document includes a `_run_ml_inference` field with the value `true`.
@@ -206,7 +206,7 @@ In addition to the processors inherited from the <<ingest-pipeline-search-detail
[discrete#ingest-pipeline-search-details-specific-reference-params]
====== Control flow parameters

-Like the `ent-search-generic-ingestion` pipeline, the `<index-name>` pipeline does not always run all processors.
+Like the `search-default-ingestion` pipeline, the `<index-name>` pipeline does not always run all processors.
In addition to the `_extract_binary_content` and `_reduce_whitespace` control flow parameters, the `<index-name>` pipeline also supports:

* `_run_ml_inference` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `index_ml_inference_pipeline` processor.
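
The `index_ml_inference_pipeline` processor delegates to the `<index-name>@ml-inference` pipeline only when `_run_ml_inference` is `true`. A minimal Java sketch of that delegation rule follows. The class `MlInferenceDelegationSketch` is hypothetical, and it assumes the flag is a JSON boolean; it models the documented behavior and does not reproduce the real processor.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of the `index_ml_inference_pipeline` processor's run condition. */
final class MlInferenceDelegationSketch {

    private MlInferenceDelegationSketch() {}

    /**
     * Returns the name of the pipeline the processor would delegate to,
     * or an empty Optional when `_run_ml_inference` is absent or not true.
     */
    static Optional<String> delegatedPipeline(String indexName, Map<String, Object> source) {
        // Assumption: only the boolean value true enables the processor.
        boolean runInference = Boolean.TRUE.equals(source.get("_run_ml_inference"));
        return runInference ? Optional.of(indexName + "@ml-inference") : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(delegatedPipeline("my-index", Map.of("_run_ml_inference", true)));
        System.out.println(delegatedPipeline("my-index", Map.of("body", "text only")));
    }
}
```
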
@@ -220,7 +220,7 @@ See <<ingest-pipeline-search-pipeline-settings>>.
===== `<index-name>@ml-inference` Reference

This pipeline is empty to start (no processors), but can be added to via the Kibana UI either through the Pipelines tab of your index, or from the *Stack Management > Ingest Pipelines* page.
-Unlike the `ent-search-generic-ingestion` pipeline and the `<index-name>` pipeline, this pipeline is NOT "managed".
+Unlike the `search-default-ingestion` pipeline and the `<index-name>` pipeline, this pipeline is NOT "managed".

It's possible to add one or more ML inference pipelines to an index in the *Content* UI.
This pipeline will serve as a container for all of the ML inference pipelines configured for the index.
@@ -241,7 +241,7 @@ The `monitor_ml` Elasticsearch cluster permission is required in order to manage

This pipeline is empty to start (no processors), but can be added to via the Kibana UI either through the Pipelines
tab of your index, or from the *Stack Management > Ingest Pipelines* page.
-Unlike the `ent-search-generic-ingestion` pipeline and the `<index-name>` pipeline, this pipeline is NOT "managed".
+Unlike the `search-default-ingestion` pipeline and the `<index-name>` pipeline, this pipeline is NOT "managed".

You are encouraged to make additions and edits to this pipeline, provided its name remains the same.
This provides a convenient hook from which to add custom processing and transformations for your data.
@@ -272,9 +272,12 @@ extraction.
These changes should be re-applied to each index's `<index-name>@custom` pipeline in order to ensure a consistent data processing experience.
In 8.5+, the <<ingest-pipeline-search-pipeline-settings, index setting to enable binary content>> is required *in addition* to the configurations mentioned in the {enterprise-search-ref}/crawler-managing.html#crawler-managing-binary-content[Elastic web crawler Guide].

-* `ent-search-generic-ingestion` - Since 8.5, Native Connectors, Connector Clients, and new (>8.4) Elastic web crawler indices will all make use of this pipeline by default.
+* `ent-search-generic-ingestion` - Since 8.5, Native Connectors, Connector Clients, and new (>8.4) Elastic web crawler indices all made use of this pipeline by default.
+This pipeline evolved into the `search-default-ingestion` pipeline.
+
+* `search-default-ingestion` - Since 9.0, Connectors have made use of this pipeline by default.
You can <<ingest-pipeline-search-details-generic-reference, read more about this pipeline>> above.
-As this pipeline is "managed", any modifications that were made to `app_search_crawler` and/or `ent_search_crawler` should NOT be made to `ent-search-generic-ingestion`.
+As this pipeline is "managed", any modifications that were made to `app_search_crawler` and/or `ent_search_crawler` should NOT be made to `search-default-ingestion`.
Instead, if such customizations are desired, you should utilize <<ingest-pipeline-search-details-specific>>, placing all modifications in the `<index-name>@custom` pipeline(s).
=============

4 changes: 2 additions & 2 deletions docs/reference/ingest/search-nlp-tutorial.asciidoc
@@ -164,8 +164,8 @@ Now it's time to create an inference pipeline.

1. From the overview page for your `search-photo-comments` index in "Search", click the *Pipelines* tab.
By default, Elasticsearch does not create any index-specific ingest pipelines.
-2. Because we want to customize these pipelines, we need to *Copy and customize* the `ent-search-generic-ingestion` ingest pipeline.
-Find this option above the settings for the `ent-search-generic-ingestion` ingest pipeline.
+2. Because we want to customize these pipelines, we need to *Copy and customize* the `search-default-ingestion` ingest pipeline.
+Find this option above the settings for the `search-default-ingestion` ingest pipeline.
This will create two new index-specific ingest pipelines.

Next, we'll add an inference pipeline.
@@ -7,7 +7,7 @@
"dynamic": "false",
"_meta": {
"pipeline": {
-"default_name": "ent-search-generic-ingestion",
+"default_name": "search-default-ingestion",
"default_extract_binary_content": true,
"default_run_ml_inference": true,
"default_reduce_whitespace": true

This file was deleted.

@@ -46,10 +46,6 @@ public class ConnectorTemplateRegistry extends IndexTemplateRegistry {
public static final String ACCESS_CONTROL_TEMPLATE_NAME = "search-acl-filter";

// Pipeline constants

-public static final String ENT_SEARCH_GENERIC_PIPELINE_NAME = "ent-search-generic-ingestion";
-public static final String ENT_SEARCH_GENERIC_PIPELINE_FILE = "generic_ingestion_pipeline";

public static final String SEARCH_DEFAULT_PIPELINE_NAME = "search-default-ingestion";
public static final String SEARCH_DEFAULT_PIPELINE_FILE = "search_default_pipeline";

@@ -109,12 +105,6 @@ public class ConnectorTemplateRegistry extends IndexTemplateRegistry {
@Override
protected List<IngestPipelineConfig> getIngestPipelines() {
return List.of(
-new JsonIngestPipelineConfig(
-ENT_SEARCH_GENERIC_PIPELINE_NAME,
-ROOT_RESOURCE_PATH + ENT_SEARCH_GENERIC_PIPELINE_FILE + ".json",
-REGISTRY_VERSION,
-TEMPLATE_VERSION_VARIABLE
-),
new JsonIngestPipelineConfig(
SEARCH_DEFAULT_PIPELINE_NAME,
ROOT_RESOURCE_PATH + SEARCH_DEFAULT_PIPELINE_FILE + ".json",
@@ -50,7 +50,7 @@ public void testToXContent() throws IOException {
String content = XContentHelper.stripWhitespace("""
{
"extract_binary_content": true,
-"name": "ent-search-generic-ingestion",
+"name": "search-default-ingestion",
"reduce_whitespace": true,
"run_ml_inference": false
}