First take on a comprehensive ingest guide #1373

Open: wants to merge 9 commits into base: main
2 changes: 2 additions & 0 deletions docs/en/ingest-arch/index.asciidoc
@@ -18,6 +18,8 @@ include::8-ls-input.asciidoc[]

include::99-airgapped.asciidoc[]

include::../ingest-guide/index.asciidoc[]

// === Next set of architectures
// include::3-schemamod.asciidoc[]
// include::6b-filebeat-es.asciidoc[]
19 changes: 19 additions & 0 deletions docs/en/ingest-guide/index.asciidoc
@@ -0,0 +1,19 @@
include::{docs-root}/shared/versions/stack/{source_branch}.asciidoc[]
include::{docs-root}/shared/attributes.asciidoc[]

:doctype: book

[[ingest-guide]]
= Elastic Ingest Overview

include::ingest-intro.asciidoc[]
include::ingest-tools.asciidoc[]
include::ingest-additional-proc.asciidoc[]
//include::ingest-static.asciidoc[]
//include::ingest-timestamped.asciidoc[]
include::ingest-solutions.asciidoc[]
//include::ingest-faq.asciidoc[]

//include:: Prereqs (for using data after ingest)
//include:: Migration for ingest
//include:: Troubleshooting
27 changes: 27 additions & 0 deletions docs/en/ingest-guide/ingest-additional-proc.asciidoc
@@ -0,0 +1,27 @@
[[ingest-addl-proc]]
== Additional ingest processing

You can start with {agent} and Elastic {integrations-docs}[integrations], and still
take advantage of additional processing options if you need them.

{agent} processors::
You can use link:{fleet-guide}/elastic-agent-processor-configuration.html[{agent} processors] to sanitize or enrich raw data at the source.
Use {agent} processors if you need to control what data is sent across the wire, or if you need to enrich the raw data with information available on the host.

{es} ingest pipelines::
You can use {es} link:{ref}/ingest.html[ingest pipelines] to enrich incoming data or normalize field data before the data is indexed.
{es} ingest pipelines enable you to manipulate the data as it comes in.
This approach helps you avoid adding processing overhead to the hosts from which you're collecting data.

{es} runtime fields::
You can use {es} link:{ref}/runtime.html[runtime fields] to define or alter the schema at query time.
You can start working with your data without needing to understand how it is
structured, add fields to existing documents without reindexing your data,
override the value returned from an indexed field, and/or define fields for a
specific use without modifying the underlying schema.

{ls} `elastic_integration` filter::
You can use the {ls} link:{logstash-ref}/plugins-filters-elastic_integration.html[`elastic_integration` filter] and
other link:{logstash-ref}/filter-plugins.html[{ls} filters] to
link:{logstash-ref}/ea-integrations.html[extend Elastic integrations] by
transforming data before it goes to {es}.
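
To make the ingest pipeline option concrete, here is a minimal sketch of a pipeline definition (the field names are hypothetical; the body would be sent with `PUT _ingest/pipeline/<pipeline-name>`). It uses the `lowercase` and `set` processors to normalize a field and record the ingest time:

```json
{
  "description": "Example pipeline: normalize a user name and stamp the ingest time",
  "processors": [
    { "lowercase": { "field": "user.name", "ignore_missing": true } },
    { "set": { "field": "event.ingested", "value": "{{_ingest.timestamp}}" } }
  ]
}
```

Once created, a pipeline can be applied per request with the `pipeline` query parameter, or applied automatically by setting it as an index's `index.default_pipeline`.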
77 changes: 77 additions & 0 deletions docs/en/ingest-guide/ingest-faq.asciidoc
@@ -0,0 +1,77 @@
[[ingest-faq]]
== Frequently Asked Questions

Q: What Elastic products and tools are available for ingesting data into Elasticsearch?

Q: What's the best option for ingesting data?

Q: What's the role of Logstash `filter-elastic-integration`?



.WORK IN PROGRESS
****
Temporary parking lot to capture outstanding questions and notes.
****



Also cover (here or in general outline):

- https://www.elastic.co/guide/en/kibana/master/connect-to-elasticsearch.html#_add_sample_data[Sample data]
- OTel
- Beats
- Use case: GeoIP
- Airgapped
- Place for table, also adding use case + products (Exp: Logstash for multi-tenant)
- Role of LS in general content use cases



[discrete]
=== Questions to answer:

* Messaging for data sources that don't have an integration
- We're deemphasizing Beats in preparation for deprecation
- We're not quite there with OTel yet
* How should we handle this in the near term?
Probably doesn't make sense to either ignore or jump them straight to Logstash

* Should we mention Fleet and Stand-alone agent?
** If so, when, where, and how?
* How does this relate to Ingest Architectures
* Enrichment for general content

* How to message current vs. desired state.
Especially Beats and OTel.
* HOW TO MESSAGE OTel - Current state. Future state.
* Consistent use of terminology vs. matching users' vocabulary (keywords)

[discrete]
==== Random

* DocsV3 - need for a sheltered space to develop new content
** Related: https://github.com/elastic/docsmobile/issues/708
** Need a place to incubate a new doc (previews, links, etc.)
** Refine messaging in private


[discrete]
=== Other resources to use, reference, reconcile

* Timeseries decision tree (needs updates)
* PM's video
** Needs an update. (We might relocate content before updating.)
* PM's product table
** Needs an update. (We might relocate content before updating.)
** Focuses on Agent over integrations.
** Same link text resolves to different locations.
** Proposal: Harvest the good and possibly repurpose the table format.
* Ingest Reference architectures
* Linkable content such as beats? Solutions ingest resources?

* https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-guides.html[Starting with the Elastic Platform and Solutions]
* https://www.elastic.co/guide/en/observability/current/observability-get-started.html[Get started with Elastic Observability]
* https://www.elastic.co/guide/en/security/current/ingest-data.html[Ingest data into Elastic Security]

60 changes: 60 additions & 0 deletions docs/en/ingest-guide/ingest-intro.asciidoc
@@ -0,0 +1,60 @@
[discrete]
[[ingest-intro]]
== Ingesting data into {es}

Bring your data!
Whether you call it _adding_, _indexing_, or _ingesting_ data, you have to get
the data into {es} before you can search it, visualize it, and use it for insights.

Our ingest tools are flexible, and support a wide range of scenarios.
We can help you with everything from popular and straightforward use cases, all
the way to advanced use cases that require additional processing in order to modify or
reshape your data before it goes to {es}.

You can ingest:

* **General content** (data without timestamps), such as HTML pages, catalogs, and files
* **Timestamped (time series) data**, such as logs, metrics, and traces for Elastic Security, Observability, Search solutions, or for your own custom solutions

[discrete]
[[ingest-general]]
=== Ingesting general content

Elastic offers tools designed to ingest specific types of general content.
The content type determines the best ingest option.

* To index **documents** directly into {es}, use the {es} link:{ref}/docs.html[document APIs].
* To send **application data** directly to {es}, use an link:https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es}
language client].
* To index **web page content**, use the Elastic link:https://www.elastic.co/web-crawler[web crawler].
* To sync **data from third-party sources**, use link:{ref}/es-connectors.html[connectors].
A connector syncs content from an original data source to an {es} index.
Using connectors you can create _searchable_, read-only replicas of your data sources.
* To index **single files** for testing in a non-production environment, use the {kib} link:{kibana-ref}/connect-to-elasticsearch.html#upload-data-kibana[file uploader].

If you would like to try things out before you add your own data, try using our {kibana-ref}/connect-to-elasticsearch.html#_add_sample_data[sample data].
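
For example, the document APIs accept batches of documents in the `_bulk` NDJSON format, where each document source line is preceded by an action line. A minimal sketch (index name and fields are hypothetical) of building such a request body:

```python
import json

def bulk_index_body(index, docs):
    """Build an NDJSON body for the Elasticsearch Bulk API: one action line
    ({"index": ...}) followed by one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline

body = bulk_index_body("product-catalog", [
    {"name": "desk lamp", "price": 24.99},
    {"name": "notebook", "price": 3.50},
])
```

The resulting body would be sent to `POST _bulk` with the `application/x-ndjson` content type; in practice the {es} language clients handle this encoding for you.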

[discrete]
[[ingest-timestamped]]
=== Ingesting time-stamped data

[[ingest-best-timestamped]]
.What's the best approach for ingesting time-stamped data?
****
The best approach for ingesting data is the _simplest option_ that _meets your needs_ and _satisfies your use case_.

In most cases, the _simplest option_ for ingesting timestamped data is using {agent} paired with an Elastic integration.

* Install {fleet-guide}[Elastic Agent] on the computer(s) from which you want to collect data.
* Add the {integrations-docs}[Elastic integration] for the data source to your deployment.

Integrations are available for many popular platforms and services, and are a
good place to start for ingesting data into Elastic solutions--Observability,
Security, and Search--or your own search application.

Check out the {integrations-docs}/all_integrations[Integration quick reference]
to search for available integrations.
If you don't find an integration for your data source or if you need
additional processing to extend the integration, we still have you covered.
Check out <<ingest-addl-proc,additional processing>> for a sneak peek.
****
110 changes: 110 additions & 0 deletions docs/en/ingest-guide/ingest-solutions.asciidoc
@@ -0,0 +1,110 @@
[[ingest-for-solutions]]
== Ingesting data for Elastic solutions

Elastic solutions--Security, Observability, and Search--are loaded with features
and functionality to help you get value and insights from your data.
{fleet-guide}[Elastic Agent] and {integrations-docs}[Elastic integrations] can help, and are the best place to start.

When you use integrations with solutions, you have an integrated experience that offers
easier implementation and decreases the time it takes to get insights and value from your data.

[[ingest-process-overview]]
.High-level overview
****
To use {fleet-guide}[Elastic Agent] and {integrations-docs}[Elastic integrations]
with Elastic solutions:

1. Create an link:https://www.elastic.co/cloud[{ecloud}] deployment for your solution.
If you don't have an {ecloud} account, you can sign up for a link:https://cloud.elastic.co/registration[free trial] to get started.
2. Add the {integrations-docs}[Elastic integration] for your data source to the deployment.
3. link:{fleet-guide}/elastic-agent-installation.html[Install {agent}] on the systems whose data you want to collect.
****

NOTE: {serverless-docs}[Elastic serverless] makes using solutions even easier.
Sign up for a link:{serverless-docs}/general/sign-up-trial[free trial], and check it out.


[discrete]
[[ingest-for-search]]
=== Ingesting data for Search

{es} is the magic behind Search and our other solutions.
The solution gives you more pre-built components to get you up and running quickly for common use cases.

**Resources**

* link:{fleet-guide}/elastic-agent-installation.html[Install {agent}]
* link:https://www.elastic.co/integrations/data-integrations?solution=search[Elastic Search for integrations]
* link:{ref}[{es} Guide]
** link:{ref}/docs.html[{es} document APIs]
** link:https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} language clients]
** link:https://www.elastic.co/web-crawler[Elastic web crawler]
** link:{ref}/es-connectors.html[Elastic connectors]


[discrete]
[[ingest-for-obs]]
=== Ingesting data for Observability

With link:https://www.elastic.co/observability[Elastic Observability], you can
monitor and gain insights into logs, metrics, and application traces.
The guides and resources in this section illustrate how to ingest data and use
it with the Observability solution.


**Guides for popular Observability use cases**

* link:{estc-welcome}/getting-started-observability.html[Monitor applications and systems with Elastic Observability]
* link:https://www.elastic.co/guide/en/observability/current/logs-metrics-get-started.html[Get started with logs and metrics]
** link:https://www.elastic.co/guide/en/observability/current/logs-metrics-get-started.html#add-system-integration[Step 1: Add the {agent} System integration]
** link:https://www.elastic.co/guide/en/observability/current/logs-metrics-get-started.html#add-agent-to-fleet[Step 2: Install and run {agent}]

* link:{serverless-docs}/observability/what-is-observability-serverless[Observability] on link:{serverless-docs}[{serverless-full}]:
** link:{serverless-docs}/observability/quickstarts/monitor-hosts-with-elastic-agent[Monitor hosts with {agent} ({serverless-short})]
** link:{serverless-docs}/observability/quickstarts/k8s-logs-metrics[Monitor your K8s cluster with {agent} ({serverless-short})]

**Resources**

* link:{fleet-guide}/elastic-agent-installation.html[Install {agent}]
* link:https://www.elastic.co/integrations/data-integrations?solution=observability[Elastic Observability integrations]

[discrete]
[[ingest-for-security]]
=== Ingesting data for Security

You can detect and respond to threats when you use
link:https://www.elastic.co/security[Elastic Security] to analyze and take
action on your data.
The guides and resources in this section illustrate how to ingest data and use it with the Security solution.

**Guides for popular Security use cases**

* link:https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-siem-security.html[Use Elastic Security for SIEM]
* link:https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-endpoint-security.html[Protect hosts with endpoint threat intelligence from Elastic Security]

**Resources**

* link:{fleet-guide}/elastic-agent-installation.html[Install {agent}]
* link:https://www.elastic.co/integrations/data-integrations?solution=security[Elastic Security integrations]
* link:{security-guide}/es-overview.html[Elastic Security documentation]


[discrete]
[[ingest-for-custom]]
=== Ingesting data for your own custom search solution

Elastic solutions can give you a head start for common use cases, but you are not at all limited.
You can still do your own thing with a custom solution designed by _you_.

Bring your ideas and use {es} and the {stack} to store, search, and visualize your data.

**Resources**

* link:{fleet-guide}/elastic-agent-installation.html[Install {agent}]
* link:{ref}[{es} Guide]
** link:{ref}/docs.html[{es} document APIs]
** link:https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} language clients]
** link:https://www.elastic.co/web-crawler[Elastic web crawler]
** link:{ref}/es-connectors.html[Elastic connectors]
* link:{estc-welcome}/getting-started-general-purpose.html[Tutorial: Get started with vector search and generative AI]

39 changes: 39 additions & 0 deletions docs/en/ingest-guide/ingest-static.asciidoc
@@ -0,0 +1,39 @@
[[intro-general]]
== Ingesting general content

Describe general content (non-timestamped) and give examples.

.WORK IN PROGRESS
****
Progressive disclosure: Start with basic use cases and work up to advanced processing

Possibly repurpose and use ingest decision tree with Beats removed?
****

[discrete]
=== Basic use cases

* {es} document APIs for documents.
* Elastic language clients for application data.
* Elastic web crawler for web page content.
* Connectors for data from third-party sources, such as Slack, etc.
* Kibana file uploader for individual files.
* LOGSTASH???
** ToDO: Check out Logstash enterprisesearch-integration

* To index **documents** directly into {es}, use the {es} document APIs.
* To send **application data** directly to {es}, use an Elastic language client.
* To index **web page content**, use the Elastic web crawler.
* To sync **data from third-party sources**, use connectors.
* To index **single files** for testing, use the Kibana file uploader.

[discrete]
=== Advanced use cases: Data enrichment and transformation

Tools for enriching ingested data:

- Logstash - GEOIP enrichment. Other examples?
** Use enterprisesearch input -> Filter(s) -> ES or enterprisesearch output
- What else?

