[Dev] Add High Cardinality Indexer to Kibana as kbn-data-forge #174559

Merged on Jan 23, 2024 (5 commits)

simianhacker
Member

@simianhacker simianhacker commented Jan 9, 2024

Summary

This PR adds the High Cardinality Indexer to Kibana as a new package called kbn-data-forge. It also replaces kbn-infra-forge usage in the test and is the preferred way to generate data for Observability use cases, specifically for SLO testing.

Todo

  • Replace kbn-infra-forge usage
  • Create convenience functions for testing (generate and cleanup)
  • Make the logger (LoggingTool) configurable as an injected dependency
  • Make the Elasticsearch client (Client) configurable as an injected dependency
  • Fix the ECS Generate commands
  • Add CLI options via Commander
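
To illustrate why the injected client and logger matter for testing, here is a minimal sketch of the pattern. It is not the package's implementation: `EsClientLike`, `LoggerLike`, `SketchConfig`, and `createFakes` are hypothetical stand-ins, the function bodies are invented for illustration, and the parameters of `cleanup` are an assumption (only the `generate({ client, config, logger })` call shape appears in this PR).

```typescript
// Hypothetical, simplified stand-ins for the Elasticsearch Client and the logger.
interface EsClientLike {
  bulk(docs: object[]): Promise<void>;
  deleteIndex(name: string): Promise<void>;
}
interface LoggerLike {
  info(message: string): void;
}
// Hypothetical config shape, much smaller than the real one.
interface SketchConfig {
  index: string;
  events: object[];
}

// Illustrative only: indexes the events and returns the indices it wrote to,
// so the caller can hand them to cleanup later.
async function generate(args: {
  client: EsClientLike;
  config: SketchConfig;
  logger: LoggerLike;
}): Promise<string[]> {
  const { client, config, logger } = args;
  logger.info(`indexing ${config.events.length} events into ${config.index}`);
  await client.bulk(config.events);
  return [config.index];
}

// Illustrative only: deletes every index that generate reported.
async function cleanup(args: {
  client: EsClientLike;
  indices: string[];
  logger: LoggerLike;
}): Promise<void> {
  for (const name of args.indices) {
    args.logger.info(`deleting ${name}`);
    await args.client.deleteIndex(name);
  }
}

// Because both dependencies are injected, a test can pass in-memory fakes.
function createFakes() {
  const messages: string[] = [];
  const deleted: string[] = [];
  let indexed = 0;
  const client: EsClientLike = {
    bulk: async (docs) => {
      indexed += docs.length;
    },
    deleteIndex: async (name) => {
      deleted.push(name);
    },
  };
  const logger: LoggerLike = {
    info: (message) => {
      messages.push(message);
    },
  };
  return { client, logger, messages, deleted, indexedCount: () => indexed };
}
```

With this shape, a functional test can reuse the Elasticsearch client and logger it already has instead of letting the generator create its own.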

CLI Help Screen

Usage: data_forge.js [options]

A data generation tool that will create realistic data with different scenarios.

Options:
  --config <filepath>                  The YAML config file
  --lookback <datemath>                When to start the indexing (default: "now-15m")
  --events-per-cycle <number>          The number of events per cycle (default: 1)
  --payload-size <number>              The size of the ES bulk payload (default: 10000)
  --concurrency <number>               The number of concurrent connections to Elasticsearch (default: 5)
  --index-interval <milliseconds>      The interval of the data in milliseconds (default: 60000)
  --dataset <dataset>                  The name of the dataset to use. Valid options: "fake_logs", "fake_hosts", "fake_stack" (default: "fake_logs")
  --scenario <scenario>                The scenario to label the events with (default: "good")
  --elasticsearch-host <address>       The address to the Elasticsearch cluster (default: "http://localhost:9200")
  --elasticsearch-username <username>  The username for the Elasticsearch cluster (default: "elastic")
  --elasticsearch-password <password>  The password for the Elasticsearch cluster (default: "changeme")
  --elasticsearch-api-key <key>        The API key to connect to the Elasticsearch cluster
  --kibana-url <address>               The address to the Kibana server (default: "http://localhost:5601")
  --kibana-username <username>         The username for the Kibana server (default: "elastic")
  --kibana-password <password>         The password for the Kibana server (default: "changeme")
  --install-kibana-assets              This will install index patterns, visualizations, and dashboards for the dataset
  --event-template <template>          The name of the event template (default: "good")
  --reduce-weekend-traffic-by <ratio>  This will reduce the traffic on the weekends by the specified amount. Example: 0.5 will reduce the traffic by half (default: 0)
  --ephemeral-project-ids <number>     The number of ephemeral projects to create. This is only enabled for the "fake_stack" dataset. It will create project IDs that will last 5 to 12 hours. (default: 0)
  -h, --help                           output usage information
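
As a worked example of the `--reduce-weekend-traffic-by` arithmetic, the sketch below scales a per-cycle event count by the ratio. It assumes the reduction applies to Saturdays and Sundays in UTC and that the result is rounded; neither detail is confirmed by the help text, and `eventsForCycle` is a hypothetical helper, not part of the tool.

```typescript
// Assumption: weekend means Saturday or Sunday in UTC, and the scaled count is rounded.
function eventsForCycle(
  eventsPerCycle: number,
  reduceWeekendTrafficBy: number,
  timestamp: Date
): number {
  const day = timestamp.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const isWeekend = day === 0 || day === 6;
  const factor = isWeekend ? 1 - reduceWeekendTrafficBy : 1;
  return Math.round(eventsPerCycle * factor);
}

// Saturday, Jan 20, 2024: a ratio of 0.5 halves 200 events to 100.
const saturday = eventsForCycle(200, 0.5, new Date("2024-01-20T12:00:00Z"));
// Tuesday, Jan 23, 2024: weekdays are unaffected, so this stays at 200.
const tuesday = eventsForCycle(200, 0.5, new Date("2024-01-23T12:00:00Z"));
console.log(saturday, tuesday);
```

With the default ratio of 0 the factor is 1 on every day, so traffic is not reduced.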

Testing an Example

Run the following command against a clean Kibana development environment:

node x-pack/scripts/data_forge.js --events-per-cycle 200 --lookback now-1h --install-kibana-assets --ephemeral-project-ids 10 --dataset fake_stack

This should install a handful of DataViews (Admin Console, Message Processor, Nginx Logs, Mongodb Logs) along with a few dashboards and visualizations.
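
For a rough sense of how much data the command above backfills, the sketch below multiplies the number of cycles in the lookback window by the events per cycle. It assumes one cycle per `--index-interval` (60000 ms by default) and a constant event count per cycle. The number of documents actually indexed may differ, for example if a dataset emits several documents per event. `estimateEvents` is a hypothetical helper, not part of the tool.

```typescript
// Assumption: the generator runs one cycle per index interval across the lookback window.
function estimateEvents(
  lookbackMs: number,
  indexIntervalMs: number,
  eventsPerCycle: number
): number {
  const cycles = Math.floor(lookbackMs / indexIntervalMs);
  return cycles * eventsPerCycle;
}

const oneHourMs = 60 * 60 * 1000;
// --lookback now-1h with the default 60000 ms interval and --events-per-cycle 200:
// 60 cycles * 200 events = 12000 events.
const backfill = estimateEvents(oneHourMs, 60000, 200);
console.log(backfill);
```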

@apmmachine
Contributor

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@simianhacker
Member Author

/ci

@simianhacker simianhacker marked this pull request as ready for review January 16, 2024 13:57
@simianhacker simianhacker requested review from a team as code owners January 16, 2024 13:57
@simianhacker simianhacker added the release_note:skip, v8.13.0, and Team:obs-ux-management labels Jan 16, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@simianhacker simianhacker requested review from a team as code owners January 22, 2024 21:33
@botelastic botelastic bot added the Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team label Jan 22, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

- Remove default logger to allow user to override
- removing HCI index prefixes; adding cleanup; adding generate; overriding client; overriding logger;
- Updating test to replace infra-forge with data-forge
- Adding codeowners for kbn-data-forge
- Fixing the paths for the ECS generate command
- Implementing cli options
- Fixing spelling errors
- Running yarn kbn bootstrap
- Removing deprecated faker.random.numeric
- Shaping the data correctly for the tests
- Fixing config for each test
- Fixing jest.config.js
- second attempt at fixing jest.config.js
- Attempting to fix the document count test
- Attempting to fix the document count test
- Fixing types
- Removing deprecated installTemplate function and corresponding templates
- Fixing tests to be more robust so they don't execute until the source documents are available.
- Fixing typo
- Adding changes to burn rate rule
- Fixing document checks
@simianhacker simianhacker removed request for a team and pzl January 22, 2024 21:53
@simianhacker
Member Author

Sorry... I had to force push. Somehow, the last merge commit changed a bunch of unrelated files, which in turn triggered a bunch of unnecessary review requests.

@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #58 / machine learning - short tests model management trained models for ML power user with imported models deletes the imported model pt_tiny_pass_through

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id               before  after  diff
@kbn/data-forge  -       26     +26

Unknown metric groups

API count

id               before  after  diff
@kbn/data-forge  -       26     +26

ESLint disabled line counts

id               before  after  diff
@kbn/data-forge  -       1      +1

Total ESLint disabled count

id               before  after  diff
@kbn/data-forge  -       1      +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@kdelemme
Contributor

Tested locally and works as expected!

Contributor

@kdelemme kdelemme left a comment

LGTM! Tested locally and reviewed the test changes 👍🏻

Comment on lines +38 to +54
dataForgeConfig = {
  schedule: [
    {
      template: 'good',
      start: 'now-15m',
      end: 'now+5m',
      metrics: [
        { name: 'system.cpu.user.pct', method: 'linear', start: 2.5, end: 2.5 },
        { name: 'system.cpu.total.pct', method: 'linear', start: 0.5, end: 0.5 },
        { name: 'system.cpu.total.norm.pct', method: 'linear', start: 0.8, end: 0.8 },
      ],
    },
  ],
  indexing: { dataset: 'fake_hosts' as Dataset, eventsPerCycle: 1, interval: 10000 },
};
dataForgeIndices = await generate({ client: esClient, config: dataForgeConfig, logger });
await alertingApi.waitForDocumentInIndex({ indexName: DATA_VIEW, docCountTarget: 360 });
Contributor

😍
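
The snippet above blocks until the generated documents are searchable before the test continues. A generic version of that polling idea is sketched below. This is not how `alertingApi.waitForDocumentInIndex` is implemented: `waitForDocCount`, its options, and the `getCount` callback are hypothetical names used only to show the pattern.

```typescript
// Polls getCount until it reaches the target, or throws once the timeout has elapsed.
async function waitForDocCount(
  getCount: () => Promise<number>,
  target: number,
  options: { intervalMs?: number; timeoutMs?: number } = {}
): Promise<number> {
  const { intervalMs = 500, timeoutMs = 30000 } = options;
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const count = await getCount();
    if (count >= target) {
      return count;
    }
    if (Date.now() >= deadline) {
      throw new Error(`timed out waiting for ${target} documents, last count was ${count}`);
    }
    // Sleep before asking again so the cluster is not hammered with count requests.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a real test, `getCount` would wrap a count request against the index. Waiting on a document count rather than a fixed sleep is what keeps the assertions from running before the source data exists.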

@simianhacker simianhacker merged commit 5f72e78 into elastic:main Jan 23, 2024
35 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jan 23, 2024
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
…ic#174559)

## Summary

This PR adds the [High Cardinality
Indexer](https://github.com/elastic/high-cardinality-cluster) to Kibana
as a new package called `kbn-data-forge`. It also replaces
`kbn-infra-forge` usage in the test and is the preferred way to generate
data for Observability use cases, specifically for SLO testing.

---------

Co-authored-by: kibanamachine <[email protected]>
@simianhacker simianhacker deleted the kbn-data-forge branch April 17, 2024 15:38
Labels
backport:skip, release_note:skip, Team:obs-ux-infra_services, Team:obs-ux-management, v8.13.0
7 participants