Create large-logs-dataset challenge #634
Closed
-- Cloned from @salvatore-campagna's #632
Introduce a new large-logs-dataset challenge in the elastic/logs track which duplicates indexed data by restoring
a snapshot multiple times. The number of snapshot restore operations is controlled by the snapshot_restore_counts variable, which defaults to 100.
As a result, the challenge indexes raw_data_volume_per_day bytes multiplied by snapshot_restore_counts.
For example, if raw_data_volume_per_day is 50 GB, the index will hold about 5 TB of raw data.
Note, however, that the resulting index will contain duplicated data.
This challenge is intended simply as a fast way to increase the amount of data in an index while skipping the expensive data
generation and indexing process.
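
A quick sanity check of the volume arithmetic described above (a minimal sketch in plain Python; the variable values are just the defaults and the example figure mentioned in this description):

```python
# Total raw data indexed = raw_data_volume_per_day * snapshot_restore_counts.
raw_data_volume_per_day_gb = 50   # example value from the description above
snapshot_restore_counts = 100     # default number of snapshot restores

total_raw_gb = raw_data_volume_per_day_gb * snapshot_restore_counts
print(f"{total_raw_gb} GB = {total_raw_gb / 1000} TB")  # 5000 GB = 5 TB
```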
Resolves #631