
[RFC] [META] Variance Analysis for Performance Runs #398

Closed · 6 tasks done
IanHoang opened this issue Oct 19, 2023 · 32 comments
Labels: enhancement (New feature or request), High Priority, RFC (Request for comment on major changes)

@IanHoang
Collaborator

IanHoang commented Oct 19, 2023

Synopsis and Motivation

OpenSearch Benchmark currently performs nightly runs with various configurations. In a few cases, some queries have shown more variance than others. For example, a nightly run on a multi-node cluster with the http_logs workload has shown variance for sort queries, as depicted below.
[Screenshot from 2023-10-31: latency variance for sort queries in the nightly multi-node http_logs run]

Variance in test runs prevents the community from accurately identifying regressions. The goal of this RFC is to find optimal OpenSearch configurations and rule out causes of variance, which will in turn help produce consistent and reproducible numbers.

Questions

We should aim to answer the following:

  • Are there specific query types that are seeing variance? If so, what is the measured variance, and how often does it occur? (A sketch for quantifying this appears below.)
  • What are the causes of these variances? Are they related to the ingestion method, core OpenSearch, Lucene, or something else entirely?
  • Which OpenSearch configuration shows minimal variance? This will be useful to know so that the nightly runs can eventually switch to it.
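
Where a metrics datastore is available, the first question can be quantified directly from published results. A minimal sketch, assuming an OpenSearch-backed datastore and OSB's benchmark-results schema (the index pattern and the operation and value.50_0 field names are assumptions and may need adjusting):

curl -s "$DATASTORE/benchmark-results-*/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": { "term": { "operation": "desc_sort_timestamp" } },
  "aggs": {
    "service_time": { "extended_stats": { "field": "value.50_0" } }
  }
}'

extended_stats returns both avg and std_deviation, so the coefficient of variation (std_deviation / avg) can be compared across operations and across run types.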

Strategy

Instead of using the data from the nightly runs (published at benchmarks.opensearch.org), we should replicate the nightly-run setup at a smaller scale. Our setup will include the following characteristics:

Workload: http_logs
OpenSearch versions: 2.9 (any other stable release should also be sufficient)
Test cluster configurations (subject to change throughout the process):

  • Multi-node cluster (one each for x86 and ARM)
  • Single-node cluster (one each for x86 and ARM)
  • Try with different shard configurations

The tests will be run on a nightly basis and will replace the ingestion phase with restoring from a snapshot. This should have a couple of benefits. First, it should reduce the overall testing time. Second, it could reduce query variance, since it is suspected that the ingestion phase can be inconsistent and influence query performance. Metrics and test results will be channeled into an external metrics datastore, which will be used to build visualizations for each query. These visualizations will give us a better idea of which queries have variance and will spawn deeper investigations.
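
For reference, restoring the corpus instead of reingesting it only needs the standard snapshot APIs. A minimal sketch, assuming an S3 repository and a pre-built http_logs snapshot (the repository, bucket, and snapshot names are placeholders):

# Register the repository once (requires the repository-s3 plugin).
curl -XPUT "$CLUSTER/_snapshot/nightly-repo" -H 'Content-Type: application/json' -d'
{ "type": "s3", "settings": { "bucket": "<snapshot-bucket>", "region": "us-west-2" } }'

# At the start of each run, restore the pre-ingested indices instead of ingesting.
curl -XPOST "$CLUSTER/_snapshot/nightly-repo/http-logs-baseline/_restore?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{ "indices": "logs-*", "include_global_state": false }'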

Next Steps

Preliminary Testing

  1. Set up a multi-node cluster and an external datastore.
  2. For the first series of tests, do traditional OSB runs (ingest, then search).
  3. Afterwards, perform a rolling cluster restart of the data nodes.
  4. For the second series of tests, run with search operations only.
  5. Build visualizations for all sort queries to see how results vary between runs.

Further Testing

  1. Set up two single-node clusters with the same configuration, using the opensearch-cluster-cdk repository.
  2. Each cluster should perform a daily test. One cluster should run a modified test procedure with ingestion and sort queries only; the other should run a modified test procedure with restore-from-snapshot and sort queries only.
  • Each test should be run with --telemetry=node-stats to gather CPU and JVM metrics. A profiler should also be enabled on each cluster so that we can understand where most of the time is spent during each search operation on each cluster (see the example invocation after this list).
  3. Build a visualization that compares the performance of the daily runs from each cluster.
  4. Characterize the regressions (determine whether they are on the client (OSB) side or the server side, then dig deeper based on the evidence we have found).
  • If they are on the server side, we can dig into CPU, JVM, and heap profiles.
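
A sketch of what each cluster's daily invocation could look like (the endpoint and the modified test-procedure name are placeholders; this assumes OSB's benchmark-only pipeline against an externally provisioned cluster):

opensearch-benchmark execute-test \
  --pipeline=benchmark-only \
  --workload=http_logs \
  --test-procedure=<modified-sort-only-procedure> \
  --target-hosts=<cluster-endpoint>:9200 \
  --telemetry=node-stats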

Some Additional Areas to Check

  • Confirm that OSB's results are properly calculated. We should confirm that OSB does not include warmup data in the final results presented to the user. We should also confirm whether latency values match the aggregated metric documents in the benchmark-metrics indices (a sketch of such a cross-check follows).
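
A hedged sketch of that cross-check: recompute a latency percentile from the raw benchmark-metrics samples, excluding warmup, and compare it with the aggregated benchmark-results value (the name, operation, sample-type, and value field names are assumptions about the metrics schema):

curl -s "$DATASTORE/benchmark-metrics-*/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": { "bool": { "filter": [
    { "term": { "name": "latency" } },
    { "term": { "operation": "desc_sort_timestamp" } },
    { "term": { "sample-type": "normal" } }
  ] } },
  "aggs": { "latency_pcts": { "percentiles": { "field": "value", "percents": [50, 90, 99] } } }
}'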

Current Experiments

  • Set up nightly traditional runs and snapshot runs and get preliminary results
  • Gather data and identify which queries have the most variance in http_logs
  • Add snapshots to the end of traditional runs and increase the occurrence of both to twice daily
  • Update visualizations to reflect two runs a day
  • Identify which traditional run had the most variance after a week
  • Identify resource utilization (CPU and JVM) with profiling and analysis over granular node-stats documents

How Can You Help?

  • Any general comments about the overall direction are welcome.
  • Indicating whether the areas identified above cover your scenarios and use cases will be helpful in prioritizing them.
  • Provide early feedback by testing the new workload features as they become available.
  • Help out on the implementation! Check out the issues page for work that is ready to be picked up.
@IanHoang IanHoang added the enhancement New feature or request label Oct 19, 2023
@IanHoang IanHoang self-assigned this Oct 19, 2023
@gkamat gkamat moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness Oct 24, 2023
@gkamat gkamat moved this from Todo to Now (This Quarter) in Performance Roadmap Oct 30, 2023
@IanHoang IanHoang changed the title Variance Analysis for Performance Runs [RFC] Variance Analysis for Performance Runs Nov 3, 2023
@IanHoang IanHoang added the RFC Request for comment on major changes label Nov 3, 2023
@IanHoang
Collaborator Author

IanHoang commented Nov 27, 2023

Last week, I set up two EC2 instances: one that triggers a test with traditional ingestion and subsequent search, and another that runs snapshot restoration and subsequent search. Both tests target clusters of the same configuration (OpenSearch 2.9 single-node clusters) and ingest their metrics and results into the same metrics store. Based on the last 5 days of this setup, runs that use snapshot restoration are showing less variance than runs doing traditional ingestion.

asc_sort_size was the only sort operation whose variance was worse than with traditional ingestion. I will gather more results over the next week to determine whether the variance changes.

Follow ups

  1. Validate accuracy of data: Since this data is based on benchmark-results documents published to the datastore, we should aim to reproduce it from benchmark-metrics documents. Benchmark-results documents contain aggregations of metrics from benchmark-metrics documents.
  2. Determine the resource differences (CPU and JVM) between the two types of runs: Since the tests are run with the node-stats telemetry device, the datastore contains documents covering both test clusters' CPU and JVM usage. We should analyze these to see if we can spot any patterns.
  3. Run profilers during the tests: Profilers will help us understand where the most time is spent for each operation (a sketch of one approach is at the end of this comment).

[images: latency comparison charts for traditional-ingestion vs. snapshot-restore runs]
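
One way to run follow-up 3 (an assumption about tooling, not part of the current setup) is to attach async-profiler to the OpenSearch JVM on each data node during the search phase and capture a CPU flame graph:

# Find the OpenSearch JVM and profile it for 5 minutes during the search phase.
PID=$(pgrep -f org.opensearch.bootstrap.OpenSearch)
./profiler.sh -e cpu -d 300 -f /tmp/search-phase-flamegraph.html "$PID"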

@IanHoang
Collaborator Author

IanHoang commented Nov 30, 2023

Updated the values here: there were some discrepancies in the OpenSearch Dashboards visualizations I had made. I have corrected the values, and the following is the most recent data. Despite these corrections, we are still seeing the trend that query performance is more stable and has less variance with snapshot restoration than with deleting and reingesting data.

The red highlights three queries whose traditional runs have much more variance than their snapshot runs.
[image: variance comparison table with three queries highlighted in red]

@msfroh

msfroh commented Dec 1, 2023

@IanHoang -- I was talking with @rishabh6788 a few weeks ago and speculating that segment topology (i.e., how big the different segments in each shard are) could have some impact.

If we have snapshots that perform well and other snapshots that perform poorly, could we restore those, dump the output of /_cat/segments, and save that data somewhere?

In particular, I think we can learn from the "good" segment topologies and tune merge settings to make good performance the default. (See opensearch-project/OpenSearch#11163 for details.)
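
A minimal sketch of capturing that data after each run (the bucket path is a placeholder; bytes=b keeps the sizes machine-readable):

ts=$(date -u +%Y%m%dT%H%MZ)
curl -s "$CLUSTER/_cat/segments/logs-*?v&bytes=b" > "segments-$ts.txt"
aws s3 cp "segments-$ts.txt" "s3://<segment-topology-bucket>/http_logs/"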

@IanHoang IanHoang changed the title [RFC] Variance Analysis for Performance Runs [RFC] [META] Variance Analysis for Performance Runs Dec 7, 2023
@IanHoang
Collaborator Author

IanHoang commented Dec 18, 2023

Based on @msfroh's feedback, I implemented snapshots to be taken after each traditional run and got the following results across 7 days.
[image: 7 days of results with snapshots taken after each traditional run]

To narrow our focus, I grabbed the snapshots that performed poorly and well for a single operation: in this case, the 12/9/2023 9AM CDT and 12/13/2023 9AM CDT snapshots (the first timestamp performed poorly and the second performed well for the desc_sort_timestamp operation), and compared their _cat/segments output.

12/9/2023 9AM CDT Segments (Worse Performance)

hoangia@3c22fbd0d988 snapshots-segment-comparison % curl "<single-node-cluster-edc9>.elb.us-west-2.amazonaws.com/_cat/segments?v"
index              shard prirep ip         segment generation docs.count docs.deleted    size size.memory committed searchable version compound
logs-221998        0     p      10.0.3.122 _2               2    2143234            0 148.8mb           0 true      true       9.7.0   false
logs-221998        1     p      10.0.3.122 _2               2    2143479            0 148.8mb           0 true      true       9.7.0   false
logs-221998        2     p      10.0.3.122 _3               3    2143348            0 148.8mb           0 true      true       9.7.0   false
logs-221998        3     p      10.0.3.122 _4               4    2143364            0 148.9mb           0 true      true       9.7.0   false
logs-221998        4     p      10.0.3.122 _4               4    2143335            0 148.9mb           0 true      true       9.7.0   false
.plugins-ml-config 0     p      10.0.3.122 _0               0          1            0   3.6kb           0 true      true       9.7.0   true
logs-211998        0     p      10.0.3.122 _8               8    3528822            0   242mb           0 true      true       9.7.0   false
logs-211998        1     p      10.0.3.122 _8               8    3528672            0 241.8mb           0 true      true       9.7.0   false
logs-211998        2     p      10.0.3.122 _6               6    3528033            0 241.6mb           0 true      true       9.7.0   false
logs-211998        3     p      10.0.3.122 _6               6    3530465            0 241.6mb           0 true      true       9.7.0   false
logs-211998        4     p      10.0.3.122 _5               5    3531287            0 241.7mb           0 true      true       9.7.0   false
logs-231998        0     p      10.0.3.122 _2               2    2391122            0 165.8mb           0 true      true       9.7.0   false
logs-231998        1     p      10.0.3.122 _4               4    2393474            0   166mb           0 true      true       9.7.0   false
logs-231998        2     p      10.0.3.122 _4               4    2390482            0   166mb           0 true      true       9.7.0   false
logs-231998        3     p      10.0.3.122 _2               2    2392007            0 165.8mb           0 true      true       9.7.0   false
logs-231998        4     p      10.0.3.122 _4               4    2394257            0 166.3mb           0 true      true       9.7.0   false
logs-241998        0     p      10.0.3.122 _1s             64   36289737            0   2.4gb           0 true      true       9.7.0   false
logs-241998        1     p      10.0.3.122 _1n             59   36283634            0   2.4gb           0 true      true       9.7.0   false
logs-241998        2     p      10.0.3.122 _1n             59   36298180            0   2.4gb           0 true      true       9.7.0   false
logs-241998        3     p      10.0.3.122 _1p             61   36296290            0   2.4gb           0 true      true       9.7.0   false
logs-241998        4     p      10.0.3.122 _1n             59   36295783            0   2.4gb           0 true      true       9.7.0   false
logs-181998        0     p      10.0.3.122 _e              14     541643            0  37.7mb           0 true      true       9.7.0   false
logs-181998        1     p      10.0.3.122 _e              14     541382            0  37.6mb           0 true      true       9.7.0   false
logs-181998        2     p      10.0.3.122 _a              10     542577            0  37.7mb           0 true      true       9.7.0   false
logs-181998        3     p      10.0.3.122 _g              16     541278            0  37.7mb           0 true      true       9.7.0   false
logs-181998        4     p      10.0.3.122 _f              15     541866            0  37.7mb           0 true      true       9.7.0   false
logs-201998        0     p      10.0.3.122 _2               2    2611970            0 178.3mb           0 true      true       9.7.0   false
logs-201998        1     p      10.0.3.122 _2               2    2611361            0 178.3mb           0 true      true       9.7.0   false
logs-201998        2     p      10.0.3.122 _4               4    2610607            0 178.4mb           0 true      true       9.7.0   false
logs-201998        3     p      10.0.3.122 _2               2    2608666            0   178mb           0 true      true       9.7.0   false
logs-201998        4     p      10.0.3.122 _4               4    2610859            0 178.6mb           0 true      true       9.7.0   false
logs-191998        0     p      10.0.3.122 _2               2    1939464            0 134.9mb           0 true      true       9.7.0   false
logs-191998        1     p      10.0.3.122 _2               2    1940408            0 134.9mb           0 true      true       9.7.0   false
logs-191998        2     p      10.0.3.122 _2               2    1939515            0 134.7mb           0 true      true       9.7.0   false
logs-191998        3     p      10.0.3.122 _2               2    1936586            0 134.5mb           0 true      true       9.7.0   false
logs-191998        4     p      10.0.3.122 _2               2    1941909            0 134.9mb           0 true      true       9.7.0   false

12/13/2023 9AM CDT Segments (Better Performance)

hoangia@3c22fbd0d988 snapshots-segment-comparison % curl "<single-node-cluster-56f1>.elb.us-west-2.amazonaws.com/_cat/segments?v"
index              shard prirep ip         segment generation docs.count docs.deleted    size size.memory committed searchable version compound
logs-221998        0     p      10.0.3.148 _3               3    2144238            0 149.6mb           0 true      true       9.7.0   false
logs-221998        1     p      10.0.3.148 _4               4    2142247            0 149.6mb           0 true      true       9.7.0   false
logs-221998        2     p      10.0.3.148 _2               2    2141944            0 149.4mb           0 true      true       9.7.0   false
logs-221998        3     p      10.0.3.148 _4               4    2144492            0 149.6mb           0 true      true       9.7.0   false
logs-221998        4     p      10.0.3.148 _3               3    2143839            0 149.6mb           0 true      true       9.7.0   false
.plugins-ml-config 0     p      10.0.3.148 _0               0          1            0   3.6kb           0 true      true       9.7.0   true
logs-211998        0     p      10.0.3.148 _8               8    3529081            0 242.7mb           0 true      true       9.7.0   false
logs-211998        1     p      10.0.3.148 _8               8    3531223            0 242.8mb           0 true      true       9.7.0   false
logs-211998        2     p      10.0.3.148 _5               5    3530963            0 242.4mb           0 true      true       9.7.0   false
logs-211998        3     p      10.0.3.148 _6               6    3528810            0 242.3mb           0 true      true       9.7.0   false
logs-211998        4     p      10.0.3.148 _6               6    3527202            0 242.3mb           0 true      true       9.7.0   false
logs-231998        0     p      10.0.3.148 _4               4    2392858            0 166.2mb           0 true      true       9.7.0   false
logs-231998        1     p      10.0.3.148 _4               4    2392680            0   166mb           0 true      true       9.7.0   false
logs-231998        2     p      10.0.3.148 _3               3    2391421            0 165.9mb           0 true      true       9.7.0   false
logs-231998        3     p      10.0.3.148 _4               4    2394714            0 166.2mb           0 true      true       9.7.0   false
logs-231998        4     p      10.0.3.148 _3               3    2389669            0 165.7mb           0 true      true       9.7.0   false
logs-241998        0     p      10.0.3.148 _1p             61   36304028            0   2.4gb           0 true      true       9.7.0   false
logs-241998        1     p      10.0.3.148 _1j             55   36289025            0   2.4gb           0 true      true       9.7.0   false
logs-241998        2     p      10.0.3.148 _1k             56   36286110            0   2.4gb           0 true      true       9.7.0   false
logs-241998        3     p      10.0.3.148 _1p             61   36289958            0   2.4gb           0 true      true       9.7.0   false
logs-241998        4     p      10.0.3.148 _21             73   36294503            0   2.4gb           0 true      true       9.7.0   false
logs-181998        0     p      10.0.3.148 _g              16     541135            0  37.7mb           0 true      true       9.7.0   false
logs-181998        1     p      10.0.3.148 _b              11     542261            0  37.7mb           0 true      true       9.7.0   false
logs-181998        2     p      10.0.3.148 _g              16     541455            0  37.7mb           0 true      true       9.7.0   false
logs-181998        3     p      10.0.3.148 _f              15     541309            0  37.6mb           0 true      true       9.7.0   false
logs-181998        4     p      10.0.3.148 _d              13     542586            0  37.7mb           0 true      true       9.7.0   false
logs-201998        0     p      10.0.3.148 _2               2    2611453            0 177.9mb           0 true      true       9.7.0   false
logs-201998        1     p      10.0.3.148 _2               2    2610509            0   178mb           0 true      true       9.7.0   false
logs-201998        2     p      10.0.3.148 _4               4    2611964            0 178.3mb           0 true      true       9.7.0   false
logs-201998        3     p      10.0.3.148 _4               4    2607141            0 178.3mb           0 true      true       9.7.0   false
logs-201998        4     p      10.0.3.148 _2               2    2612396            0 178.1mb           0 true      true       9.7.0   false
logs-191998        0     p      10.0.3.148 _2               2    1939882            0 134.7mb           0 true      true       9.7.0   false
logs-191998        1     p      10.0.3.148 _2               2    1938447            0 134.7mb           0 true      true       9.7.0   false
logs-191998        2     p      10.0.3.148 _2               2    1939460            0 134.8mb           0 true      true       9.7.0   false
logs-191998        3     p      10.0.3.148 _2               2    1942434            0 134.8mb           0 true      true       9.7.0   false
logs-191998        4     p      10.0.3.148 _2               2    1937659            0 134.5mb           0 true      true       9.7.0   false

@getsaurabh02 getsaurabh02 moved this from Now (This Quarter) to In Progress in Performance Roadmap Dec 21, 2023
@IanHoang
Collaborator Author

IanHoang commented Jan 2, 2024

Profiled queries for the desc_sort_timestamp operation in the worst-performing snapshot to see where time is mostly being spent. In summary, desc_sort_timestamp reaches 35 shards, and all shards spend around the same time in seconds except for the shards in logs-241998. logs-241998 is the largest index in http_logs, approximately 10x bigger than the other indices in the cluster. As one can see, the shards in logs-241998 differ in time not only from shards in other indices but also from each other within the same index.

When looking at _cat/segments?v, we see that all shards in logs-241998 are ~2.4gb.

Next steps:

  • Perform a workload test with logs-241998 only and profile the queries once more to see if the time in nanos improves. Wondering if the sort could be improved if the indices were better balanced in size.
  • Look into the disk layout or an API to get information on what the segment distribution looks like for the shards of logs-241998
hoangia@3c22fbd0d988 scripts % bash query-time-calculator-improved.sh
Number of shards: 35
[5n7IB4H4RySukGObYIgh3Q][logs-181998][0]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006861
[5n7IB4H4RySukGObYIgh3Q][logs-181998][1]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006325
[5n7IB4H4RySukGObYIgh3Q][logs-181998][2]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006678
[5n7IB4H4RySukGObYIgh3Q][logs-181998][3]
Query type and query runtime in seconds:     ConstantScoreQuery     0.004286
[5n7IB4H4RySukGObYIgh3Q][logs-181998][4]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006689
[5n7IB4H4RySukGObYIgh3Q][logs-191998][0]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006502
[5n7IB4H4RySukGObYIgh3Q][logs-191998][1]
Query type and query runtime in seconds:     ConstantScoreQuery     0.00679
[5n7IB4H4RySukGObYIgh3Q][logs-191998][2]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006493
[5n7IB4H4RySukGObYIgh3Q][logs-191998][3]
Query type and query runtime in seconds:     ConstantScoreQuery     0.0063
[5n7IB4H4RySukGObYIgh3Q][logs-191998][4]
Query type and query runtime in seconds:     ConstantScoreQuery     0.00686
[5n7IB4H4RySukGObYIgh3Q][logs-201998][0]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006883
[5n7IB4H4RySukGObYIgh3Q][logs-201998][1]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006494
[5n7IB4H4RySukGObYIgh3Q][logs-201998][2]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006939
[5n7IB4H4RySukGObYIgh3Q][logs-201998][3]
Query type and query runtime in seconds:     ConstantScoreQuery     0.00696
[5n7IB4H4RySukGObYIgh3Q][logs-201998][4]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006937
[5n7IB4H4RySukGObYIgh3Q][logs-211998][0]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006841
[5n7IB4H4RySukGObYIgh3Q][logs-211998][1]
Query type and query runtime in seconds:     ConstantScoreQuery     0.007997
[5n7IB4H4RySukGObYIgh3Q][logs-211998][2]
Query type and query runtime in seconds:     ConstantScoreQuery     0.00701
[5n7IB4H4RySukGObYIgh3Q][logs-211998][3]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006762
[5n7IB4H4RySukGObYIgh3Q][logs-211998][4]
Query type and query runtime in seconds:     ConstantScoreQuery     0.007
[5n7IB4H4RySukGObYIgh3Q][logs-221998][0]
Query type and query runtime in seconds:     ConstantScoreQuery     0.005391
[5n7IB4H4RySukGObYIgh3Q][logs-221998][1]
Query type and query runtime in seconds:     ConstantScoreQuery     0.00736
[5n7IB4H4RySukGObYIgh3Q][logs-221998][2]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006936
[5n7IB4H4RySukGObYIgh3Q][logs-221998][3]
Query type and query runtime in seconds:     ConstantScoreQuery     0.00684
[5n7IB4H4RySukGObYIgh3Q][logs-221998][4]
Query type and query runtime in seconds:     ConstantScoreQuery     0.006832
[5n7IB4H4RySukGObYIgh3Q][logs-231998][0]
Query type and query runtime in seconds:     ConstantScoreQuery     0.007919
[5n7IB4H4RySukGObYIgh3Q][logs-231998][1]
Query type and query runtime in seconds:     ConstantScoreQuery     0.007878
[5n7IB4H4RySukGObYIgh3Q][logs-231998][2]
Query type and query runtime in seconds:     ConstantScoreQuery     0.007949
[5n7IB4H4RySukGObYIgh3Q][logs-231998][3]
Query type and query runtime in seconds:     ConstantScoreQuery     0.007367
[5n7IB4H4RySukGObYIgh3Q][logs-231998][4]
Query type and query runtime in seconds:     ConstantScoreQuery     0.00573
[5n7IB4H4RySukGObYIgh3Q][logs-241998][0]
Query type and query runtime in seconds:     ConstantScoreQuery     0.444825
[5n7IB4H4RySukGObYIgh3Q][logs-241998][1]
Query type and query runtime in seconds:     ConstantScoreQuery     0.010329
[5n7IB4H4RySukGObYIgh3Q][logs-241998][2]
Query type and query runtime in seconds:     ConstantScoreQuery     0.050528
[5n7IB4H4RySukGObYIgh3Q][logs-241998][3]
Query type and query runtime in seconds:     ConstantScoreQuery     0.05679
[5n7IB4H4RySukGObYIgh3Q][logs-241998][4]
Query type and query runtime in seconds:     ConstantScoreQuery     0.632663
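
For anyone reproducing the numbers above: they line up with what the search Profile API reports per shard. A hedged sketch of extracting the same per-shard query times directly (a sort-by-timestamp query similar to the operation; the jq path follows the documented profile response shape):

curl -s "$CLUSTER/logs-*/_search" -H 'Content-Type: application/json' -d'
{
  "profile": true,
  "query": { "match_all": {} },
  "sort": [ { "@timestamp": "desc" } ]
}' | jq -r '.profile.shards[]
  | "\(.id)\t\(.searches[0].query[0].type)\t\(.searches[0].query[0].time_in_nanos / 1e9) s"'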

@IanHoang
Collaborator Author

Held back by the 1.2.0 release and other tasks.
Next steps:

  • Modify the workload to run with logs-241998 only
  • Look into the disk layout or an API to get insight into the segment distribution of the logs-241998 index

@msfroh

msfroh commented Jan 18, 2024

Look into the disk layout or an API to get insight into the segment distribution of the logs-241998 index

The segments are just files on disk on the data nodes. If you can get onto the data nodes, you can see the contents of each shard, like:

% ls -gG data/nodes/0/indices/6nzgE2PhTR22ejuYJ2Zsqw/0/index/
total 264
-rw-r--r-- 1   479 Nov 16 00:48 _0.cfe
-rw-r--r-- 1 17105 Nov 16 00:48 _0.cfs
-rw-r--r-- 1   341 Nov 16 00:48 _0.si
-rw-r--r-- 1   479 Nov 16 00:48 _1.cfe
-rw-r--r-- 1 17105 Nov 16 00:48 _1.cfs
-rw-r--r-- 1   341 Nov 16 00:48 _1.si
-rw-r--r-- 1   479 Nov 16 00:48 _2.cfe
-rw-r--r-- 1 14908 Nov 16 00:48 _2.cfs
-rw-r--r-- 1   341 Nov 16 00:48 _2.si
-rw-r--r-- 1   479 Nov 16 00:48 _3.cfe
-rw-r--r-- 1 17705 Nov 16 00:48 _3.cfs
-rw-r--r-- 1   341 Nov 16 00:48 _3.si
-rw-r--r-- 1   479 Nov 16 00:48 _4.cfe
-rw-r--r-- 1 17705 Nov 16 00:48 _4.cfs
-rw-r--r-- 1   341 Nov 16 00:48 _4.si
-rw-r--r-- 1   479 Nov 16 00:48 _5.cfe
-rw-r--r-- 1 12104 Nov 16 00:48 _5.cfs
-rw-r--r-- 1   341 Nov 16 00:48 _5.si
-rw-r--r-- 1   479 Nov 16 00:57 _6.cfe
-rw-r--r-- 1 15215 Nov 16 00:57 _6.cfs
-rw-r--r-- 1   341 Nov 16 00:57 _6.si
-rw-r--r-- 1   479 Nov 16 00:57 _7.cfe
-rw-r--r-- 1 14908 Nov 16 00:57 _7.cfs
-rw-r--r-- 1   341 Nov 16 00:57 _7.si
-rw-r--r-- 1   479 Nov 16 00:57 _8.cfe
-rw-r--r-- 1 17705 Nov 16 00:57 _8.cfs
-rw-r--r-- 1   341 Nov 16 00:57 _8.si
-rw-r--r-- 1   479 Nov 16 00:57 _9.cfe
-rw-r--r-- 1 17105 Nov 16 00:57 _9.cfs
-rw-r--r-- 1   341 Nov 16 00:57 _9.si
-rw-r--r-- 1  1051 Nov 16 00:58 segments_4
-rw-r--r-- 1     0 Nov 16 00:47 write.lock

That's a very small shard with 10 segments, where most segments are about 17 kilobytes (but segments _5, _6, and _7 are a little smaller).
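
Building on that, per-segment totals can be pulled straight out of such a listing; a small sketch, assuming the ls -gG field layout above (size in column 3, file name in the last column):

ls -gG data/nodes/0/indices/6nzgE2PhTR22ejuYJ2Zsqw/0/index/ \
  | awk '$NF ~ /^_/ { seg = $NF; sub(/\..*$/, "", seg); sub(/_Lucene.*$/, "", seg); bytes[seg] += $3 }
         END { for (s in bytes) printf "%s\t%d bytes\n", s, bytes[s] }' | sort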

@IanHoang
Collaborator Author

Look into the disk layout or an API to get insight into the segment distribution of the logs-241998 index

The segments are just files on disk on the data nodes. If you can get onto the data nodes, you can see the contents of each shard, like:

That's a very small shard with 10 segments, where most segments are about 17 kilobytes (but segments _5, _6, and _7 are a little smaller).

Thanks @msfroh! This is helpful. I currently have a test running with logs-241998 only, but I will SSH into the data node and see what shows up. I'll also check what shows up for the original snapshots we were inspecting.

@IanHoang
Collaborator Author

Segment Distribution for Shards in Logs-241998 Index

# Shard 0
[ec2-user@ip-10-0-3-122 opensearch]$ ls -gG data/nodes/0/indices/DyePfDN6QheQBx2u-V8nAA/0/index/
total 2612972
-rw-rw-r-- 1       2932 Jan 21 17:34 _1o.fdm
-rw-rw-r-- 1 1381567758 Jan 21 17:31 _1o.fdt
-rw-rw-r-- 1     214544 Jan 21 17:30 _1o.fdx
-rw-rw-r-- 1       1150 Jan 21 17:33 _1o.fnm
-rw-rw-r-- 1  349984226 Jan 21 17:33 _1o.kdd
-rw-rw-r-- 1    1563296 Jan 21 17:34 _1o.kdi
-rw-rw-r-- 1        421 Jan 21 17:30 _1o.kdm
-rw-rw-r-- 1   36300725 Jan 21 17:31 _1o.nvd
-rw-rw-r-- 1        103 Jan 21 17:32 _1o.nvm
-rw-rw-r-- 1        594 Jan 21 17:30 _1o.si
-rw-rw-r-- 1  168909454 Jan 21 17:33 _1o_Lucene90_0.doc
-rw-rw-r-- 1  456496908 Jan 21 17:34 _1o_Lucene90_0.dvd
-rw-rw-r-- 1       1025 Jan 21 17:33 _1o_Lucene90_0.dvm
-rw-rw-r-- 1   44450590 Jan 21 17:34 _1o_Lucene90_0.pos
-rw-rw-r-- 1  220254259 Jan 21 17:33 _1o_Lucene90_0.tim
-rw-rw-r-- 1   15884746 Jan 21 17:32 _1o_Lucene90_0.tip
-rw-rw-r-- 1        464 Jan 21 17:30 _1o_Lucene90_0.tmd
-rw-rw-r-- 1        375 Jan 21 17:34 segments_k
-rw-rw-r-- 1          0 Jan 21 17:30 write.lock
# Shard 1
[ec2-user@ip-10-0-3-122 opensearch]$ ls -gG data/nodes/0/indices/DyePfDN6QheQBx2u-V8nAA/1/index/
total 2608896
-rw-rw-r-- 1       2932 Jan 21 17:33 _1n.fdm
-rw-rw-r-- 1 1380771931 Jan 21 17:34 _1n.fdt
-rw-rw-r-- 1     212435 Jan 21 17:34 _1n.fdx
-rw-rw-r-- 1       1150 Jan 21 17:30 _1n.fnm
-rw-rw-r-- 1  347042018 Jan 21 17:32 _1n.kdd
-rw-rw-r-- 1    1562274 Jan 21 17:33 _1n.kdi
-rw-rw-r-- 1        421 Jan 21 17:33 _1n.kdm
-rw-rw-r-- 1   36291680 Jan 21 17:33 _1n.nvd
-rw-rw-r-- 1        103 Jan 21 17:30 _1n.nvm
-rw-rw-r-- 1        594 Jan 21 17:31 _1n.si
-rw-rw-r-- 1  168850216 Jan 21 17:33 _1n_Lucene90_0.doc
-rw-rw-r-- 1  456269379 Jan 21 17:32 _1n_Lucene90_0.dvd
-rw-rw-r-- 1       1025 Jan 21 17:30 _1n_Lucene90_0.dvm
-rw-rw-r-- 1   44435777 Jan 21 17:30 _1n_Lucene90_0.pos
-rw-rw-r-- 1  220168917 Jan 21 17:33 _1n_Lucene90_0.tim
-rw-rw-r-- 1   15849306 Jan 21 17:30 _1n_Lucene90_0.tip
-rw-rw-r-- 1        461 Jan 21 17:33 _1n_Lucene90_0.tmd
-rw-rw-r-- 1        375 Jan 21 17:34 segments_j
-rw-rw-r-- 1          0 Jan 21 17:30 write.lock
# Shard 2
[ec2-user@ip-10-0-3-122 opensearch]$ ls -gG data/nodes/0/indices/DyePfDN6QheQBx2u-V8nAA/2/index/
total 2612484
-rw-rw-r-- 1       2932 Jan 21 17:34 _1k.fdm
-rw-rw-r-- 1 1381207277 Jan 21 17:31 _1k.fdt
-rw-rw-r-- 1     213080 Jan 21 17:30 _1k.fdx
-rw-rw-r-- 1       1150 Jan 21 17:33 _1k.fnm
-rw-rw-r-- 1  350012969 Jan 21 17:33 _1k.kdd
-rw-rw-r-- 1    1563259 Jan 21 17:34 _1k.kdi
-rw-rw-r-- 1        421 Jan 21 17:34 _1k.kdm
-rw-rw-r-- 1   36291347 Jan 21 17:30 _1k.nvd
-rw-rw-r-- 1        103 Jan 21 17:33 _1k.nvm
-rw-rw-r-- 1        594 Jan 21 17:32 _1k.si
-rw-rw-r-- 1  168857123 Jan 21 17:34 _1k_Lucene90_0.doc
-rw-rw-r-- 1  456360313 Jan 21 17:32 _1k_Lucene90_0.dvd
-rw-rw-r-- 1       1025 Jan 21 17:33 _1k_Lucene90_0.dvm
-rw-rw-r-- 1   44432482 Jan 21 17:33 _1k_Lucene90_0.pos
-rw-rw-r-- 1  220341085 Jan 21 17:35 _1k_Lucene90_0.tim
-rw-rw-r-- 1   15848967 Jan 21 17:33 _1k_Lucene90_0.tip
-rw-rw-r-- 1        470 Jan 21 17:31 _1k_Lucene90_0.tmd
-rw-rw-r-- 1        375 Jan 21 17:35 segments_j
-rw-rw-r-- 1          0 Jan 21 17:30 write.lock
# Shard 3
[ec2-user@ip-10-0-3-122 opensearch]$ ls -gG data/nodes/0/indices/DyePfDN6QheQBx2u-V8nAA/3/index/
total 2608960
-rw-rw-r-- 1       2932 Jan 21 17:34 _1o.fdm
-rw-rw-r-- 1 1380604604 Jan 21 17:32 _1o.fdt
-rw-rw-r-- 1     211670 Jan 21 17:33 _1o.fdx
-rw-rw-r-- 1       1150 Jan 21 17:33 _1o.fnm
-rw-rw-r-- 1  347707805 Jan 21 17:33 _1o.kdd
-rw-rw-r-- 1    1562010 Jan 21 17:35 _1o.kdi
-rw-rw-r-- 1        421 Jan 21 17:30 _1o.kdm
-rw-rw-r-- 1   36280946 Jan 21 17:33 _1o.nvd
-rw-rw-r-- 1        103 Jan 21 17:33 _1o.nvm
-rw-rw-r-- 1        594 Jan 21 17:32 _1o.si
-rw-rw-r-- 1  168800084 Jan 21 17:34 _1o_Lucene90_0.doc
-rw-rw-r-- 1  456173410 Jan 21 17:35 _1o_Lucene90_0.dvd
-rw-rw-r-- 1       1025 Jan 21 17:33 _1o_Lucene90_0.dvm
-rw-rw-r-- 1   44408746 Jan 21 17:35 _1o_Lucene90_0.pos
-rw-rw-r-- 1  219942579 Jan 21 17:34 _1o_Lucene90_0.tim
-rw-rw-r-- 1   15837944 Jan 21 17:33 _1o_Lucene90_0.tip
-rw-rw-r-- 1        480 Jan 21 17:30 _1o_Lucene90_0.tmd
-rw-rw-r-- 1        375 Jan 21 17:35 segments_k
-rw-rw-r-- 1          0 Jan 21 17:30 write.lock
# Shard 4
[ec2-user@ip-10-0-3-122 opensearch]$ ls -gG data/nodes/0/indices/DyePfDN6QheQBx2u-V8nAA/4/index/
total 2609304
-rw-rw-r-- 1       2932 Jan 21 17:35 _1l.fdm
-rw-rw-r-- 1 1381174331 Jan 21 17:36 _1l.fdt
-rw-rw-r-- 1     213326 Jan 21 17:35 _1l.fdx
-rw-rw-r-- 1       1150 Jan 21 17:35 _1l.fnm
-rw-rw-r-- 1  346893501 Jan 21 17:35 _1l.kdd
-rw-rw-r-- 1    1562614 Jan 21 17:35 _1l.kdi
-rw-rw-r-- 1        421 Jan 21 17:35 _1l.kdm
-rw-rw-r-- 1   36299221 Jan 21 17:35 _1l.nvd
-rw-rw-r-- 1        103 Jan 21 17:35 _1l.nvm
-rw-rw-r-- 1        594 Jan 21 17:35 _1l.si
-rw-rw-r-- 1  168908141 Jan 21 17:35 _1l_Lucene90_0.doc
-rw-rw-r-- 1  456358016 Jan 21 17:35 _1l_Lucene90_0.dvd
-rw-rw-r-- 1       1025 Jan 21 17:35 _1l_Lucene90_0.dvm
-rw-rw-r-- 1   44441641 Jan 21 17:35 _1l_Lucene90_0.pos
-rw-rw-r-- 1  220157033 Jan 21 17:35 _1l_Lucene90_0.tim
-rw-rw-r-- 1   15863496 Jan 21 17:35 _1l_Lucene90_0.tip
-rw-rw-r-- 1        464 Jan 21 17:35 _1l_Lucene90_0.tmd
-rw-rw-r-- 1        375 Jan 21 17:36 segments_j
-rw-rw-r-- 1          0 Jan 21 17:35 write.lock

@IanHoang
Collaborator Author

I'm not seeing .cfs files in these shards, but based on what's shown above:

  • Segment info (.si) is 594 bytes for all shards
  • The segments file (segments_N) is 375 bytes for all shards
  • Field data (.fdt) is ~1.3 GB for all shards

@msfroh Based on the files above for each shard, are there any that stand out to you?
Reference: https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html#file-names

@msfroh

msfroh commented Jan 24, 2024

For numeric range queries (which tend to dominate http-logs), the .kd? files tend to be most relevant, since they have the KD trees for the numeric points.

In this particular case, all the shards are merged to a single segment. In general, we would expect two indices constructed with the same data and merged to a single segment across all shards to behave similarly. (There may be slight variations based on the document ordering within shards, but I wouldn't expect more than low single digit percent change.)

How does it compare to another run with different performance characteristics? If we have two runs, each with the same data, and each merged to a single segment (per shard) and performance is wildly different, then we'll need to figure out an explanation for that.
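
For reference, collapsing each shard to a single segment, as seen in the listings above, is what the force merge API produces:

curl -XPOST "$CLUSTER/logs-241998/_forcemerge?max_num_segments=1"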

@IanHoang
Collaborator Author

IanHoang commented Jan 30, 2024

Rerun of same configuration

### Shard 0
$ ls -gGh data/nodes/0/indices/do471FrKR5K91l9H3jNS6w/0/index
total 2.5G
-rw-rw-r-- 1 2.9K Jan 30 17:34 _1o.fdm
-rw-rw-r-- 1 1.3G Jan 30 17:31 _1o.fdt
-rw-rw-r-- 1 210K Jan 30 17:30 _1o.fdx
-rw-rw-r-- 1 1.2K Jan 30 17:33 _1o.fnm
-rw-rw-r-- 1 334M Jan 30 17:33 _1o.kdd
-rw-rw-r-- 1 1.5M Jan 30 17:34 _1o.kdi
-rw-rw-r-- 1  421 Jan 30 17:30 _1o.kdm
-rw-rw-r-- 1  35M Jan 30 17:31 _1o.nvd
-rw-rw-r-- 1  103 Jan 30 17:32 _1o.nvm
-rw-rw-r-- 1  594 Jan 30 17:30 _1o.si
-rw-rw-r-- 1 162M Jan 30 17:33 _1o_Lucene90_0.doc
-rw-rw-r-- 1 436M Jan 30 17:33 _1o_Lucene90_0.dvd
-rw-rw-r-- 1 1.1K Jan 30 17:32 _1o_Lucene90_0.dvm
-rw-rw-r-- 1  43M Jan 30 17:34 _1o_Lucene90_0.pos
-rw-rw-r-- 1 211M Jan 30 17:33 _1o_Lucene90_0.tim
-rw-rw-r-- 1  16M Jan 30 17:32 _1o_Lucene90_0.tip
-rw-rw-r-- 1  464 Jan 30 17:30 _1o_Lucene90_0.tmd
-rw-rw-r-- 1  375 Jan 30 17:34 segments_k
-rw-rw-r-- 1    0 Jan 30 17:30 write.lock
### Shard 1
$ ls -gGh data/nodes/0/indices/do471FrKR5K91l9H3jNS6w/1/index
total 2.5G
-rw-rw-r-- 1 2.9K Jan 30 17:33 _1n.fdm
-rw-rw-r-- 1 1.3G Jan 30 17:34 _1n.fdt
-rw-rw-r-- 1 208K Jan 30 17:34 _1n.fdx
-rw-rw-r-- 1 1.2K Jan 30 17:30 _1n.fnm
-rw-rw-r-- 1 331M Jan 30 17:32 _1n.kdd
-rw-rw-r-- 1 1.5M Jan 30 17:33 _1n.kdi
-rw-rw-r-- 1  421 Jan 30 17:33 _1n.kdm
-rw-rw-r-- 1  35M Jan 30 17:34 _1n.nvd
-rw-rw-r-- 1  103 Jan 30 17:30 _1n.nvm
-rw-rw-r-- 1  594 Jan 30 17:30 _1n.si
-rw-rw-r-- 1 162M Jan 30 17:32 _1n_Lucene90_0.doc
-rw-rw-r-- 1 436M Jan 30 17:32 _1n_Lucene90_0.dvd
-rw-rw-r-- 1 1.1K Jan 30 17:30 _1n_Lucene90_0.dvm
-rw-rw-r-- 1  43M Jan 30 17:30 _1n_Lucene90_0.pos
-rw-rw-r-- 1 210M Jan 30 17:33 _1n_Lucene90_0.tim
-rw-rw-r-- 1  16M Jan 30 17:30 _1n_Lucene90_0.tip
-rw-rw-r-- 1  461 Jan 30 17:33 _1n_Lucene90_0.tmd
-rw-rw-r-- 1  375 Jan 30 17:34 segments_j
-rw-rw-r-- 1    0 Jan 30 17:30 write.lock
### Shard 2
$ ls -gGh data/nodes/0/indices/do471FrKR5K91l9H3jNS6w/2/index
total 2.5G
-rw-rw-r-- 1 2.9K Jan 30 17:34 _1k.fdm
-rw-rw-r-- 1 1.3G Jan 30 17:31 _1k.fdt
-rw-rw-r-- 1 209K Jan 30 17:30 _1k.fdx
-rw-rw-r-- 1 1.2K Jan 30 17:33 _1k.fnm
-rw-rw-r-- 1 334M Jan 30 17:33 _1k.kdd
-rw-rw-r-- 1 1.5M Jan 30 17:34 _1k.kdi
-rw-rw-r-- 1  421 Jan 30 17:33 _1k.kdm
-rw-rw-r-- 1  35M Jan 30 17:30 _1k.nvd
-rw-rw-r-- 1  103 Jan 30 17:33 _1k.nvm
-rw-rw-r-- 1  594 Jan 30 17:32 _1k.si
-rw-rw-r-- 1 162M Jan 30 17:34 _1k_Lucene90_0.doc
-rw-rw-r-- 1 436M Jan 30 17:32 _1k_Lucene90_0.dvd
-rw-rw-r-- 1 1.1K Jan 30 17:33 _1k_Lucene90_0.dvm
-rw-rw-r-- 1  43M Jan 30 17:33 _1k_Lucene90_0.pos
-rw-rw-r-- 1 211M Jan 30 17:34 _1k_Lucene90_0.tim
-rw-rw-r-- 1  16M Jan 30 17:33 _1k_Lucene90_0.tip
-rw-rw-r-- 1  470 Jan 30 17:31 _1k_Lucene90_0.tmd
-rw-rw-r-- 1  375 Jan 30 17:34 segments_j
-rw-rw-r-- 1    0 Jan 30 17:30 write.lock
### Shard 3
ls -gGh data/nodes/0/indices/do471FrKR5K91l9H3jNS6w/3/index
total 2.5G
-rw-rw-r-- 1 2.9K Jan 30 17:34 _1o.fdm
-rw-rw-r-- 1 1.3G Jan 30 17:33 _1o.fdt
-rw-rw-r-- 1 207K Jan 30 17:33 _1o.fdx
-rw-rw-r-- 1 1.2K Jan 30 17:33 _1o.fnm
-rw-rw-r-- 1 332M Jan 30 17:34 _1o.kdd
-rw-rw-r-- 1 1.5M Jan 30 17:35 _1o.kdi
-rw-rw-r-- 1  421 Jan 30 17:31 _1o.kdm
-rw-rw-r-- 1  35M Jan 30 17:33 _1o.nvd
-rw-rw-r-- 1  103 Jan 30 17:33 _1o.nvm
-rw-rw-r-- 1  594 Jan 30 17:32 _1o.si
-rw-rw-r-- 1 161M Jan 30 17:34 _1o_Lucene90_0.doc
-rw-rw-r-- 1 436M Jan 30 17:35 _1o_Lucene90_0.dvd
-rw-rw-r-- 1 1.1K Jan 30 17:33 _1o_Lucene90_0.dvm
-rw-rw-r-- 1  43M Jan 30 17:34 _1o_Lucene90_0.pos
-rw-rw-r-- 1 210M Jan 30 17:34 _1o_Lucene90_0.tim
-rw-rw-r-- 1  16M Jan 30 17:33 _1o_Lucene90_0.tip
-rw-rw-r-- 1  480 Jan 30 17:31 _1o_Lucene90_0.tmd
-rw-rw-r-- 1  375 Jan 30 17:35 segments_k
-rw-rw-r-- 1    0 Jan 30 17:30 write.lock
### Shard 4
$ ls -gGh data/nodes/0/indices/do471FrKR5K91l9H3jNS6w/4/index
total 2.5G
-rw-rw-r-- 1 2.9K Jan 30 17:35 _1l.fdm
-rw-rw-r-- 1 1.3G Jan 30 17:36 _1l.fdt
-rw-rw-r-- 1 209K Jan 30 17:35 _1l.fdx
-rw-rw-r-- 1 1.2K Jan 30 17:35 _1l.fnm
-rw-rw-r-- 1 331M Jan 30 17:35 _1l.kdd
-rw-rw-r-- 1 1.5M Jan 30 17:35 _1l.kdi
-rw-rw-r-- 1  421 Jan 30 17:35 _1l.kdm
-rw-rw-r-- 1  35M Jan 30 17:35 _1l.nvd
-rw-rw-r-- 1  103 Jan 30 17:35 _1l.nvm
-rw-rw-r-- 1  594 Jan 30 17:35 _1l.si
-rw-rw-r-- 1 162M Jan 30 17:35 _1l_Lucene90_0.doc
-rw-rw-r-- 1 436M Jan 30 17:35 _1l_Lucene90_0.dvd
-rw-rw-r-- 1 1.1K Jan 30 17:35 _1l_Lucene90_0.dvm
-rw-rw-r-- 1  43M Jan 30 17:34 _1l_Lucene90_0.pos
-rw-rw-r-- 1 210M Jan 30 17:35 _1l_Lucene90_0.tim
-rw-rw-r-- 1  16M Jan 30 17:35 _1l_Lucene90_0.tip
-rw-rw-r-- 1  464 Jan 30 17:35 _1l_Lucene90_0.tmd
-rw-rw-r-- 1  375 Jan 30 17:36 segments_j
-rw-rw-r-- 1    0 Jan 30 17:34 write.lock

@IanHoang
Collaborator Author

IanHoang commented Feb 6, 2024

Segment Tuning Experiment 1

  • Workload: http_logs with the logs-241998 index only
  • Corpus: 23GB
  • Number of shards: 1
  • Max Segment Size: 5GB (default)
  • Floor Segment Size: 2000MB (default)
  • Merge Policy: Tiered
  • Force Merge (n=1): True
  • Prediction: Expect little variance, but the worst performance (see the settings sketch below)
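
A sketch of how these knobs could be applied at index creation; the setting names are the standard OpenSearch tiered-merge-policy settings, but whether these runs set them exactly this way is an assumption:

curl -XPUT "$CLUSTER/logs-241998" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 1,
    "index.merge.policy.max_merged_segment": "5gb",
    "index.merge.policy.floor_segment": "2000mb"
  }
}'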

Results

Segments from Disk

[ec2-user@ip-10-0-3-181 opensearch]$ ls -gGh data/nodes/0/indices/jykcFJs7R-Wc2G-EOG4jRA/0/index
total 15G
-rw-rw-r-- 1  14K Feb  5 17:01 _12n.fdm
-rw-rw-r-- 1 7.0G Feb  5 17:01 _12n.fdt
-rw-rw-r-- 1 1.3M Feb  5 17:01 _12n.fdx
-rw-rw-r-- 1 1.2K Feb  5 17:10 _12n.fnm
-rw-rw-r-- 1 2.8G Feb  5 17:10 _12n.kdd
-rw-rw-r-- 1 7.2M Feb  5 17:10 _12n.kdi
-rw-rw-r-- 1  421 Feb  5 17:10 _12n.kdm
-rw-rw-r-- 1 174M Feb  5 17:02 _12n.nvd
-rw-rw-r-- 1  103 Feb  5 17:02 _12n.nvm
-rw-rw-r-- 1  611 Feb  5 17:10 _12n.si
-rw-rw-r-- 1 802M Feb  5 17:06 _12n_Lucene90_0.doc
-rw-rw-r-- 1 2.3G Feb  5 17:08 _12n_Lucene90_0.dvd
-rw-rw-r-- 1 1.1K Feb  5 17:08 _12n_Lucene90_0.dvm
-rw-rw-r-- 1 211M Feb  5 17:06 _12n_Lucene90_0.pos
-rw-rw-r-- 1 1.1G Feb  5 17:06 _12n_Lucene90_0.tim
-rw-rw-r-- 1  48M Feb  5 17:06 _12n_Lucene90_0.tip
-rw-rw-r-- 1  445 Feb  5 17:06 _12n_Lucene90_0.tmd
-rw-rw-r-- 1  381 Feb  5 17:10 segments_24
-rw-rw-r-- 1    0 Feb  5 16:39 write.lock

Task Latency
[image: task latency chart for experiment 1]

@IanHoang
Collaborator Author

IanHoang commented Feb 8, 2024

Number of segments in cluster after experiment 1:

$ curl "<cluster endpoint>/_cat/segments?v"
index       shard prirep ip         segment generation docs.count docs.deleted   size size.memory committed searchable version compound
logs-241998 0     p      10.0.3.181 _12n          1391  181463624            0 14.2gb           0 true      true       9.7.0   false

After running experiment 1, I ran a modified version of VisualizePointTree.java from the Lucene University repository and got the following values.

$ ./gradlew run -PclassToExecute=example.points.VisualizePointTree

> Task :run
Tree for segment 0
0 [897249601000,897854400000] - 181463624
Finished printing segments

Plugging these two timestamps into an epoch converter, we get the following dates:

$ ./gradlew run -PclassToExecute=example.points.VisualizePointTree

> Task :run
Tree for segment 0
0 [June 7, 1998 8:00:01PM GMT, June 14, 1998 8:00:00PM GMT] - 181463624
Finished printing segments
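
The conversion itself is just epoch milliseconds to UTC; for example, with GNU date (drop the trailing three zeros to get seconds):

$ date -u -d @897249601 '+%B %-d, %Y %I:%M:%S %p GMT'
June 7, 1998 08:00:01 PM GMT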

@IanHoang
Collaborator Author

IanHoang commented Feb 9, 2024

Segment Tuning Experiment 2

  • Workload: http_logs with the logs-241998 index only
  • Corpus: 23GB
  • Number of shards: 1
  • Max Segment Size: 5GB (default)
  • Floor Segment Size: 2000MB (default)
  • Merge Policy: Tiered
  • Force Merge (n=1): False
  • Prediction: Expect lots of variance, as segment sizes can vary

Results

Task Latency
[image: task latency chart for experiment 2]

Number of Segments: 29
Segment Size Distribution: Varies; some segments are much larger than others

Segments from _cat API

hoangia@3c22fbd0d988 snapshots-segment-comparison % curl "<endpoint>/_cat/segments?v"
index       shard prirep ip         segment generation docs.count docs.deleted    size size.memory committed searchable version compound
logs-241998 0     p      10.0.3.181 _jc            696   64825354            0     5gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _p7            907    1077351            0  80.2mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _w0           1152   65326014            0     5gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _wa           1162     679592            0  51.4mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _x5           1193    1336099            0  99.5mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _xr           1215     726632            0  55.1mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _zf           1275   25316877            0   1.9gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _zn           1283     573718            0    44mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _zv           1291    2461803            0 190.3mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _10b          1307    3247844            0 251.3mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _10i          1314    1079016            0  79.7mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _10s          1324    1956684            0 150.4mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _115          1337     125087            0     9mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _11b          1343    3372310            0 255.5mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _11c          1344      10668            0 990.5kb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _11p          1357    2072444            0   160mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _11s          1360      13265            0   1.1mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _11t          1361       3879            0   393kb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _125          1373     135405            0   9.5mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _127          1375    2183413            0   166mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12a          1378     135906            0   9.7mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12b          1379     716208            0    54mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12c          1380     120100            0   8.3mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12d          1381      55878            0   3.9mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12e          1382     141101            0   9.5mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12f          1383      83712            0   5.8mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12g          1384      50171            0   3.4mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12h          1385      39976            0   2.7mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _12i          1386    3597117            0 273.5mb           0 true      true       9.7.0   true

Segments from Disk

[ec2-user@ip-10-0-3-181 index]$ ls -lrth
total 15G
-rw-rw-r-- 1 ec2-user ec2-user     0 Feb  9 15:57 write.lock
-rw-rw-r-- 1 ec2-user ec2-user  433K Feb  9 16:07 _jc.fdx
-rw-rw-r-- 1 ec2-user ec2-user  2.6G Feb  9 16:07 _jc.fdt
-rw-rw-r-- 1 ec2-user ec2-user  5.1K Feb  9 16:07 _jc.fdm
-rw-rw-r-- 1 ec2-user ec2-user   103 Feb  9 16:07 _jc.nvm
-rw-rw-r-- 1 ec2-user ec2-user   62M Feb  9 16:07 _jc.nvd
-rw-rw-r-- 1 ec2-user ec2-user   81M Feb  9 16:10 _p7.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:10 _p7.cfe
-rw-rw-r-- 1 ec2-user ec2-user   380 Feb  9 16:10 _p7.si
-rw-rw-r-- 1 ec2-user ec2-user   482 Feb  9 16:10 _jc_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user   18M Feb  9 16:10 _jc_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  382M Feb  9 16:10 _jc_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   76M Feb  9 16:10 _jc_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  288M Feb  9 16:10 _jc_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user  1.1K Feb  9 16:11 _jc_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  807M Feb  9 16:11 _jc_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user   568 Feb  9 16:12 _jc.si
-rw-rw-r-- 1 ec2-user ec2-user   421 Feb  9 16:12 _jc.kdm
-rw-rw-r-- 1 ec2-user ec2-user  2.7M Feb  9 16:12 _jc.kdi
-rw-rw-r-- 1 ec2-user ec2-user 1011M Feb  9 16:12 _jc.kdd
-rw-rw-r-- 1 ec2-user ec2-user  1.2K Feb  9 16:12 _jc.fnm
-rw-rw-r-- 1 ec2-user ec2-user   52M Feb  9 16:13 _wa.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:13 _wa.cfe
-rw-rw-r-- 1 ec2-user ec2-user   380 Feb  9 16:13 _wa.si
-rw-rw-r-- 1 ec2-user ec2-user  451K Feb  9 16:14 _w0.fdx
-rw-rw-r-- 1 ec2-user ec2-user  2.6G Feb  9 16:14 _w0.fdt
-rw-rw-r-- 1 ec2-user ec2-user  5.1K Feb  9 16:14 _w0.fdm
-rw-rw-r-- 1 ec2-user ec2-user   103 Feb  9 16:14 _w0.nvm
-rw-rw-r-- 1 ec2-user ec2-user   63M Feb  9 16:14 _w0.nvd
-rw-rw-r-- 1 ec2-user ec2-user  100M Feb  9 16:14 _x5.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:14 _x5.cfe
-rw-rw-r-- 1 ec2-user ec2-user   380 Feb  9 16:14 _x5.si
-rw-rw-r-- 1 ec2-user ec2-user   56M Feb  9 16:14 _xr.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:14 _xr.cfe
-rw-rw-r-- 1 ec2-user ec2-user   380 Feb  9 16:14 _xr.si
-rw-rw-r-- 1 ec2-user ec2-user   45M Feb  9 16:15 _zn.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:15 _zn.cfe
-rw-rw-r-- 1 ec2-user ec2-user   380 Feb  9 16:15 _zn.si
-rw-rw-r-- 1 ec2-user ec2-user  168K Feb  9 16:16 _zf.fdx
-rw-rw-r-- 1 ec2-user ec2-user  996M Feb  9 16:16 _zf.fdt
-rw-rw-r-- 1 ec2-user ec2-user  2.1K Feb  9 16:16 _zf.fdm
-rw-rw-r-- 1 ec2-user ec2-user   103 Feb  9 16:16 _zf.nvm
-rw-rw-r-- 1 ec2-user ec2-user   25M Feb  9 16:16 _zf.nvd
-rw-rw-r-- 1 ec2-user ec2-user  191M Feb  9 16:16 _zv.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:16 _zv.cfe
-rw-rw-r-- 1 ec2-user ec2-user   380 Feb  9 16:16 _zv.si
-rw-rw-r-- 1 ec2-user ec2-user   80M Feb  9 16:16 _10i.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:16 _10i.cfe
-rw-rw-r-- 1 ec2-user ec2-user   383 Feb  9 16:16 _10i.si
-rw-rw-r-- 1 ec2-user ec2-user   460 Feb  9 16:16 _w0_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user   19M Feb  9 16:16 _w0_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  387M Feb  9 16:16 _w0_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   76M Feb  9 16:16 _w0_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  289M Feb  9 16:16 _w0_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user  252M Feb  9 16:16 _10b.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:16 _10b.cfe
-rw-rw-r-- 1 ec2-user ec2-user   383 Feb  9 16:16 _10b.si
-rw-rw-r-- 1 ec2-user ec2-user  151M Feb  9 16:17 _10s.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _10s.cfe
-rw-rw-r-- 1 ec2-user ec2-user   383 Feb  9 16:17 _10s.si
-rw-rw-r-- 1 ec2-user ec2-user   489 Feb  9 16:17 _zf_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user  8.5M Feb  9 16:17 _zf_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  151M Feb  9 16:17 _zf_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   30M Feb  9 16:17 _zf_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  112M Feb  9 16:17 _zf_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:17 _11c.si
-rw-rw-r-- 1 ec2-user ec2-user  990K Feb  9 16:17 _11c.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _11c.cfe
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:17 _115.si
-rw-rw-r-- 1 ec2-user ec2-user  9.1M Feb  9 16:17 _115.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _115.cfe
-rw-rw-r-- 1 ec2-user ec2-user  256M Feb  9 16:17 _11b.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _11b.cfe
-rw-rw-r-- 1 ec2-user ec2-user   383 Feb  9 16:17 _11b.si
-rw-rw-r-- 1 ec2-user ec2-user  1.1K Feb  9 16:17 _zf_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  316M Feb  9 16:17 _zf_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:17 _11t.si
-rw-rw-r-- 1 ec2-user ec2-user  393K Feb  9 16:17 _11t.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _11t.cfe
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:17 _11s.si
-rw-rw-r-- 1 ec2-user ec2-user  1.2M Feb  9 16:17 _11s.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _11s.cfe
-rw-rw-r-- 1 ec2-user ec2-user  161M Feb  9 16:17 _11p.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _11p.cfe
-rw-rw-r-- 1 ec2-user ec2-user   383 Feb  9 16:17 _11p.si
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:17 _125.si
-rw-rw-r-- 1 ec2-user ec2-user  9.6M Feb  9 16:17 _125.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _125.cfe
-rw-rw-r-- 1 ec2-user ec2-user  812M Feb  9 16:17 _w0_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user  1.1K Feb  9 16:17 _w0_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  167M Feb  9 16:17 _127.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _127.cfe
-rw-rw-r-- 1 ec2-user ec2-user   383 Feb  9 16:17 _127.si
-rw-rw-r-- 1 ec2-user ec2-user   569 Feb  9 16:17 _zf.si
-rw-rw-r-- 1 ec2-user ec2-user   421 Feb  9 16:17 _zf.kdm
-rw-rw-r-- 1 ec2-user ec2-user  1.1M Feb  9 16:17 _zf.kdi
-rw-rw-r-- 1 ec2-user ec2-user  361M Feb  9 16:17 _zf.kdd
-rw-rw-r-- 1 ec2-user ec2-user  1.2K Feb  9 16:17 _zf.fnm
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:17 _12a.si
-rw-rw-r-- 1 ec2-user ec2-user  9.8M Feb  9 16:17 _12a.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:17 _12a.cfe
-rw-rw-r-- 1 ec2-user ec2-user   55M Feb  9 16:18 _12b.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:18 _12b.cfe
-rw-rw-r-- 1 ec2-user ec2-user   383 Feb  9 16:18 _12b.si
-rw-rw-r-- 1 ec2-user ec2-user  4.0M Feb  9 16:18 _12d.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:18 _12d.cfe
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:18 _12d.si
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:18 _12c.si
-rw-rw-r-- 1 ec2-user ec2-user  8.4M Feb  9 16:18 _12c.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:18 _12c.cfe
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:18 _12e.si
-rw-rw-r-- 1 ec2-user ec2-user  9.6M Feb  9 16:18 _12e.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:18 _12e.cfe
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:18 _12f.si
-rw-rw-r-- 1 ec2-user ec2-user  5.9M Feb  9 16:18 _12f.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:18 _12f.cfe
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:18 _12g.si
-rw-rw-r-- 1 ec2-user ec2-user  3.5M Feb  9 16:18 _12g.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:18 _12g.cfe
-rw-rw-r-- 1 ec2-user ec2-user   345 Feb  9 16:18 _12h.si
-rw-rw-r-- 1 ec2-user ec2-user  2.8M Feb  9 16:18 _12h.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:18 _12h.cfe
-rw-rw-r-- 1 ec2-user ec2-user  274M Feb  9 16:18 _12i.cfs
-rw-rw-r-- 1 ec2-user ec2-user   479 Feb  9 16:18 _12i.cfe
-rw-rw-r-- 1 ec2-user ec2-user   383 Feb  9 16:18 _12i.si
-rw-rw-r-- 1 ec2-user ec2-user   568 Feb  9 16:18 _w0.si
-rw-rw-r-- 1 ec2-user ec2-user   421 Feb  9 16:18 _w0.kdm
-rw-rw-r-- 1 ec2-user ec2-user  2.7M Feb  9 16:18 _w0.kdi
-rw-rw-r-- 1 ec2-user ec2-user 1007M Feb  9 16:18 _w0.kdd
-rw-rw-r-- 1 ec2-user ec2-user  1.2K Feb  9 16:18 _w0.fnm
-rw-rw-r-- 1 ec2-user ec2-user  2.7K Feb  9 16:23 segments_24

Segment Timestamps

[ec2-user@ip-10-0-3-181 lucene-university]$ ./gradlew run -PclassToExecute=example.points.VisualizePointTree

> Task :run
Tree for segment 0
0 [897249601000,897791329000] - 64825354
Tree for segment 1
0 [897304513000,897838891000] - 65326014
Tree for segment 2
0 [897386239000,897812390000] - 1077351
Tree for segment 3
0 [897396090000,897845203000] - 25316877
Tree for segment 4
0 [897400919000,897840661000] - 1336099
Tree for segment 5
0 [897402269000,897839027000] - 679592
Tree for segment 6
0 [897403322000,897850660000] - 3372310
Tree for segment 7
0 [897404107000,897842088000] - 726632
Tree for segment 8
0 [897405630000,897853888000] - 3597117
Tree for segment 9
0 [897408616000,897848866000] - 1079016
Tree for segment 10
0 [897408617000,897846556000] - 573718
Tree for segment 11
0 [897408994000,897849775000] - 1956684
Tree for segment 12
0 [897409711000,897851613000] - 2072444
Tree for segment 13
0 [897410861000,897848489000] - 3247844
Tree for segment 14
0 [897410861000,897847357000] - 2461803
Tree for segment 15
0 [897413722000,897853110000] - 716208
Tree for segment 16
0 [897414788000,897852859000] - 2183413
Tree for segment 17
0 [897415860000,897850797000] - 10668
Tree for segment 18
0 [897415775000,897850797000] - 125087
Tree for segment 19
0 [897416944000,897851841000] - 13265
Tree for segment 20
0 [897417000000,897851841000] - 3879
Tree for segment 21
0 [897418076000,897853124000] - 135405
Tree for segment 22
0 [897418235000,897853888000] - 135906
Tree for segment 23
0 [897419091000,897854289000] - 141101
Tree for segment 24
0 [897418966000,897854307000] - 120100
Tree for segment 25
0 [897419028000,897854288000] - 55878
Tree for segment 26
0 [897419425000,897854400000] - 83712
Tree for segment 27
0 [897419412000,897854400000] - 50171
Tree for segment 28
0 [897419445000,897854400000] - 39976
Finished printing segments


@IanHoang
Collaborator Author

Converted times

[ec2-user@ip-10-0-3-181 lucene-university]$ ./gradlew run -PclassToExecute=example.points.VisualizePointTree

> Task :run
Tree for segment 0
0 [June 7, 1998 8:00:01PM GMT, June 14, 1998 2:28:49AM GMT] - 64825354
Tree for segment 1
0 [June 8, 1998 11:15:13AM GMT, June 14, 1998 3:41:31PM GMT] - 65326014
Tree for segment 2
0 [June 9, 1998 9:57:19AM GMT, June 14, 1998 8:19:50AM GMT] - 1077351
Tree for segment 3
0 [June 9, 1998 12:41:30PM GMT, June 14, 1998 5:26:43PM GMT] - 25316877
Tree for segment 4
0 [June 9, 1998 2:01:59PM GMT, June 14, 1998 4:11:01PM GMT] - 1336099
Tree for segment 5
0 [June 9, 1998 2:24:29PM GMT, June 14, 1998 3:43:47PM GMT] - 679592
Tree for segment 6
0 [June 9, 1998 2:42:02PM GMT, June 14, 1998 6:57:40PM GMT] - 3372310
Tree for segment 7
0 [June 9, 1998 2:55:07PM GMT, June 14, 1998 4:34:48PM GMT] - 726632
Tree for segment 8
0 [June 9, 1998 3:20:30PM GMT, June 14, 1998 7:51:28PM GMT] - 3597117
Tree for segment 9
0 [June 9, 1998 4:10:16PM GMT, June 14, 1998 6:27:46PM GMT] - 1079016
Tree for segment 10
0 [June 9, 1998 4:10:17PM GMT, June 14, 1998 5:49:16PM GMT] - 573718
Tree for segment 11
0 [June 9, 1998 4:16:34PM GMT, June 14, 1998 6:42:55PM GMT] - 1956684
Tree for segment 12
0 [June 9, 1998 4:28:31PM GMT, June 14, 1998 7:13:33PM GMT] - 2072444
Tree for segment 13
0 [June 9, 1998 4:47:41PM GMT, June 14, 1998 6:21:29PM GMT] - 3247844
Tree for segment 14
0 [June 9, 1998 4:47:41PM GMT, June 14, 1998 6:02:37PM GMT] - 2461803
Tree for segment 15
0 [June 9, 1998 5:35:22PM GMT, June 14, 1998 7:38:30PM GMT] - 716208
Tree for segment 16
0 [June 9, 1998 5:53:08PM GMT, June 14, 1998 7:34:19PM GMT] - 2183413
Tree for segment 17
0 [June 9, 1998 6:11:00PM GMT, June 14, 1998 6:59:57PM GMT] - 10668
Tree for segment 18
0 [June 9, 1998 6:09:35PM GMT, June 14, 1998 6:59:57PM GMT] - 125087
Tree for segment 19
0 [June 9, 1998 6:29:04PM GMT, June 14, 1998 7:17:21PM GMT] - 13265
Tree for segment 20
0 [June 9, 1998 6:30:00PM GMT, June 14, 1998 7:17:21PM GMT] - 3879
Tree for segment 21
0 [June 9, 1998 6:47:56PM GMT, June 14, 1998 7:38:44PM GMT] - 135405
Tree for segment 22
0 [June 9, 1998 6:50:35PM GMT, June 14, 1998 7:51:28PM GMT] - 135906
Tree for segment 23
0 [June 9, 1998 7:04:51PM GMT, June 14, 1998 7:58:09PM GMT] - 141101
Tree for segment 24
0 [June 9, 1998 7:02:46PM GMT, June 14, 1998 7:58:27PM GMT] - 120100
Tree for segment 25
0 [June 9, 1998 7:03:48PM GMT, June 14, 1998 7:58:08PM GMT] - 55878
Tree for segment 26
0 [June 9, 1998 7:10:25PM GMT, June 14, 1998 8:00:00PM GMT] - 83712
Tree for segment 27
0 [June 9, 1998 7:10:12PM GMT, June 14, 1998 8:00:00PM GMT] - 50171
Tree for segment 28
0 [June 9, 1998 7:10:45PM GMT, June 14, 1998 8:00:00PM GMT] - 39976
Finished printing segments

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.5/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 1s
2 actionable tasks: 2 executed

@IanHoang

IanHoang commented Feb 13, 2024

Segment Tuning Experiment 3

  • Workload: Http logs with index logs-241998 only
  • Corpus: 23GB
  • Number of shards: 1
  • Max Segment Size: 2GB
  • Floor Segment Size: 2MB (default)
  • Merge Policy: Tiered
  • Force Merge (n=1): False
  • Prediction: Expect lots of variance, as segment sizes can range widely

Created the index first and updated the index.merge.policy.max_merged_segment field to 2GB.
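
This amounts to a single dynamic settings update; a minimal sketch, using the same <endpoint> placeholder as the curl commands below:

$ curl -X PUT "<endpoint>/logs-241998/_settings" -H 'Content-Type: application/json' -d'
{
  "index.merge.policy.max_merged_segment": "2gb"
}'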

Results

Task Latency
[task latency chart]

Number of Segments: 22
Segment Size Distribution: Similar to experiment 2; sizes vary, and some segments are much larger than others.

Segments from _cat API

hoangia@3c22fbd0d988 snapshots-segment-comparison % curl "<endpoint>/_cat/segments?v"
index       shard prirep ip         segment generation docs.count docs.deleted    size size.memory committed searchable version compound
logs-241998 0     p      10.0.3.181 _5y            214   25846743            0     2gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _bk            416   24585820            0   1.9gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _fw            572   26405257            0     2gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _lv            787   26582912            0     2gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _ph            917   20804196            0   1.5gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _rs           1000    1030320            0  78.2mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _ua           1090   23946580            0   1.8gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _vc           1128     389032            0  28.9mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _we           1166     508292            0  39.3mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _ws           1180    1141883            0  86.3mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _xi           1206     534429            0  41.1mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _y4           1228      34579            0   2.6mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _yk           1244      83707            0   6.3mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _yl           1245     929574            0  69.5mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _ym           1246      66454            0   5.1mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _yn           1247      50497            0     4mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _yo           1248      34402            0   2.8mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _yp           1249      13423            0   1.2mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _yq           1250      12690            0   1.1mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _yz           1259      80723            0   6.7mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _z0           1260   26447317            0     2gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _z1           1261    1934794            0 147.1mb           0 true      true       9.7.0   true

Segments from Disk

[ec2-user@ip-10-0-3-181 index]$ ls -lrt
total 14693528
-rw-rw-r-- 1 ec2-user ec2-user          0 Feb 12 19:57 write.lock
-rw-rw-r-- 1 ec2-user ec2-user     172177 Feb 12 20:02 _5y.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1078471799 Feb 12 20:02 _5y.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2135 Feb 12 20:02 _5y.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 12 20:02 _5y.nvm
-rw-rw-r-- 1 ec2-user ec2-user   25846802 Feb 12 20:02 _5y.nvd
-rw-rw-r-- 1 ec2-user ec2-user        480 Feb 12 20:03 _5y_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7461386 Feb 12 20:03 _5y_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  158780111 Feb 12 20:03 _5y_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   31812141 Feb 12 20:03 _5y_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  120152564 Feb 12 20:03 _5y_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1017 Feb 12 20:04 _5y_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  337331388 Feb 12 20:04 _5y_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 12 20:04 _5y.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 12 20:04 _5y.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1107002 Feb 12 20:04 _5y.kdi
-rw-rw-r-- 1 ec2-user ec2-user  391268015 Feb 12 20:04 _5y.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 12 20:04 _5y.fnm
-rw-rw-r-- 1 ec2-user ec2-user     169915 Feb 12 20:05 _bk.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1022336792 Feb 12 20:05 _bk.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2051 Feb 12 20:05 _bk.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 12 20:05 _bk.nvm
-rw-rw-r-- 1 ec2-user ec2-user   24585879 Feb 12 20:05 _bk.nvd
-rw-rw-r-- 1 ec2-user ec2-user        470 Feb 12 20:06 _bk_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7365840 Feb 12 20:06 _bk_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  153940524 Feb 12 20:06 _bk_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   30276150 Feb 12 20:06 _bk_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  114622789 Feb 12 20:06 _bk_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1017 Feb 12 20:07 _bk_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  320919045 Feb 12 20:07 _bk_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 12 20:07 _bk.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 12 20:07 _bk.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1060363 Feb 12 20:07 _bk.kdi
-rw-rw-r-- 1 ec2-user ec2-user  375625293 Feb 12 20:07 _bk.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 12 20:07 _bk.fnm
-rw-rw-r-- 1 ec2-user ec2-user     187213 Feb 12 20:08 _fw.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1091039702 Feb 12 20:08 _fw.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2178 Feb 12 20:08 _fw.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 12 20:08 _fw.nvm
-rw-rw-r-- 1 ec2-user ec2-user   26405316 Feb 12 20:08 _fw.nvd
-rw-rw-r-- 1 ec2-user ec2-user        464 Feb 12 20:09 _fw_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7811694 Feb 12 20:09 _fw_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  166204929 Feb 12 20:09 _fw_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   32427231 Feb 12 20:09 _fw_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  122719148 Feb 12 20:09 _fw_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 12 20:09 _fw_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  344482505 Feb 12 20:09 _fw_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 12 20:10 _fw.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 12 20:10 _fw.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1135111 Feb 12 20:10 _fw.kdi
-rw-rw-r-- 1 ec2-user ec2-user  407755418 Feb 12 20:10 _fw.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 12 20:10 _fw.fnm
-rw-rw-r-- 1 ec2-user ec2-user     187959 Feb 12 20:12 _lv.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1097965202 Feb 12 20:12 _lv.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2178 Feb 12 20:12 _lv.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 12 20:12 _lv.nvm
-rw-rw-r-- 1 ec2-user ec2-user   26582971 Feb 12 20:12 _lv.nvd
-rw-rw-r-- 1 ec2-user ec2-user        484 Feb 12 20:13 _lv_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    8010942 Feb 12 20:13 _lv_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  165417458 Feb 12 20:13 _lv_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   32539596 Feb 12 20:13 _lv_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  122832666 Feb 12 20:13 _lv_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 12 20:13 _lv_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  347117880 Feb 12 20:13 _lv_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 12 20:14 _lv.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 12 20:14 _lv.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1135864 Feb 12 20:14 _lv.kdi
-rw-rw-r-- 1 ec2-user ec2-user  411336910 Feb 12 20:14 _lv.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 12 20:14 _lv.fnm
-rw-rw-r-- 1 ec2-user ec2-user     147523 Feb 12 20:14 _ph.fdx
-rw-rw-r-- 1 ec2-user ec2-user  857123929 Feb 12 20:14 _ph.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1757 Feb 12 20:14 _ph.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 12 20:14 _ph.nvm
-rw-rw-r-- 1 ec2-user ec2-user   20804255 Feb 12 20:14 _ph.nvd
-rw-rw-r-- 1 ec2-user ec2-user        484 Feb 12 20:14 _ph_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    6937232 Feb 12 20:14 _ph_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  129522729 Feb 12 20:14 _ph_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   25502200 Feb 12 20:14 _ph_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   96212813 Feb 12 20:14 _ph_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user   82010013 Feb 12 20:15 _rs.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:15 _rs.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 12 20:15 _rs.si
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 12 20:15 _ph_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  271120299 Feb 12 20:15 _ph_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 12 20:15 _ph.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 12 20:15 _ph.kdm
-rw-rw-r-- 1 ec2-user ec2-user     895467 Feb 12 20:15 _ph.kdi
-rw-rw-r-- 1 ec2-user ec2-user  293662448 Feb 12 20:15 _ph.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 12 20:15 _ph.fnm
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:16 _vc.si
-rw-rw-r-- 1 ec2-user ec2-user   30389226 Feb 12 20:16 _vc.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:16 _vc.cfe
-rw-rw-r-- 1 ec2-user ec2-user     167496 Feb 12 20:16 _ua.fdx
-rw-rw-r-- 1 ec2-user ec2-user  988009835 Feb 12 20:16 _ua.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1967 Feb 12 20:16 _ua.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 12 20:17 _ua.nvm
-rw-rw-r-- 1 ec2-user ec2-user   23946639 Feb 12 20:17 _ua.nvd
-rw-rw-r-- 1 ec2-user ec2-user   41280483 Feb 12 20:17 _we.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:17 _we.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 12 20:17 _we.si
-rw-rw-r-- 1 ec2-user ec2-user   90555265 Feb 12 20:17 _ws.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:17 _ws.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 12 20:17 _ws.si
-rw-rw-r-- 1 ec2-user ec2-user        493 Feb 12 20:17 _ua_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7100986 Feb 12 20:17 _ua_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  153403778 Feb 12 20:17 _ua_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   29269373 Feb 12 20:17 _ua_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  110247684 Feb 12 20:17 _ua_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user   43175213 Feb 12 20:18 _xi.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _xi.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 12 20:18 _xi.si
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 12 20:18 _ua_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  312704535 Feb 12 20:18 _ua_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:18 _y4.si
-rw-rw-r-- 1 ec2-user ec2-user    2757379 Feb 12 20:18 _y4.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _y4.cfe
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 12 20:18 _ua.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 12 20:18 _ua.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1026264 Feb 12 20:18 _ua.kdi
-rw-rw-r-- 1 ec2-user ec2-user  354817753 Feb 12 20:18 _ua.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 12 20:18 _ua.fnm
-rw-rw-r-- 1 ec2-user ec2-user   72896754 Feb 12 20:18 _yl.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _yl.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 12 20:18 _yl.si
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:18 _yp.si
-rw-rw-r-- 1 ec2-user ec2-user    1281702 Feb 12 20:18 _yp.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _yp.cfe
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:18 _yq.si
-rw-rw-r-- 1 ec2-user ec2-user    1216577 Feb 12 20:18 _yq.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _yq.cfe
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:18 _yn.si
-rw-rw-r-- 1 ec2-user ec2-user    4265801 Feb 12 20:18 _yn.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _yn.cfe
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:18 _yo.si
-rw-rw-r-- 1 ec2-user ec2-user    3039842 Feb 12 20:18 _yo.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _yo.cfe
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:18 _ym.si
-rw-rw-r-- 1 ec2-user ec2-user    5435074 Feb 12 20:18 _ym.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _ym.cfe
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:18 _yk.si
-rw-rw-r-- 1 ec2-user ec2-user    6672823 Feb 12 20:18 _yk.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:18 _yk.cfe
-rw-rw-r-- 1 ec2-user ec2-user        342 Feb 12 20:19 _yz.si
-rw-rw-r-- 1 ec2-user ec2-user    7062511 Feb 12 20:19 _yz.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:19 _yz.cfe
-rw-rw-r-- 1 ec2-user ec2-user     182563 Feb 12 20:19 _z0.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1092224672 Feb 12 20:19 _z0.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2178 Feb 12 20:19 _z0.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 12 20:19 _z0.nvm
-rw-rw-r-- 1 ec2-user ec2-user   26447376 Feb 12 20:19 _z0.nvd
-rw-rw-r-- 1 ec2-user ec2-user        467 Feb 12 20:20 _z0_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7984575 Feb 12 20:20 _z0_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  166177327 Feb 12 20:20 _z0_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   32390363 Feb 12 20:20 _z0_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  122295844 Feb 12 20:20 _z0_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 12 20:20 _z0_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  345442273 Feb 12 20:20 _z0_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 12 20:20 _z0.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 12 20:20 _z0.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1130757 Feb 12 20:20 _z0.kdi
-rw-rw-r-- 1 ec2-user ec2-user  405826350 Feb 12 20:20 _z0.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 12 20:20 _z0.fnm
-rw-rw-r-- 1 ec2-user ec2-user  154299206 Feb 12 20:21 _z1.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 12 20:21 _z1.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 12 20:21 _z1.si
-rw-rw-r-- 1 ec2-user ec2-user       2083 Feb 12 20:24 segments_22

Segment Timestamps

$ ./gradlew run -PclassToExecute=example.points.VisualizePointTree

> Task :run
Tree for segment 0
0 [897249601000,897762353000] - 25846743
Tree for segment 1
0 [897303205000,897774910000] - 24585820
Tree for segment 2
0 [897307396000,897788162000] - 26405257
Tree for segment 3
0 [897335839000,897811763000] - 26582912
Tree for segment 4
0 [897360682000,897834058000] - 20804196
Tree for segment 5
0 [897375045000,897842251000] - 23946580
Tree for segment 6
0 [897392146000,897853279000] - 26447317
Tree for segment 7
0 [897401490000,897838312000] - 1030320
Tree for segment 8
0 [897405138000,897848980000] - 1141883
Tree for segment 9
0 [897411418000,897848182000] - 508292
Tree for segment 10
0 [897411555000,897846802000] - 389032
Tree for segment 11
0 [897414584000,897852276000] - 929574
Tree for segment 12
0 [897414585000,897850374000] - 534429
Tree for segment 13
0 [897416800000,897854400000] - 1934794
Tree for segment 14
0 [897417675000,897852276000] - 34579
Tree for segment 15
0 [897418930000,897853579000] - 13423
Tree for segment 16
0 [897418783000,897853579000] - 83707
Tree for segment 17
0 [897418796000,897853579000] - 66454
Tree for segment 18
0 [897418801000,897853579000] - 50497
Tree for segment 19
0 [897418935000,897853579000] - 12690
Tree for segment 20
0 [897418824000,897853579000] - 34402
Tree for segment 21
0 [897419212000,897854388000] - 80723
Finished printing segments

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.5/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 935ms
2 actionable tasks: 1 executed, 1 up-to-date

@IanHoang

Converted timestamps for experiment 3

$ ./gradlew run -PclassToExecute=example.points.VisualizePointTree

> Task :run
Tree for segment 0
0 [June 7, 1998 20:00:01, June 13, 1998 18:25:53] - 25846743
Tree for segment 1
0 [June 8, 1998 10:53:25, June 13, 1998 21:55:10] - 24585820
Tree for segment 2
0 [June 8, 1998 12:03:16, June 14, 1998 01:36:02] - 26405257
Tree for segment 3
0 [June 8, 1998 19:57:19, June 14, 1998 08:09:23] - 26582912
Tree for segment 4
0 [June 9, 1998 02:51:22, June 14, 1998 14:20:58] - 20804196
Tree for segment 5
0 [June 9, 1998 06:50:45, June 14, 1998 16:37:31] - 23946580
Tree for segment 6
0 [June 9, 1998 11:35:46, June 14, 1998 19:41:19] - 26447317
Tree for segment 7
0 [June 9, 1998 14:11:30, June 14, 1998 15:31:52] - 1030320
Tree for segment 8
0 [June 9, 1998 15:12:18, June 14, 1998 18:29:40] - 1141883
Tree for segment 9
0 [June 9, 1998 16:56:58, June 14, 1998 18:16:22] - 508292
Tree for segment 10
0 [June 9, 1998 16:59:15, June 14, 1998 17:53:22] - 389032
Tree for segment 11
0 [June 9, 1998 17:49:44, June 14, 1998 19:24:36] - 929574
Tree for segment 12
0 [June 9, 1998 17:49:45, June 14, 1998 18:52:54] - 534429
Tree for segment 13
0 [June 9, 1998 18:26:40, June 14, 1998 20:00:00] - 1934794
Tree for segment 14
0 [June 9, 1998 18:41:15, June 14, 1998 19:24:36] - 34579
Tree for segment 15
0 [June 9, 1998 19:02:10, June 14, 1998 19:46:19] - 13423
Tree for segment 16
0 [June 9, 1998 18:59:43, June 14, 1998 19:46:19] - 83707
Tree for segment 17
0 [June 9, 1998 18:59:56, June 14, 1998 19:46:19] - 66454
Tree for segment 18
0 [June 9, 1998 19:00:01, June 14, 1998 19:46:19] - 50497
Tree for segment 19
0 [June 9, 1998 19:02:15, June 14, 1998 19:46:19] - 12690
Tree for segment 20
0 [June 9, 1998 19:00:24, June 14, 1998 19:46:19] - 34402
Tree for segment 21
0 [June 9, 1998 19:06:52, June 14, 1998 19:59:48] - 80723
Finished printing segments

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.5/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 935ms
2 actionable tasks: 1 executed, 1 up-to-date

@IanHoang

IanHoang commented Feb 16, 2024

Segment Tuning Experiment 4

  • Workload: Http logs with index logs-241998 only
  • Corpus: 23GB
  • Number of shards: 1
  • Max Segment Size: 2GB
  • Floor Segment Size: 500MB
  • Segments Per Tier: 10
  • Merge Policy: Tiered
  • Force Merge (n=1): False
  • Prediction: Expect lots of variance, as segment sizes can range widely

Created the index first and updated the index.merge.policy.max_merged_segment field to 2GB, index.merge.policy.floor_segment to 500mb, and index.merge.policy.segments_per_tier to 10.
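
As in experiment 3, all three knobs fit in one dynamic settings update (a sketch):

$ curl -X PUT "<endpoint>/logs-241998/_settings" -H 'Content-Type: application/json' -d'
{
  "index.merge.policy.max_merged_segment": "2gb",
  "index.merge.policy.floor_segment": "500mb",
  "index.merge.policy.segments_per_tier": 10
}'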

Results

Task Latency
[task latency chart]

Number of Segments: 12
Segment Size Distribution: Less variation; segment sizes are more uniform.

Segments from _cat API

hoangia@3c22fbd0d988 snapshots-segment-comparison % curl "<endpoint>/_cat/segments?v"
index       shard prirep ip         segment generation docs.count docs.deleted    size size.memory committed searchable version compound
logs-241998 0     p      10.0.3.181 _6c            228   13677847            0     1gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _7a            262   22057431            0   1.6gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _cb            443   15848260            0   1.1gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _ig            664   24740042            0   1.9gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _it            677   14030261            0     1gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _lt            785   13621423            0     1gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _qi            954   13818249            0     1gb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _u5           1085    7214782            0 551.7mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _vk           1136   24647064            0   1.8gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _wp           1177   15438930            0   1.1gb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _yy           1258    5539370            0   426mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _z0           1260   10829965            0 825.7mb           0 true      true       9.7.0   true

Segments from Disk

[ec2-user@ip-10-0-3-181 ~]$ ls -lrt opensearch/data/nodes/0/indices/FJVuSp17SH2-U-aFJUj-FQ/0/index
total 14372268
-rw-rw-r-- 1 ec2-user ec2-user          0 Feb 15 19:33 write.lock
-rw-rw-r-- 1 ec2-user ec2-user      95352 Feb 15 19:42 _6c.fdx
-rw-rw-r-- 1 ec2-user ec2-user  567344627 Feb 15 19:42 _6c.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1210 Feb 15 19:42 _6c.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 15 19:42 _6c.nvm
-rw-rw-r-- 1 ec2-user ec2-user   13677906 Feb 15 19:42 _6c.nvd
-rw-rw-r-- 1 ec2-user ec2-user     154212 Feb 15 19:42 _7a.fdx
-rw-rw-r-- 1 ec2-user ec2-user  913161201 Feb 15 19:42 _7a.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1841 Feb 15 19:42 _7a.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 15 19:42 _7a.nvm
-rw-rw-r-- 1 ec2-user ec2-user   22057490 Feb 15 19:42 _7a.nvd
-rw-rw-r-- 1 ec2-user ec2-user        464 Feb 15 19:42 _6c_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    5235759 Feb 15 19:42 _6c_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user   83943494 Feb 15 19:42 _6c_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   16931689 Feb 15 19:42 _6c_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   63725348 Feb 15 19:42 _6c_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1017 Feb 15 19:43 _6c_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  178599538 Feb 15 19:43 _6c_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 15 19:43 _6c.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 15 19:43 _6c.kdm
-rw-rw-r-- 1 ec2-user ec2-user     593545 Feb 15 19:43 _6c.kdi
-rw-rw-r-- 1 ec2-user ec2-user  178567757 Feb 15 19:43 _6c.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 15 19:43 _6c.fnm
-rw-rw-r-- 1 ec2-user ec2-user        486 Feb 15 19:43 _7a_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7706251 Feb 15 19:43 _7a_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  132826177 Feb 15 19:43 _7a_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   27164210 Feb 15 19:43 _7a_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  102610763 Feb 15 19:43 _7a_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1017 Feb 15 19:43 _7a_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  287653653 Feb 15 19:43 _7a_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 15 19:44 _7a.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 15 19:44 _7a.kdm
-rw-rw-r-- 1 ec2-user ec2-user     947277 Feb 15 19:44 _7a.kdi
-rw-rw-r-- 1 ec2-user ec2-user  321640120 Feb 15 19:44 _7a.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 15 19:44 _7a.fnm
-rw-rw-r-- 1 ec2-user ec2-user     113307 Feb 15 19:46 _cb.fdx
-rw-rw-r-- 1 ec2-user ec2-user  652459705 Feb 15 19:46 _cb.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1379 Feb 15 19:46 _cb.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 15 19:46 _cb.nvm
-rw-rw-r-- 1 ec2-user ec2-user   15848319 Feb 15 19:46 _cb.nvd
-rw-rw-r-- 1 ec2-user ec2-user        476 Feb 15 19:46 _cb_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    5517111 Feb 15 19:46 _cb_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user   96078117 Feb 15 19:46 _cb_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   19560508 Feb 15 19:46 _cb_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   73974375 Feb 15 19:46 _cb_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 15 19:46 _cb_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  206547617 Feb 15 19:46 _cb_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 15 19:47 _cb.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 15 19:47 _cb.kdm
-rw-rw-r-- 1 ec2-user ec2-user     687766 Feb 15 19:47 _cb.kdi
-rw-rw-r-- 1 ec2-user ec2-user  202125339 Feb 15 19:47 _cb.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 15 19:47 _cb.fnm
-rw-rw-r-- 1 ec2-user ec2-user     174673 Feb 15 19:49 _ig.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1021447701 Feb 15 19:49 _ig.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2052 Feb 15 19:49 _ig.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 15 19:49 _ig.nvm
-rw-rw-r-- 1 ec2-user ec2-user   24740101 Feb 15 19:49 _ig.nvd
-rw-rw-r-- 1 ec2-user ec2-user     100597 Feb 15 19:49 _it.fdx
-rw-rw-r-- 1 ec2-user ec2-user  578100639 Feb 15 19:49 _it.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1211 Feb 15 19:49 _it.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 15 19:49 _it.nvm
-rw-rw-r-- 1 ec2-user ec2-user   14030320 Feb 15 19:49 _it.nvd
-rw-rw-r-- 1 ec2-user ec2-user        490 Feb 15 19:50 _it_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    4659835 Feb 15 19:50 _it_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user   86571509 Feb 15 19:50 _it_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   17282052 Feb 15 19:50 _it_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   64911483 Feb 15 19:50 _it_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user        482 Feb 15 19:50 _ig_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7667424 Feb 15 19:50 _ig_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  151889403 Feb 15 19:50 _ig_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   30398096 Feb 15 19:50 _ig_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  115021077 Feb 15 19:50 _ig_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 15 19:50 _it_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  183041124 Feb 15 19:50 _it_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 15 19:50 _it.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 15 19:50 _it.kdm
-rw-rw-r-- 1 ec2-user ec2-user     605447 Feb 15 19:50 _it.kdi
-rw-rw-r-- 1 ec2-user ec2-user  179058360 Feb 15 19:50 _it.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 15 19:50 _it.fnm
-rw-rw-r-- 1 ec2-user ec2-user       1017 Feb 15 19:50 _ig_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  322589335 Feb 15 19:50 _ig_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 15 19:51 _ig.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 15 19:51 _ig.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1065876 Feb 15 19:51 _ig.kdi
-rw-rw-r-- 1 ec2-user ec2-user  367204632 Feb 15 19:51 _ig.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 15 19:51 _ig.fnm
-rw-rw-r-- 1 ec2-user ec2-user      98036 Feb 15 19:51 _lt.fdx
-rw-rw-r-- 1 ec2-user ec2-user  558557498 Feb 15 19:51 _lt.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1211 Feb 15 19:51 _lt.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 15 19:51 _lt.nvm
-rw-rw-r-- 1 ec2-user ec2-user   13621482 Feb 15 19:51 _lt.nvd
-rw-rw-r-- 1 ec2-user ec2-user        503 Feb 15 19:52 _lt_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    4052200 Feb 15 19:52 _lt_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user   82569032 Feb 15 19:52 _lt_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   16739561 Feb 15 19:52 _lt_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   62956980 Feb 15 19:52 _lt_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 15 19:52 _lt_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  177740864 Feb 15 19:52 _lt_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 15 19:52 _lt.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 15 19:52 _lt.kdm
-rw-rw-r-- 1 ec2-user ec2-user     585349 Feb 15 19:52 _lt.kdi
-rw-rw-r-- 1 ec2-user ec2-user  174740264 Feb 15 19:52 _lt.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 15 19:52 _lt.fnm
-rw-rw-r-- 1 ec2-user ec2-user 1106065993 Feb 15 19:55 _qi.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 15 19:55 _qi.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 15 19:55 _qi.si
-rw-rw-r-- 1 ec2-user ec2-user  578543397 Feb 15 19:57 _u5.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 15 19:57 _u5.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 15 19:57 _u5.si
-rw-rw-r-- 1 ec2-user ec2-user     176532 Feb 15 19:57 _vk.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1012300550 Feb 15 19:57 _vk.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2052 Feb 15 19:57 _vk.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 15 19:57 _vk.nvm
-rw-rw-r-- 1 ec2-user ec2-user   24647123 Feb 15 19:57 _vk.nvd
-rw-rw-r-- 1 ec2-user ec2-user        478 Feb 15 19:58 _vk_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7272835 Feb 15 19:58 _vk_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  148786922 Feb 15 19:58 _vk_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   30133403 Feb 15 19:58 _vk_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  113727601 Feb 15 19:58 _vk_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 15 19:58 _vk_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  321318821 Feb 15 19:58 _vk_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        569 Feb 15 19:59 _vk.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 15 19:59 _vk.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1051049 Feb 15 19:59 _vk.kdi
-rw-rw-r-- 1 ec2-user ec2-user  361414048 Feb 15 19:59 _vk.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 15 19:59 _vk.fnm
-rw-rw-r-- 1 ec2-user ec2-user 1239034860 Feb 15 19:59 _wp.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 15 19:59 _wp.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 15 19:59 _wp.si
-rw-rw-r-- 1 ec2-user ec2-user  446733275 Feb 15 19:59 _yy.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 15 19:59 _yy.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 15 19:59 _yy.si
-rw-rw-r-- 1 ec2-user ec2-user  865848573 Feb 15 20:00 _z0.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 15 20:00 _z0.cfe
-rw-rw-r-- 1 ec2-user ec2-user        380 Feb 15 20:00 _z0.si
-rw-rw-r-- 1 ec2-user ec2-user       1253 Feb 15 20:04 segments_24

Segment Timestamps

[ec2-user@ip-10-0-3-181 lucene-university]$ ./gradlew run -PclassToExecute=example.points.VisualizePointTree

> Task :run
Tree for segment 0
0 [897249601000,897765454000] - 22057431
Tree for segment 1
0 [897249622000,897764854000] - 13677847
Tree for segment 2
0 [897340417000,897779787000] - 15848260
Tree for segment 3
0 [897340640000,897801197000] - 24740042
Tree for segment 4
0 [897365216000,897802923000] - 14030261
Tree for segment 5
0 [897380501000,897812949000] - 13621423
Tree for segment 6
0 [897386046000,897845341000] - 24647064
Tree for segment 7
0 [897391777000,897837369000] - 13818249
Tree for segment 8
0 [897402835000,897843158000] - 7214782
Tree for segment 9
0 [897405851000,897848475000] - 15438930
Tree for segment 10
0 [897412701000,897854400000] - 10829965
Tree for segment 11
0 [897415011000,897853688000] - 5539370
Finished printing segments

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.5/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 869ms
2 actionable tasks: 1 executed, 1 up-to-date

@IanHoang

Converted Segment Timestamps

$ ./gradlew run -PclassToExecute=example.points.VisualizePointTree

> Task :run
Tree for segment 0
0 [June 7, 1998 20:00:01, June 13, 1998 19:17:34] - 22057431
Tree for segment 1
0 [June 7, 1998 20:00:22, June 13, 1998 19:07:34] - 13677847
Tree for segment 2
0 [June 8, 1998 21:13:37, June 13, 1998 23:16:27] - 15848260
Tree for segment 3
0 [June 8, 1998 21:17:20, June 14, 1998 05:13:17] - 24740042
Tree for segment 4
0 [June 9, 1998 04:06:56, June 14, 1998 05:42:03] - 14030261
Tree for segment 5
0 [June 9, 1998 08:21:41, June 14, 1998 08:29:09] - 13621423
Tree for segment 6
0 [June 9, 1998 09:54:06, June 14, 1998 17:29:01] - 24647064
Tree for segment 7
0 [June 9, 1998 11:29:37, June 14, 1998 15:16:09] - 13818249
Tree for segment 8
0 [June 9, 1998 14:33:55, June 14, 1998 16:52:38] - 7214782
Tree for segment 9
0 [June 9, 1998 15:24:11, June 14, 1998 18:21:15] - 15438930
Tree for segment 10
0 [June 9, 1998 17:18:21, June 14, 1998 20:00:00] - 10829965
Tree for segment 11
0 [June 9, 1998 17:56:51, June 14, 1998 19:48:08] - 5539370
Finished printing segments

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.5/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 869ms
2 actionable tasks: 1 executed, 1 up-to-date

@msfroh

msfroh commented Feb 16, 2024

@IanHoang -- Can you please update the units for the floor segment size to MB in each of your comments above? A floor segment size of 500GB would merge really aggressively. (I hope it was 500MB.)

Also, this last run makes me very curious to see how the log_byte_size merge policy experiments will do. While the segments are well-balanced by size and doc count, notice how they all span 5-6 days' worth of data (where the whole data set runs for a week). We're likely not seeing a lot of opportunity for segment skipping.

Ideally, we will be able to set parameters so that we end up with segments that have less overlap (so each segment covers approximately half a day's worth of docs).
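
As a hypothetical illustration (not one of the workload's operations), a search over a narrow time window like the one below can only skip a segment whose [min, max] timestamp bounds fall entirely outside the window; with every segment spanning 5-6 days, almost no segment qualifies:

$ curl -X POST "<endpoint>/logs-241998/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "@timestamp": { "gte": "1998-06-09T00:00:00Z", "lt": "1998-06-09T06:00:00Z" }
    }
  },
  "sort": [ { "@timestamp": "desc" } ]
}'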

@IanHoang

IanHoang commented Feb 16, 2024

@msfroh I have updated the comments above to show the floor segment sizes in MB. Regarding the latest run, it was a typo and was meant to be 500MB. I captured the following in my worklog when I curled the settings before starting the run:

 "merge" : {
          "policy" : {
            "floor_segment" : "500mb",
            "max_merged_segment" : "2gb"
          }
        },

Also, could you elaborate on what you mean by "segment skipping"?

@IanHoang

IanHoang commented Feb 19, 2024

Segment Tuning Experiment 5

  • Workload: Http logs with index logs-241998 only
  • Corpus: 23GB
  • Number of shards: 1
  • Max Segment Size: 2GB
  • Floor Segment Size: 500MB
  • Segments Per Tier: 5
  • Merge Policy: Tiered
  • Force Merge (n=1): False
  • Prediction: Expect lots of variance, as segment sizes can range widely

Created the index first and updated the index.merge.policy.max_merged_segment field to 2GB, index.merge.policy.floor_segment to 500mb, and index.merge.policy.segments_per_tier to 5.

Results

Task Latency
[task latency chart]

Number of Segments: 11
Segment Size Distribution: Less variation; segment sizes are more uniform.

Segments from _cat API

hoangia@3c22fbd0d988 snapshots-segment-comparison % curl "<endpoint>/_cat/segments?v"
index       shard prirep ip         segment generation docs.count docs.deleted    size size.memory committed searchable version compound
logs-241998 0     p      10.0.3.181 _6z            251   19678856            0   1.5gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _b5            401   19781912            0   1.5gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _fd            553   26146941            0     2gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _kw            752   14688764            0     1gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _oz            899   26809218            0     2gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _vf           1131   21458757            0   1.6gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _y0           1224   17782791            0   1.3gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _13a          1414    3027109            0 229.9mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _13p          1429   25742546            0   1.9gb           0 true      true       9.7.0   false
logs-241998 0     p      10.0.3.181 _14s          1468    3646838            0   274mb           0 true      true       9.7.0   true
logs-241998 0     p      10.0.3.181 _14u          1470    2699892            0 199.9mb           0 true      true       9.7.0   true

Segments from Disk

[ec2-user@ip-10-0-3-181 index]$ ls -lrt
total 14457452
-rw-rw-r-- 1 ec2-user ec2-user          0 Feb 16 18:32 write.lock
-rw-rw-r-- 1 ec2-user ec2-user     139563 Feb 16 18:41 _6z.fdx
-rw-rw-r-- 1 ec2-user ec2-user  814731346 Feb 16 18:41 _6z.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1673 Feb 16 18:41 _6z.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 16 18:41 _6z.nvm
-rw-rw-r-- 1 ec2-user ec2-user   19678915 Feb 16 18:41 _6z.nvd
-rw-rw-r-- 1 ec2-user ec2-user        474 Feb 16 18:41 _6z_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    6954790 Feb 16 18:41 _6z_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  118498982 Feb 16 18:41 _6z_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   24242971 Feb 16 18:41 _6z_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   91439600 Feb 16 18:41 _6z_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1017 Feb 16 18:42 _6z_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  256440134 Feb 16 18:42 _6z_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        568 Feb 16 18:42 _6z.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 16 18:42 _6z.kdm
-rw-rw-r-- 1 ec2-user ec2-user     843863 Feb 16 18:42 _6z.kdi
-rw-rw-r-- 1 ec2-user ec2-user  286230858 Feb 16 18:42 _6z.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 16 18:42 _6z.fnm
-rw-rw-r-- 1 ec2-user ec2-user     141584 Feb 16 18:43 _b5.fdx
-rw-rw-r-- 1 ec2-user ec2-user  813642622 Feb 16 18:43 _b5.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1673 Feb 16 18:43 _b5.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 16 18:43 _b5.nvm
-rw-rw-r-- 1 ec2-user ec2-user   19781971 Feb 16 18:43 _b5.nvd
-rw-rw-r-- 1 ec2-user ec2-user        464 Feb 16 18:44 _b5_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    5916439 Feb 16 18:44 _b5_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  119460832 Feb 16 18:44 _b5_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   24402549 Feb 16 18:44 _b5_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   92226188 Feb 16 18:44 _b5_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 16 18:44 _b5_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  258124142 Feb 16 18:44 _b5_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        568 Feb 16 18:45 _b5.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 16 18:45 _b5.kdm
-rw-rw-r-- 1 ec2-user ec2-user     854107 Feb 16 18:45 _b5.kdi
-rw-rw-r-- 1 ec2-user ec2-user  286826272 Feb 16 18:45 _b5.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 16 18:45 _b5.fnm
-rw-rw-r-- 1 ec2-user ec2-user     188857 Feb 16 18:46 _fd.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1074651219 Feb 16 18:46 _fd.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2178 Feb 16 18:46 _fd.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 16 18:46 _fd.nvm
-rw-rw-r-- 1 ec2-user ec2-user   26147000 Feb 16 18:46 _fd.nvd
-rw-rw-r-- 1 ec2-user ec2-user        470 Feb 16 18:46 _fd_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7778778 Feb 16 18:46 _fd_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  159348239 Feb 16 18:46 _fd_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   32164037 Feb 16 18:46 _fd_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  122002961 Feb 16 18:46 _fd_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1017 Feb 16 18:47 _fd_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  340592172 Feb 16 18:47 _fd_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        568 Feb 16 18:47 _fd.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 16 18:47 _fd.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1125848 Feb 16 18:47 _fd.kdi
-rw-rw-r-- 1 ec2-user ec2-user  398643821 Feb 16 18:47 _fd.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 16 18:47 _fd.fnm
-rw-rw-r-- 1 ec2-user ec2-user     103479 Feb 16 18:49 _kw.fdx
-rw-rw-r-- 1 ec2-user ec2-user  601886966 Feb 16 18:49 _kw.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1253 Feb 16 18:49 _kw.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 16 18:49 _kw.nvm
-rw-rw-r-- 1 ec2-user ec2-user   14688823 Feb 16 18:49 _kw.nvd
-rw-rw-r-- 1 ec2-user ec2-user        470 Feb 16 18:49 _kw_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    4775342 Feb 16 18:49 _kw_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user   91324072 Feb 16 18:49 _kw_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   18090510 Feb 16 18:49 _kw_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   67911023 Feb 16 18:49 _kw_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1017 Feb 16 18:49 _kw_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  191683389 Feb 16 18:49 _kw_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        568 Feb 16 18:50 _kw.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 16 18:50 _kw.kdm
-rw-rw-r-- 1 ec2-user ec2-user     634907 Feb 16 18:50 _kw.kdi
-rw-rw-r-- 1 ec2-user ec2-user  188161032 Feb 16 18:50 _kw.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 16 18:50 _kw.fnm
-rw-rw-r-- 1 ec2-user ec2-user     195204 Feb 16 18:51 _oz.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1096460395 Feb 16 18:51 _oz.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2220 Feb 16 18:51 _oz.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 16 18:51 _oz.nvm
-rw-rw-r-- 1 ec2-user ec2-user   26809277 Feb 16 18:51 _oz.nvd
-rw-rw-r-- 1 ec2-user ec2-user        470 Feb 16 18:52 _oz_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    8550933 Feb 16 18:52 _oz_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  162443895 Feb 16 18:52 _oz_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   32828928 Feb 16 18:52 _oz_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  124045525 Feb 16 18:52 _oz_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 16 18:52 _oz_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  349241832 Feb 16 18:52 _oz_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        568 Feb 16 18:53 _oz.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 16 18:53 _oz.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1148529 Feb 16 18:53 _oz.kdi
-rw-rw-r-- 1 ec2-user ec2-user  392086832 Feb 16 18:53 _oz.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 16 18:53 _oz.fnm
-rw-rw-r-- 1 ec2-user ec2-user     155344 Feb 16 18:54 _vf.fdx
-rw-rw-r-- 1 ec2-user ec2-user  875548453 Feb 16 18:54 _vf.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1799 Feb 16 18:54 _vf.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 16 18:54 _vf.nvm
-rw-rw-r-- 1 ec2-user ec2-user   21458816 Feb 16 18:54 _vf.nvd
-rw-rw-r-- 1 ec2-user ec2-user        472 Feb 16 18:55 _vf_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    6884834 Feb 16 18:55 _vf_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  129353099 Feb 16 18:55 _vf_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   26256339 Feb 16 18:55 _vf_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   99137207 Feb 16 18:55 _vf_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 16 18:55 _vf_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  279456909 Feb 16 18:55 _vf_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        568 Feb 16 18:56 _vf.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 16 18:56 _vf.kdm
-rw-rw-r-- 1 ec2-user ec2-user     918663 Feb 16 18:56 _vf.kdi
-rw-rw-r-- 1 ec2-user ec2-user  306007813 Feb 16 18:56 _vf.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 16 18:56 _vf.fnm
-rw-rw-r-- 1 ec2-user ec2-user     127832 Feb 16 18:56 _y0.fdx
-rw-rw-r-- 1 ec2-user ec2-user  725337868 Feb 16 18:56 _y0.fdt
-rw-rw-r-- 1 ec2-user ec2-user       1505 Feb 16 18:56 _y0.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 16 18:56 _y0.nvm
-rw-rw-r-- 1 ec2-user ec2-user   17782850 Feb 16 18:56 _y0.nvd
-rw-rw-r-- 1 ec2-user ec2-user        458 Feb 16 18:56 _y0_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    6145553 Feb 16 18:56 _y0_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  108344544 Feb 16 18:56 _y0_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   21770351 Feb 16 18:56 _y0_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user   81949464 Feb 16 18:56 _y0_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 16 18:57 _y0_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  231880827 Feb 16 18:57 _y0_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user        568 Feb 16 18:57 _y0.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 16 18:57 _y0.kdm
-rw-rw-r-- 1 ec2-user ec2-user     765286 Feb 16 18:57 _y0.kdi
-rw-rw-r-- 1 ec2-user ec2-user  245211490 Feb 16 18:57 _y0.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 16 18:57 _y0.fnm
-rw-rw-r-- 1 ec2-user ec2-user     185130 Feb 16 18:59 _13p.fdx
-rw-rw-r-- 1 ec2-user ec2-user 1050643517 Feb 16 18:59 _13p.fdt
-rw-rw-r-- 1 ec2-user ec2-user       2136 Feb 16 18:59 _13p.fdm
-rw-rw-r-- 1 ec2-user ec2-user        103 Feb 16 18:59 _13p.nvm
-rw-rw-r-- 1 ec2-user ec2-user   25742605 Feb 16 18:59 _13p.nvd
-rw-rw-r-- 1 ec2-user ec2-user  241146307 Feb 16 18:59 _13a.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 16 18:59 _13a.cfe
-rw-rw-r-- 1 ec2-user ec2-user        382 Feb 16 18:59 _13a.si
-rw-rw-r-- 1 ec2-user ec2-user        497 Feb 16 18:59 _13p_Lucene90_0.tmd
-rw-rw-r-- 1 ec2-user ec2-user    7659654 Feb 16 18:59 _13p_Lucene90_0.tip
-rw-rw-r-- 1 ec2-user ec2-user  155147881 Feb 16 18:59 _13p_Lucene90_0.tim
-rw-rw-r-- 1 ec2-user ec2-user   31506710 Feb 16 18:59 _13p_Lucene90_0.pos
-rw-rw-r-- 1 ec2-user ec2-user  118899270 Feb 16 18:59 _13p_Lucene90_0.doc
-rw-rw-r-- 1 ec2-user ec2-user  287367082 Feb 16 18:59 _14s.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 16 18:59 _14s.cfe
-rw-rw-r-- 1 ec2-user ec2-user        382 Feb 16 18:59 _14s.si
-rw-rw-r-- 1 ec2-user ec2-user       1025 Feb 16 19:00 _13p_Lucene90_0.dvm
-rw-rw-r-- 1 ec2-user ec2-user  335230442 Feb 16 19:00 _13p_Lucene90_0.dvd
-rw-rw-r-- 1 ec2-user ec2-user  209648917 Feb 16 19:00 _14u.cfs
-rw-rw-r-- 1 ec2-user ec2-user        479 Feb 16 19:00 _14u.cfe
-rw-rw-r-- 1 ec2-user ec2-user        382 Feb 16 19:00 _14u.si
-rw-rw-r-- 1 ec2-user ec2-user        585 Feb 16 19:00 _13p.si
-rw-rw-r-- 1 ec2-user ec2-user        421 Feb 16 19:00 _13p.kdm
-rw-rw-r-- 1 ec2-user ec2-user    1098310 Feb 16 19:00 _13p.kdi
-rw-rw-r-- 1 ec2-user ec2-user  378930541 Feb 16 19:00 _13p.kdd
-rw-rw-r-- 1 ec2-user ec2-user       1150 Feb 16 19:00 _13p.fnm
-rw-rw-r-- 1 ec2-user ec2-user       1174 Feb 16 19:05 segments_25

Segment Timestamps

> Task :run
Tree for segment 0
0 [897249601000,897763192000] - 19678856
Tree for segment 1
0 [897311214000,897772261000] - 19781912
Tree for segment 2
0 [897338738000,897784421000] - 26146941
Tree for segment 3
0 [897364689000,897814746000] - 26809218
Tree for segment 4
0 [897370035000,897802143000] - 14688764
Tree for segment 5
0 [897389416000,897838827000] - 21458757
Tree for segment 6
0 [897392684000,897842951000] - 17782791
Tree for segment 7
0 [897402979000,897852533000] - 25742546
Tree for segment 8
0 [897413036000,897852533000] - 3027109
Tree for segment 9
0 [897415693000,897854400000] - 3646838
Tree for segment 10
0 [897416726000,897854400000] - 2699892
Finished printing segments

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.5/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 873ms
2 actionable tasks: 1 executed, 1 up-to-date

@IanHoang

Converted Timestamps for Segment Tuning Experiment 5

> Task :run
Tree for segment 0
0 [June 7, 1998 20:00:01, June 13, 1998 18:39:52] - 19678856
Tree for segment 1
0 [June 8, 1998 13:06:54, June 13, 1998 21:11:01] - 19781912
Tree for segment 2
0 [June 8, 1998 20:45:38, June 14, 1998 00:33:41] - 26146941
Tree for segment 3
0 [June 9, 1998 03:58:09, June 14, 1998 08:59:06] - 26809218
Tree for segment 4
0 [June 9, 1998 05:27:15, June 14, 1998 05:29:03] - 14688764
Tree for segment 5
0 [June 9, 1998 10:50:16, June 14, 1998 15:40:27] - 21458757
Tree for segment 6
0 [June 9, 1998 11:44:44, June 14, 1998 16:49:11] - 17782791
Tree for segment 7
0 [June 9, 1998 14:36:19, June 14, 1998 19:28:53] - 25742546
Tree for segment 8
0 [June 9, 1998 17:23:56, June 14, 1998 19:28:53] - 3027109
Tree for segment 9
0 [June 9, 1998 18:08:13, June 14, 1998 20:00:00] - 3646838
Tree for segment 10
0 [June 9, 1998 18:25:26, June 14, 1998 20:00:00] - 2699892
Finished printing segments

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.5/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 873ms
2 actionable tasks: 1 executed, 1 up-to-date

@IanHoang

Tiered Merge Policy Segment Tuning Experiment 1

  • Workload: Http logs with index logs-241998 only
  • Corpus: 23GB
  • Number of shards: 1
  • Max Segment Size: 5GB
  • Floor Segment Size: 2MB
  • Segments Per Tier: 5
  • Merge Policy: tiered
  • Force Merge (n=1): True (see the sketch after this list)
  • Prediction: Expect lots of variance, as segment sizes can range widely
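
The force merge itself is the standard API call after ingestion finishes, e.g. (a sketch):

$ curl -X POST "<endpoint>/logs-241998/_forcemerge?max_num_segments=1"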

Results

Task Latency
http_logs_5gb_2mb_t_t_2024_02_23_07_53_PM.csv

Number of Segments: 1
Segment Size Distribution: Little variance; only a single segment remains after the force merge.

Segments from _cat API

$ curl "<endpoint>/_cat/segments?v"
index              shard prirep ip         segment generation docs.count docs.deleted   size size.memory committed searchable version compound
.plugins-ml-config 0     p      10.0.3.199 _0               0          1            0  3.6kb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.199 _12w          1400  181463624            0 14.2gb           0 true      true       9.7.0   false

Segment Timestamps

> Task :run
Tree for segment 0
0 [June 7, 1998 20:00:01, June 14, 1998 20:00:00] - 181463624
Finished printing segments

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.5/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 875ms
2 actionable tasks: 1 executed, 1 up-to-date

@IanHoang

IanHoang commented Feb 26, 2024

Log Byte Size Merge Policy Segment Tuning Experiment 1

  • Workload: Http logs with index logs-241998 only
  • Corpus: 23GB
  • Number of shards: 1
  • Max Segment Size: 5GB
  • Floor Segment Size: 2MB
  • Segments Per Tier: 5
  • Merge Policy: log_byte_size
  • Force Merge (n=1): False
  • Prediction: Expect lots of variance, as segment sizes can range widely

logs-241998 index settings:

"merge" : {
  "policy" : "log_byte_size"
},
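
For context, the log_byte_size policy is selected when the index is created, via the index.merge.policy setting available in recent OpenSearch versions; a minimal sketch:

$ curl -X PUT "<endpoint>/logs-241998" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.merge.policy": "log_byte_size",
    "index.number_of_shards": 1
  }
}'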

Results

Task Latency
http_logs_5gb_2mb_l_f_2024_02_23_06_43_PM.csv

Number of Segments: 24
Segment Size Distribution: Lots of variation; segment sizes span a wide range.

Segments from _cat API

$ curl "<endpoint>/_cat/segments?v"
index              shard prirep ip         segment generation docs.count docs.deleted    size size.memory committed searchable version compound
.plugins-ml-config 0     p      10.0.3.233 _0               0          1            0   3.6kb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _hi            630   11861591            0 915.2mb           0 true      true       9.7.0   false
logs-241998        0     p      10.0.3.233 _kg            736   15046075            0   1.1gb           0 true      true       9.7.0   false
logs-241998        0     p      10.0.3.233 _mc            804    9259183            0 714.1mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _nk            848    7013506            0 538.9mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _pa            910    7371245            0 564.6mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _sr           1035   16574237            0   1.2gb           0 true      true       9.7.0   false
logs-241998        0     p      10.0.3.233 _vh           1133   57267168            0   4.4gb           0 true      true       9.7.0   false
logs-241998        0     p      10.0.3.233 _w9           1161   16202883            0   1.2gb           0 true      true       9.7.0   false
logs-241998        0     p      10.0.3.233 _xh           1205    6329912            0 486.5mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _yy           1258    6365644            0   489mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _116          1338    9248935            0 706.5mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _12p          1393    8813601            0 678.8mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _145          1445    1591690            0 122.2mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _146          1446    5880102            0 450.6mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _147          1447     246552            0  17.1mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _148          1448     143700            0  10.4mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _149          1449      47956            0   3.7mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _14b          1451      59887            0   4.6mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _14c          1452      36926            0     3mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _14d          1453     144413            0   9.6mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _14e          1454     272982            0  17.7mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _14f          1455    1665443            0 126.8mb           0 true      true       9.7.0   true
logs-241998        0     p      10.0.3.233 _14g          1456      19993            0   1.4mb           0 true      true       9.7.0   true
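
The spread is easier to eyeball when the same _cat output is sorted by size (a hypothetical one-liner; h= and bytes= are standard _cat parameters):

$ curl -s "<endpoint>/_cat/segments/logs-241998?h=segment,size&bytes=b" | sort -k2 -rn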

Segment Timestamps

> Task :run
Tree for segment 0
0 [July 7, 1998 20:00:01, July 13, 1998 22:16:30] - 57267168
Tree for segment 1
0 [July 9, 1998 02:03:07, July 14, 1998 00:04:16] - 11861591
Tree for segment 2
0 [July 9, 1998 05:06:46, July 14, 1998 02:35:55] - 15046075
Tree for segment 3
0 [July 9, 1998 08:15:07, July 14, 1998 04:35:06] - 9259183
Tree for segment 4
0 [July 9, 1998 09:30:54, July 14, 1998 06:06:37] - 7013506
Tree for segment 5
0 [July 9, 1998 10:26:05, July 14, 1998 07:21:28] - 7371245
Tree for segment 6
0 [July 9, 1998 11:17:30, July 14, 1998 10:16:14] - 16574237
Tree for segment 7
0 [July 9, 1998 13:04:09, July 14, 1998 15:25:43] - 16202883
Tree for segment 8
0 [July 9, 1998 14:43:16, July 14, 1998 16:08:23] - 6329912
Tree for segment 9
0 [July 9, 1998 15:21:57, July 14, 1998 16:42:35] - 6365644
Tree for segment 10
0 [July 9, 1998 15:53:00, July 14, 1998 17:57:55] - 9248935
Tree for segment 11
0 [July 9, 1998 17:03:23, July 14, 1998 18:54:07] - 8813601
Tree for segment 12
0 [July 9, 1998 17:55:01, July 14, 1998 19:34:01] - 5880102
Tree for segment 13
0 [July 9, 1998 18:37:33, July 14, 1998 19:37:16] - 1591690
Tree for segment 14
0 [July 9, 1998 18:56:08, July 14, 1998 19:57:27] - 1665443
Tree for segment 15
0 [July 9, 1998 19:12:53, July 14, 1998 19:57:13] - 36926
Tree for segment 16
0 [July 9, 1998 19:12:22, July 14, 1998 19:57:48] - 59887
Tree for segment 17
0 [July 9, 1998 19:12:22, July 14, 1998 19:57:27] - 246552
Tree for segment 18
0 [July 9, 1998 19:12:22, July 14, 1998 19:57:00] - 47956
Tree for segment 19
0 [July 9, 1998 19:12:22, July 14, 1998 19:57:27] - 143700
Tree for segment 20
0 [July 12, 1998 02:11:38, July 14, 1998 19:59:58] - 19993
Tree for segment 21
0 [July 12, 1998 02:09:43, July 14, 1998 20:00:00] - 272982
Tree for segment 22
0 [July 12, 1998 02:09:43, July 14, 1998 20:00:00] - 144413
Finished printing segments

BUILD SUCCESSFUL in 992ms

@msfroh

msfroh commented Mar 5, 2024

@IanHoang -- Do you have multiple clients running? Does the test run shuffle documents?

For log_byte_size, we would expect the date ranges for each segment to be contiguous and non-overlapping.

Since each segment contains docs from July 13 or 14, it sounds like later documents are being added throughout the run.

@IanHoang
Collaborator Author

IanHoang commented Apr 9, 2024

Had an offline discussion with @msfroh, @rishabhmaurya, and @gkamat. This is the plan going forward:
There are 8 testing scenarios, listed below. We will do 3 runs for each scenario (with independent ingestion and search) and check whether variance is < 5% across all runs for a scenario.
Goal of these testing scenarios: determine whether we see variance in latency because of variance in segment size.
Will be analyzing the following:

  • Timestamps
  • Service Time
  • Variance between different scenarios

Testing Scenarios

| No. | Workload | Number of Shards | Max Segment Size (GB) | Floor Segment (MB) | Merge Policy | Force Merge (max_segments=1) | Segments Per Tier | Cluster Version |
|-----|----------|------------------|-----------------------|--------------------|--------------|------------------------------|-------------------|-----------------|
| 1 | http_logs - 25GB | 1 | 5GB (default) | 2MB (default) | tiered | TRUE | 10 | OS 2.11 |
| 2 | http_logs - 25GB | 1 | 5GB (default) | 2MB (default) | tiered | FALSE | 10 | OS 2.11 |
| 3 | http_logs - 25GB | 1 | 2GB | 2MB (default) | tiered | FALSE | 10 | OS 2.11 |
| 4 | http_logs - 25GB | 1 | 2GB | 500MB | tiered | FALSE | 10 | OS 2.11 |
| 5 | http_logs - 25GB | 1 | 2GB | 500MB | tiered | FALSE | 5 | OS 2.11 |
| 6 | http_logs - 25GB | 1 | 5GB (default) | 2MB (default) | log_byte_size | FALSE | 10 | OS 2.11 |
| 7 | http_logs - 25GB | 1 | 2GB | 2MB (default) | log_byte_size | FALSE | 10 | OS 2.11 |
| 8 | http_logs - 25GB | 1 | 2GB | 500MB | log_byte_size | FALSE | 10 | OS 2.11 |

Will conduct these experiments first and paste the results into a Google spreadsheet here afterwards. This will make analysis easier and avoid cluttering this issue.
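
As a rough sketch of the per-scenario driver (the flags shown are standard OSB options, but the loop and file names are illustrative; the actual scripts are linked in the results comment below):

# Hypothetical driver for one scenario: 3 independent ingest + search runs
ENDPOINT="<endpoint>"
for run in 1 2 3; do
  curl -s -X DELETE "$ENDPOINT/logs-241998"
  opensearch-benchmark execute-test \
    --workload=http_logs \
    --pipeline=benchmark-only \
    --target-hosts="$ENDPOINT" \
    --results-format=csv \
    --results-file="scenario_run${run}.csv"
done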

@IanHoang
Collaborator Author

IanHoang commented Apr 23, 2024

Segment Tuning Analysis Results

Purpose

These tests were conducted to determine if variance seen in query service times in http_logs nightly runs is due to variance in Lucene segment sizes.

Test Setup

Eight OpenSearch 2.11 clusters were provisioned, along with one EC2 instance per cluster. Each EC2 instance contains the scripts needed to configure its cluster across four settings -- Max Segment Size (GB), Floor Segment Size (MB), Merge Policy, and Segments Per Tier -- and to trigger a series of http_logs tests. These scripts collect the P90 results from all runs, average them, and calculate the relative standard deviation (RSD) for each query.

While RSD provides a percentage that represents the variance for each scenario, plotting the raw data by query makes the variance easier to see. These results are shown in the next section.
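
Concretely, RSD is the sample standard deviation divided by the mean, expressed as a percentage. A minimal sketch (p90_values.txt is a hypothetical file with one P90 service time per line):

$ awk '{ s += $1; ss += $1 * $1; n++ }
       END { m = s / n; sd = sqrt((ss - n * m * m) / (n - 1));
             printf "RSD: %.1f%%\n", sd / m * 100 }' p90_values.txt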

Results

Average Scenario Results and Relative Standard Deviation (RSD)

[image: averaged scenario results with RSD per query]
Scenario 2 uses the default segment configuration that ships with OpenSearch today.

Raw Runs Plotted by Query

Notes:
Scenario 1 is omitted from the diagrams below (its series is plotted as constant values) because its very high service times made it hard to see the plot points for the other scenarios. Scenario 1 used force merge to reduce the index to a single segment, which is why its results differ so much from the rest. The remaining scenarios did not use force merge; their raw service times are shown below.

  • desc_sort_size
  • asc_sort_size
  • desc_sort_timestamp
  • asc_sort_timestamp
  • desc_sort_with_after_timestamp
  • asc_sort_with_after_timestamp

Conclusions

The experiments demonstrate that modifying four settings that influence segment distribution and sizes -- Max Segment Size (GB), Floor Segment Size (MB), Merge Policy, and Segments Per Tier -- clearly impacts query service time and variance.

That said, the results do not highlight a single superior scenario. Tuning these four settings can improve performance for specific sort queries relative to the default configuration (Scenario 2), but even with that improvement, variance still persists.

Anyone interested in recreating the testing apparatus and gathering their own results is welcome to follow the steps and scripts provided in this public repository. The README has all the required steps.

For future experiments, it would be interesting to see this segment tuning analysis repeated with other workloads, such as the new big5 workload.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Performance Roadmap Apr 23, 2024
@msfroh

msfroh commented May 21, 2024

Thanks @IanHoang -- If we squint, it looks like scenario 5 is "sometimes" better than scenario 2, but it's not consistent enough that I would recommend that we change the default.

We do still have opensearch-project/OpenSearch#7160 as an open issue to explore lowering the max segment size to better-balance segment sizes, but that's focused on improving performance of concurrent segment search (which I don't think was covered by these experiments -- and concurrent segment search has interesting side-effects on inter-segment dynamic pruning).
