ES performance optimization #5
I'm using Elasticsearch 1.5, and it works perfectly most of the time, but every day at the same time it goes crazy: CPU usage climbs to ~70% when the average is around 3-5%. These are powerful servers with 32GB reserved for Lucene, swap is locked (mlockall), and clearing the cache doesn't solve the problem (it doesn't bring the heap memory down).
Settings:
3 servers (nodes), 32 cores and 128GB RAM each
2 buckets (indices): one with ~18 million documents (this one doesn't receive updates very often, mostly just new docs being indexed); the other has around 7-8 million documents, but we constantly bombard it with updates, searches, deletes, and indexing as well
The best distribution for our structure was to have only 1 shard per node with no replicas. We can afford to have a percentage of the data offline for a few seconds; it comes back as soon as the server is online again, and this process is fast since nothing needs to be relocated. Previously we had 3 shards with 1 replica, but the issue mentioned above occurred as well, so it's easy to figure out that the problem is not related to the distribution.
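Concretely, the index creation would look something like this (the index name my_index is a placeholder; only the shard/replica counts come from our setup, and with 3 shards and 0 replicas across 3 nodes each node holds exactly one shard):

```
curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  }
}'
```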
Things that I have already tried:
Merging: I tried using the Optimize API to take some load off the scheduled merges, but the merging process generates a lot of disk R/W while not substantially affecting memory or CPU load.
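For reference, the Optimize API call I mean is the 1.x endpoint below (my_index is a placeholder; max_num_segments=1 forces a full merge down to a single segment, which is the heaviest variant):

```
curl -XPOST 'http://localhost:9200/my_index/_optimize?max_num_segments=1'
```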
Flushing: I tried flushing with both long and short intervals, and the results were the same; nothing changed. Flushing directly affects the merging process, and as mentioned above, merging doesn't take that much CPU or memory.
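For completeness, varying the interval and forcing a flush look roughly like this in 1.x (my_index and the 30m value are placeholders):

```
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index.translog.flush_threshold_period": "30m"
}'
curl -XPOST 'http://localhost:9200/my_index/_flush'
```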
Managing the cache: clearing it manually, but that doesn't seem to bring the CPU load back to a normal state, not even for a moment.
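The manual clear I'm referring to is the clear cache API (it can also be scoped to a single index, e.g. /my_index/_cache/clear):

```
curl -XPOST 'http://localhost:9200/_cache/clear'
```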
Here are most of the elasticsearch.yml configs:
# Force all memory to be locked, forcing the JVM to never swap
bootstrap.mlockall: true

# Threadpool Settings
# Search pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 200

# Bulk pool
threadpool.bulk.type: fixed
threadpool.bulk.size: 60
threadpool.bulk.queue_size: 3000

# Index pool
threadpool.index.type: fixed
threadpool.index.size: 20
threadpool.index.queue_size: 1000

# Indices settings
indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb

# Cache Sizes
indices.fielddata.cache.size: 30%
#indices.fielddata.cache.expire: 6h  # will be deprecated & devs recommend not to use it
indices.cache.filter.size: 30%
#indices.cache.filter.expire: 6h  # will be deprecated & devs recommend not to use it

# Indexing Settings for Writes
index.refresh_interval: 30s
#index.translog.flush_threshold_ops: 50000
#index.translog.flush_threshold_size: 1024mb
index.translog.flush_threshold_period: 5m
index.merge.scheduler.max_thread_count: 1
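To double-check which of these settings each node actually picked up, the nodes info API can be queried (1.x endpoint):

```
curl -XGET 'http://localhost:9200/_nodes/settings?pretty'
```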
Here are the node stats when the server is in a normal state:
node_stats_normal.txt
And the node stats during the problem:
node_stats.txt
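Stats like these can be pulled with the node stats API; during a spike, the hot threads API is also worth capturing, since it shows exactly what is burning CPU (both endpoints exist in 1.x):

```
curl -XGET 'http://localhost:9200/_nodes/stats?pretty' > node_stats.txt
curl -XGET 'http://localhost:9200/_nodes/hot_threads'
```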
I would appreciate any help or discussion that can point me in the right direction to get rid of this behavior.
Thanks in advance.
Regards,
Daniel
Originally posted by @ACV2 in elastic/elasticsearch#4288 (comment)