[BUG] Circuit breaker exceptions due to misconfigured fielddata cache size and circuit breaker for fieldcache #12475

Open
vikasvb90 opened this issue Feb 27, 2024 · 1 comment
Labels
bug Something isn't working Search:Resiliency Search Search query, autocomplete ...etc

Comments

@vikasvb90
Contributor

vikasvb90 commented Feb 27, 2024

Describe the bug

We allow users to set indices.breaker.fielddata.limit lower than indices.fielddata.cache.size. When that happens and fielddata is enabled on one or more fields, the fielddata cache can grow beyond the fielddata breaker limit. For example, a sudden burst of heavy search queries can fill the cache with more field data than the circuit breaker limit before the breaker starts kicking in. After that, subsequent search queries or aggregations on fielddata-enabled fields start failing with circuit breaker exceptions.

[2024-02-25T09:00:44,638][DEBUG][o.o.a.s.TransportSearchAction] [f92acfa0c58f3643980f1cada9df945d] [l3F63B3JR0KY7qbJ5cyJAg][.opendistro-ism-config][1]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[.opendistro-ism-config], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='_shards:1|_primary', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":100,"query":{"match_all":{"boost":1.0}},"version":true,"seq_no_primary_term":true,"sort":[{"_id":{"order":"asc","missing":"_last","unmapped_type":"keyword"}}],"search_after":[""]}, cancelAfterTimeInterval=null, pipeline=null}] lastShard [true]
[2024-02-25T09:00:44,638][DEBUG][o.o.a.s.TransportSearchAction] [f92acfa0c58f3643980f1cada9df945d] #[org.opensearch.OpenSearchException,java.util.concurrent.ExecutionException,org.opensearch.core.common.breaker.CircuitBreakingException]#All shards failed for phase: [query]
OpenSearchException[java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]]]; nested: ExecutionException[CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]]]; nested: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], which is larger than the limit of [515396075/491.5mb]];
        at org.opensearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.load(AbstractIndexOrdinalsFieldData.java:116)
        at org.opensearch.index.fielddata.plain.AbstractIndexOrdinalsFieldData.load(AbstractIndexOrdinalsFieldData.java:62)
        at org.opensearch.index.mapper.IdFieldMapper$IdFieldType$1$1.load(IdFieldMapper.java:209)
        at org.opensearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource.getValues(BytesRefFieldComparatorSource.java:91)
        at org.opensearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource$2.getBinaryDocValues(BytesRefFieldComparatorSource.java:141)
        at org.apache.lucene.search.FieldComparator$TermValComparator.getLeafComparator(FieldComparator.java:280)
        at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:176)
        at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.<init>(TopFieldCollector.java:64)
        at org.apache.lucene.search.TopFieldCollector$PagingFieldCollector$1.<init>(TopFieldCollector.java:254)
        at org.apache.lucene.search.TopFieldCollector$PagingFieldCollector.getLeafCollector(TopFieldCollector.java:254)
        at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:306)
        at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:281)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:551)
        at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:360)
        at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:447)
        at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:431)
        at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:65)
        at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:66)
        at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
        at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
        at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:533)
        at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:597)
        at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566)
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
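
To make the size of the overshoot in the log above concrete, here is a minimal Python sketch that extracts the byte counts from the CircuitBreakingException message. The message text is copied from the log; the function name `parse_breaker_message` and the regular expression are illustrative assumptions, not part of OpenSearch.

```python
import re

# Message text copied from the CircuitBreakingException in the log above.
MESSAGE = (
    "[fielddata] Data too large, data for [_id] would be [3469348057/3.2gb], "
    "which is larger than the limit of [515396075/491.5mb]"
)

# Hypothetical pattern for this message shape; other breakers may word it differently.
PATTERN = re.compile(
    r"data for \[(?P<field>[^\]]+)\] would be \[(?P<wanted>\d+)/[^\]]+\], "
    r"which is larger than the limit of \[(?P<limit>\d+)/[^\]]+\]"
)


def parse_breaker_message(message: str) -> dict:
    """Return the field name, requested bytes, limit bytes, and their ratio."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognized circuit breaker message")
    wanted = int(match.group("wanted"))
    limit = int(match.group("limit"))
    return {
        "field": match.group("field"),
        "wanted_bytes": wanted,
        "limit_bytes": limit,
        "overshoot_ratio": wanted / limit,
    }


info = parse_breaker_message(MESSAGE)
print(
    f"{info['field']}: {info['wanted_bytes']} bytes requested, "
    f"limit {info['limit_bytes']} bytes ({info['overshoot_ratio']:.1f}x the limit)"
)
```

The ratio shows that the tracked fielddata usage is already several times the breaker limit, which is consistent with the cache having grown past the limit rather than a single request being too large.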

Related component

Search:Resiliency

To Reproduce

One possible way to reproduce this:

  1. Create an index with fielddata enabled on some of the text fields.
  2. Ingest data until the fielddata cache reaches, say, 20%. Use GET /_cat/fielddata to monitor it.
  3. Set indices.breaker.fielddata.limit to 1%.
  4. Execute heavy search queries (resulting in >1% of data size) on the fields with fielddata enabled.
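
The steps above could be expressed as REST requests roughly as follows. This is a sketch only: the index name `fielddata-repro`, the field name `message`, and the terms aggregation used as the "heavy" query are assumptions, and the bulk ingestion in step 2 is omitted. The code only builds and prints the requests; it does not contact a cluster.

```python
import json

INDEX = "fielddata-repro"  # hypothetical index name
FIELD = "message"          # hypothetical text field


def build_repro_requests(index: str = INDEX, field: str = FIELD) -> list:
    """Build (method, path, body) tuples for the reproduction steps."""
    return [
        # Step 1: text field with fielddata enabled in the mapping.
        ("PUT", f"/{index}", {
            "mappings": {"properties": {field: {"type": "text", "fielddata": True}}}
        }),
        # Step 2 (monitoring only): per-field fielddata usage on each node.
        ("GET", "/_cat/fielddata?v", None),
        # Step 3: lower the fielddata breaker limit to 1% of the heap.
        ("PUT", "/_cluster/settings", {
            "persistent": {"indices.breaker.fielddata.limit": "1%"}
        }),
        # Step 4: an aggregation that loads fielddata for the text field.
        ("POST", f"/{index}/_search", {
            "size": 0,
            "aggs": {"by_term": {"terms": {"field": field, "size": 1000}}},
        }),
    ]


for method, path, body in build_repro_requests():
    print(method, path)
    if body is not None:
        print(json.dumps(body))
```

Whether a given cluster actually trips the breaker depends on heap size and data volume, so the percentages may need adjusting.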

Expected behavior

  1. OpenSearch should validate settings updates and reject a request if it would leave indices.breaker.fielddata.limit lower than indices.fielddata.cache.size.
  2. The default value of indices.breaker.fielddata.limit is 40% of the JVM heap, and the default cache size is unbounded. We should also consider setting the default cache size below the default breaker limit (say 38%).
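
The proposed validation can be sketched in a few lines. This is a Python illustration of the comparison only, not OpenSearch code: the helper names `to_bytes` and `check_fielddata_settings` are hypothetical, only `%`, `b`, `kb`, `mb`, and `gb` values are handled, and an unbounded cache is modelled as `None` and skipped, since the issue proposes handling that case through a new default.

```python
import re
from typing import Optional

_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(value: str, heap_bytes: int) -> int:
    """Resolve a value such as '40%' or '512mb' into bytes for a given heap size."""
    text = value.strip().lower()
    if text.endswith("%"):
        return int(heap_bytes * float(text[:-1]) / 100)
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def check_fielddata_settings(
    breaker_limit: str, cache_size: Optional[str], heap_bytes: int
) -> None:
    """Raise ValueError if the breaker limit is lower than a bounded cache size."""
    if cache_size is None:
        return  # unbounded cache: not compared in this sketch
    limit = to_bytes(breaker_limit, heap_bytes)
    cache = to_bytes(cache_size, heap_bytes)
    if limit < cache:
        raise ValueError(
            f"indices.breaker.fielddata.limit [{breaker_limit}] must not be "
            f"lower than indices.fielddata.cache.size [{cache_size}]"
        )


heap = 4 * 1024 ** 3  # assumed 4 GiB heap, for illustration only
check_fielddata_settings("40%", "38%", heap)  # accepted: limit above cache size
try:
    check_fielddata_settings("1%", "20%", heap)  # the misconfiguration described here
except ValueError as err:
    print("rejected:", err)
```

Because both settings can be updated independently, a real check would need to run whenever either one changes and compare against the current value of the other.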

@vikasvb90 vikasvb90 added bug Something isn't working untriaged Search Search query, autocomplete ...etc labels Feb 27, 2024
@vikasvb90 vikasvb90 changed the title [BUG] Circuit breaker exceptions due to misconfigured fieldcache and circuit breaker for fieldcache [BUG] Circuit breaker exceptions due to misconfigured fielddata cache size and circuit breaker for fieldcache Feb 27, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5]
@vikasvb90 Thanks for filing, looking forward to seeing this issue fixed

@getsaurabh02 getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024