SAI-4635 : Log refCount and core name if refCount of SolrCore is > 1 after SolrDispatchFilter finishes processing #145

patsonluk · 2023-09-18T21:18:41Z

Description

This is a follow-up on https://fullstory.atlassian.net/browse/SAI-4635 investigation which the source core failed to unload after a solrman move.

This is triggered by the RefCount of the SolrCore got stuck at > 1, hence preventing the unload. By code inspection, we could not find any trivial cause of such a "leak". However, this could also be caused by hanging core specific calls that holds such ref.

Solution

We currently only know such RefCount issue when we attempt to unload such core, hence it's hard to track when did that start happening.

Therefore we are adding some simple logging when upon SolrDispatcherFilter completion, the refCount still gets stuck at > 1. Take note that some occurrences of that does not necessarily indicate an issue (a long running query CAN hold such refcount for a while etc), however when unload failures happen, we can then at least track back WHEN did we started having a continuously elevated ref count and hopefully find out the actual cause of such issue from the log.

Additionally, we have considered just printing the same debug info on each core specific request start. However, this could be a bit too noisy for such an issue with rare occurrence. (twice in last few months https://fullstory.atlassian.net/browse/SAI-4635 and https://fullstory.atlassian.net/browse/SAI-4454

hiteshk25 · 2023-09-18T23:32:39Z

solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java

+      if (core != null) {
+        int coreRefCount = core.getOpenCount();
+        if (coreRefCount > 1) {
+          if (log.isInfoEnabled())


that will log this message whenever core is serving more than one request? if so it might be huge?

Yes, this could happen but I would assume it's still relative rare that a given core is accessed by multiple requests concurrently?

The log line itself is pretty short, so my hope is that it shouldn't burden the log too much.

Also crunched some number from BQ on one hour of log in prod fullstoryapp.stackdriver_logs.vm_solr_20230910_H18.

Total log entries: 126,535,028

To roughly estimate the extra entries caused by this, I looked at the number of "Core request" completion message logged by SolrCore#execute : 22,419,120. Every incoming core request should log that completion message exactly once.

It is true that our log added here (SolrDispatcherFilter, which core is non null) has a larger scope than "Core request" handled by SoreCore#execute, for example our change could be reached for admin request (core status?) that uses a core as well, however, I would consider MOST of the requests on prod comes in the form that triggers SoreCore#execute (query etc).

So in the worst case scenario, if all cores have concurrent requests, then it could add around 20% more log lines. However, I see that highly unlikely. I would be surprised if it adds more than 1m+ entry per hour 😓

…ispatchFilter finishes processing it.

patsonluk · 2023-09-22T17:45:25Z

Not going to do this as we are going to take thread dump from solrman/solrsidekick side instead see https://github.com/cowpaths/mn/pull/67067

patsonluk requested a review from hiteshk25 September 18, 2023 21:19

hiteshk25 reviewed Sep 18, 2023

View reviewed changes

Log refCount and core name if refCount of SolrCore is > 1 after SolrD…

7708b88

…ispatchFilter finishes processing it.

patsonluk force-pushed the patsonluk/SAI-4635-core-ref-debug branch from e864e20 to 7708b88 Compare September 20, 2023 21:13

patsonluk closed this Sep 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SAI-4635 : Log refCount and core name if refCount of SolrCore is > 1 after SolrDispatchFilter finishes processing #145

SAI-4635 : Log refCount and core name if refCount of SolrCore is > 1 after SolrDispatchFilter finishes processing #145

patsonluk commented Sep 18, 2023

hiteshk25 Sep 18, 2023

patsonluk Sep 19, 2023

patsonluk Sep 19, 2023 •

edited

Loading

patsonluk commented Sep 22, 2023

SAI-4635 : Log refCount and core name if refCount of SolrCore is > 1 after SolrDispatchFilter finishes processing #145

SAI-4635 : Log refCount and core name if refCount of SolrCore is > 1 after SolrDispatchFilter finishes processing #145

Conversation

patsonluk commented Sep 18, 2023

Description

Solution

hiteshk25 Sep 18, 2023

Choose a reason for hiding this comment

patsonluk Sep 19, 2023

Choose a reason for hiding this comment

patsonluk Sep 19, 2023 • edited Loading

Choose a reason for hiding this comment

patsonluk commented Sep 22, 2023

patsonluk Sep 19, 2023 •

edited

Loading