Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAI-4635 : Log refCount and core name if refCount of SolrCore is > 1 after SolrDispatchFilter finishes processing #145

Closed
wants to merge 1 commit into from

Conversation

patsonluk
Copy link
Collaborator

Description

This is a follow-up on https://fullstory.atlassian.net/browse/SAI-4635 investigation which the source core failed to unload after a solrman move.

This is triggered by the RefCount of the SolrCore got stuck at > 1, hence preventing the unload. By code inspection, we could not find any trivial cause of such a "leak". However, this could also be caused by hanging core specific calls that holds such ref.

Solution

We currently only know such RefCount issue when we attempt to unload such core, hence it's hard to track when did that start happening.

Therefore we are adding some simple logging when upon SolrDispatcherFilter completion, the refCount still gets stuck at > 1. Take note that some occurrences of that does not necessarily indicate an issue (a long running query CAN hold such refcount for a while etc), however when unload failures happen, we can then at least track back WHEN did we started having a continuously elevated ref count and hopefully find out the actual cause of such issue from the log.

Additionally, we have considered just printing the same debug info on each core specific request start. However, this could be a bit too noisy for such an issue with rare occurrence. (twice in last few months https://fullstory.atlassian.net/browse/SAI-4635 and https://fullstory.atlassian.net/browse/SAI-4454

@patsonluk patsonluk requested a review from hiteshk25 September 18, 2023 21:19
if (core != null) {
int coreRefCount = core.getOpenCount();
if (coreRefCount > 1) {
if (log.isInfoEnabled())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that will log this message whenever core is serving more than one request? if so it might be huge?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could happen but I would assume it's still relative rare that a given core is accessed by multiple requests concurrently?

The log line itself is pretty short, so my hope is that it shouldn't burden the log too much.

Copy link
Collaborator Author

@patsonluk patsonluk Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also crunched some number from BQ on one hour of log in prod fullstoryapp.stackdriver_logs.vm_solr_20230910_H18.

Total log entries: 126,535,028

To roughly estimate the extra entries caused by this, I looked at the number of "Core request" completion message logged by SolrCore#execute : 22,419,120. Every incoming core request should log that completion message exactly once.

It is true that our log added here (SolrDispatcherFilter, which core is non null) has a larger scope than "Core request" handled by SoreCore#execute, for example our change could be reached for admin request (core status?) that uses a core as well, however, I would consider MOST of the requests on prod comes in the form that triggers SoreCore#execute (query etc).

So in the worst case scenario, if all cores have concurrent requests, then it could add around 20% more log lines. However, I see that highly unlikely. I would be surprised if it adds more than 1m+ entry per hour 😓

@patsonluk patsonluk force-pushed the patsonluk/SAI-4635-core-ref-debug branch from e864e20 to 7708b88 Compare September 20, 2023 21:13
@patsonluk
Copy link
Collaborator Author

Not going to do this as we are going to take thread dump from solrman/solrsidekick side instead see https://github.com/cowpaths/mn/pull/67067

@patsonluk patsonluk closed this Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants