Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Logging Improvement] Using lambda invocations instead of checking debug/trace isEnabled explicitly #14662

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

akolarkunnu
Copy link
Contributor

@akolarkunnu akolarkunnu commented Jul 5, 2024

Converted debug/trace/warn/info checks to lambda based logging APIs.

Present scenario:
if (logger.isTraceEnabled()) {
logger.trace("Some long-running operation returned {}", expensiveOperation());
}
Replace the existing checking of trace/debug logging before actual logging with lambda invocations within the debug/trace loggers:
logger.trace("Some long-running operation returned {}", () -> expensiveOperation());

This achieves the same lazy logging without having to check for logger levels.

Javadoc of "void debug(String message, Supplier<?>... paramSuppliers)" -> "Logs a message with parameters which are only to be constructed if the logging level is the DEBUG level."

Resolves #8646

Check List

  • [] New functionality includes testing.
  • All tests pass
  • New functionality has been documented.
  • [] New functionality has javadoc added
  • API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • [] Commit changes are listed out in CHANGELOG.md file (See: [Changelog](../blob/main/CONTRIBUTING.md#changelog))~
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Jul 5, 2024

✅ Gradle check result for 825529a: SUCCESS

Copy link

codecov bot commented Jul 5, 2024

Codecov Report

Attention: Patch coverage is 53.78486% with 232 lines in your changes missing coverage. Please review.

Project coverage is 71.81%. Comparing base (a918530) to head (c57d7e4).
Report is 412 commits behind head on main.

Files with missing lines Patch % Lines
...uting/allocation/decider/DiskThresholdDecider.java 31.81% 30 Missing ⚠️
...ch/indices/recovery/PeerRecoveryTargetService.java 5.55% 17 Missing ⚠️
...ing/allocation/allocator/RemoteShardsBalancer.java 48.27% 15 Missing ⚠️
...upport/replication/TransportReplicationAction.java 23.52% 13 Missing ⚠️
...ent/QueueResizingOpenSearchThreadPoolExecutor.java 14.28% 12 Missing ⚠️
...earch/action/search/AbstractSearchAsyncAction.java 43.75% 8 Missing and 1 partial ⚠️
...on/support/broadcast/TransportBroadcastAction.java 0.00% 9 Missing ⚠️
...rch/repositories/blobstore/FileRestoreContext.java 20.00% 8 Missing ⚠️
...g/opensearch/transport/netty4/Netty4Transport.java 12.50% 7 Missing ⚠️
.../org/opensearch/gateway/ReplicaShardAllocator.java 36.36% 7 Missing ⚠️
... and 35 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #14662      +/-   ##
============================================
- Coverage     71.84%   71.81%   -0.03%     
- Complexity    62911    62918       +7     
============================================
  Files          5176     5176              
  Lines        295133   295062      -71     
  Branches      42676    42534     -142     
============================================
- Hits         212029   211901     -128     
- Misses        65709    65806      +97     
+ Partials      17395    17355      -40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

new ByteSizeValue(request.estimatedSizeInBytes())
);
}
logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akolarkunnu Thanks for raising this PR! From code readability and simplicity point of view your improvement definitely looks better but not having the if block will now lead to creation of new string and runnable objects. Runnable objects can be lightweight depending upon whether they are referencing any variables outside of their scope but string objects may or may not be lightweight and they may or may not be created every time.
I am not in strong disagreement of this change but I would like to see how JVM and CPU graphs look like with some benchmarks before and after this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vikasvb90 This change achieves the same lazy logging without having to check for logger levels.
Javadoc of "void debug(String message, Supplier<?>... paramSuppliers)" says "Logs a message with parameters which are only to be constructed if the logging level is the DEBUG level."
Are you saying to run any specific existing Microbenchmark test case or write a new Microbenchmark test case which covers this changed code area and execute that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message is very clear that only the log message is constructed later but the arguments which you are now passing will still occupy space in memory and most of the space would be occupied by string objects. So, you need to first check in total what is the impact of this change on heap.

I am not referring to micro benchmarks but macro benchmarks. https://github.com/opensearch-project/opensearch-benchmark

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vikasvb90

I don't think we will see any quantifiable impact in benchmarking because:

  1. Overhead of logging is considerably less than indexing/search activities against where we will benchmark them.
  2. In our existing code, we already check for isDebugEnabled/isTraceEnabled at most places, so the code path is already short-circuited. It is the only extra boolean check of isDebugEnabled/isTraceEnabled that we will avoid.

If we had the debug/trace logs without the boolean isDebugEnabled/isTraceEnabled checks, we may have seen some improvement, but in this case, we will not. This refactoring is intended more from a cleaner code and aiming towards best coding practices, and does not contribute significantly towards performance gains.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not talking about the overhead of logging itself but the attempt of logging which gets discarded later. I understand what it is intended for. To reiterate, boolean checks earlier prevented the construction of objects passed in log but now they will start occupying space and compute as a result of their construction. Also, it isn't about how big the impact is as compared to search or indexing but whether there is any impact at all and I believe there is.
If you really want this change to be pushed, at least lets first correctly measure the impact of this change on heap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akolarkunnu I think some benchmarking results with profiling visualizations will help establish confidence on the changes although this is recommended by log4j: https://logging.apache.org/log4j/2.x/manual/api.html#java-8-lambda-support-for-lazy-logging

We can probably run geonames workload on a single node cluster with these configutations:

  • no changes, debug & trace disabled
  • your changes, debug & trace disabled
    AND
  • no changes, debug & trace enabled
  • your changes, debug & trace enabled

Alternatively, you can reuse the exercise I pointed in #14723 (comment) as well - that should also get you CPU profiling. AFAIK, I know ClusterState/Metadata classes have considerate amount of debug/trace logs - verify once.

Copy link
Contributor

✅ Gradle check result for 8b16658: SUCCESS

Copy link
Contributor

❌ Gradle check result for f8790c5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@akolarkunnu
Copy link
Contributor Author

Why are these precommit tasks are not executing? Any issues from CI?

}
logger.debug(
"less than the required {} free bytes threshold ({} free) on node {}, "
+ "but allowing allocation because primary has never been allocated",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does doing a + on strings should be avoided? I mean we can have a single string, if that is better instead of concatenating 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove + here, it affects code readability, it will become ~160 characters per one line.

@@ -1318,14 +1323,11 @@ private void checkForTimeout(long requestId) {
sourceNode = null;
}
// call tracer out of lock
if (tracerLog.isTraceEnabled() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not remove this block, this is an early termination condition, we don't want to add any regression by removing some early checks with this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it back

.append("\n");
logger.trace("{}", sb);
logger.trace(
"[{}][{}] recovery completed from {}, took[{}]\n phase1: recovered_files [{}] with total_size of [{}]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note on string concatenation as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, even here ~320 characters per line

@@ -1192,50 +1192,45 @@ private Engine.IndexResult index(Engine engine, Engine.Index index) throws IOExc
active.set(true);
final Engine.IndexResult result;
index = indexingOperationListeners.preIndex(shardId, index);
final Engine.Index finalIndex = index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

final Engine.Index finalIndex = indexingOperationListeners.preIndex(shardId, index);

and then use finalIndex throughout.

also, do take into account that we may be breaking some things here because in L1234:
indexingOperationListeners.postIndex(shardId, index, e); because this operation will need to be operated on finalIndex then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

@@ -367,6 +365,8 @@ private void balanceByWeights() {
final BalancedShardsAllocator.ModelNode maxNode = modelNodes[highIdx];
advance_range: if (maxNode.numShards(index) > 0) {
final float delta = absDelta(weights[lowIdx], weights[highIdx]);
final int finalHighIdx = highIdx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating new final variables to be used, what about creating 2 private utilities like log_stop_balancing/balancing(final variables...) and then just call those utilities instead. I don't want us to create noise changes in code logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood you correctly, you are suggesting like having a method like below for "lowIdx"

final int getLowIndex(final int index) {
return index;
}

and invoking this method from lambda expression like "getLowIndex(lowIdx)"
This won't be possible, because again "lowIdx" is non-final and we can not refer that from lambda expression.
If you meant something else please correct me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, abstracting out entire logging statement itself.

private log_stop_balancing(final int highIdx, final int lowIdx) {
     logger.trace(
                                 "Stop balancing index [{}]  min_node [{}] weight: [{}]" + "  max_node [{}] weight: [{}]  delta: [{}]",
                            () -> index,
                            () -> maxNode.getNodeId(),
                            () -> weights[finalHighIdx],
                            () -> minNode.getNodeId(),
                            () -> weights[finalLowIdx],
                            () -> delta
}

private log_balancing(final int highIdx, final int lowIdx) {
...
}

and then just calling this utility method.

@sandeshkr419
Copy link
Contributor

Why are these precommit tasks are not executing? Any issues from CI?

@akolarkunnu I think its probably because of merge conflicts. We can check CI once again once you resolve conflicts and address the code comments.

…bug/trace isEnabled explicitly

Converted debug/trace/warn/info checks to lambda based logging APIs.

Resolves opensearch-project#8646

Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
…bug/trace isEnabled explicitly

Converted debug/trace/warn/info checks to lambda based logging APIs.

Resolves opensearch-project#8646

Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
…bug/trace isEnabled explicitly

Converted debug/trace/warn/info checks to lambda based logging APIs.

Resolves opensearch-project#8646

Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
Copy link
Contributor

github-actions bot commented Aug 7, 2024

✅ Gradle check result for 6fe1d2f: SUCCESS

Copy link
Contributor

github-actions bot commented Aug 7, 2024

✅ Gradle check result for c57d7e4: SUCCESS

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Sep 17, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Nov 1, 2024
@sandeshkr419
Copy link
Contributor

@akolarkunnu Did you get a chance to get to profiling and benchmarking?
It will be real nice to get these changes ahead of 3.0 release actually.

@akolarkunnu
Copy link
Contributor Author

@akolarkunnu Did you get a chance to get to profiling and benchmarking? It will be real nice to get these changes ahead of 3.0 release actually.

I will resume this work soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
distributed framework enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Logging Improvement] Using lambda invocations instead of checking debug/trace isEnabled explicitly
3 participants