[Logging Improvement] Using lambda invocations instead of checking debug/trace isEnabled explicitly #14662

akolarkunnu · 2024-07-05T16:47:21Z

Converted debug/trace/warn/info level checks to lambda-based logging APIs.

Present scenario:

    if (logger.isTraceEnabled()) {
        logger.trace("Some long-running operation returned {}", expensiveOperation());
    }

Proposed change: replace the explicit trace/debug level check before each log call with lambda arguments passed to the debug/trace logger methods:

    logger.trace("Some long-running operation returned {}", () -> expensiveOperation());

This achieves the same lazy logging without having to check for logger levels.

Javadoc of "void debug(String message, Supplier<?>... paramSuppliers)" -> "Logs a message with parameters which are only to be constructed if the logging level is the DEBUG level."
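The lazy-evaluation behavior described above can be checked with a small, self-contained sketch. It deliberately swaps in `java.util.logging` (whose `Logger.fine(Supplier<String>)` overload is part of the JDK) instead of Log4j 2, so that it runs without extra dependencies; the class name, logger name, and `expensiveOperation()` body are assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyLoggingSketch {
    // Counts how often the "expensive" argument is actually computed.
    static final AtomicInteger evaluations = new AtomicInteger();

    // Hypothetical stand-in for a costly call whose result is only needed for logging.
    static String expensiveOperation() {
        evaluations.incrementAndGet();
        return "result";
    }

    // Logs once at FINE through a Supplier and reports how many times the supplier ran.
    static int evaluationsAt(Level loggerLevel) {
        Logger logger = Logger.getLogger("lazy.logging.sketch");
        logger.setUseParentHandlers(false); // keep the demo silent on the console
        logger.setLevel(loggerLevel);
        evaluations.set(0);
        // No explicit isLoggable(...) guard: the logger decides whether to call the supplier.
        logger.fine(() -> "Some long-running operation returned " + expensiveOperation());
        return evaluations.get();
    }

    public static void main(String[] args) {
        System.out.println("FINE disabled, evaluations: " + evaluationsAt(Level.INFO));
        System.out.println("FINE enabled, evaluations:  " + evaluationsAt(Level.FINE));
    }
}
```

With the logger at INFO the supplier is never invoked (0 evaluations); at FINE it is invoked once. The sketch says nothing about the allocation cost of the lambda itself.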

Resolves #8646

Check List

~~New functionality includes testing.~~
All tests pass
~~New functionality has been documented.~~
~~New functionality has javadoc added~~
~~API changes companion pull request created.~~
~~Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)~~
Commits are signed per the DCO using --signoff
~~Commit changes are listed out in CHANGELOG.md file (See: [Changelog](../blob/main/CONTRIBUTING.md#changelog))~~
~~Public documentation issue/PR created~~

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2024-07-05T17:35:27Z

✅ Gradle check result for 825529a: SUCCESS

codecov · 2024-07-05T17:35:57Z

Codecov Report

Attention: Patch coverage is 53.78486% with 232 lines in your changes missing coverage. Please review.

Project coverage is 71.81%. Comparing base (a918530) to head (c57d7e4).
Report is 412 commits behind head on main.

Files with missing lines	Patch %	Lines
...uting/allocation/decider/DiskThresholdDecider.java	31.81%	30 Missing ⚠️
...ch/indices/recovery/PeerRecoveryTargetService.java	5.55%	17 Missing ⚠️
...ing/allocation/allocator/RemoteShardsBalancer.java	48.27%	15 Missing ⚠️
...upport/replication/TransportReplicationAction.java	23.52%	13 Missing ⚠️
...ent/QueueResizingOpenSearchThreadPoolExecutor.java	14.28%	12 Missing ⚠️
...earch/action/search/AbstractSearchAsyncAction.java	43.75%	8 Missing and 1 partial ⚠️
...on/support/broadcast/TransportBroadcastAction.java	0.00%	9 Missing ⚠️
...rch/repositories/blobstore/FileRestoreContext.java	20.00%	8 Missing ⚠️
...g/opensearch/transport/netty4/Netty4Transport.java	12.50%	7 Missing ⚠️
.../org/opensearch/gateway/ReplicaShardAllocator.java	36.36%	7 Missing ⚠️
... and 35 more

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #14662      +/-   ##
============================================
- Coverage     71.84%   71.81%   -0.03%     
- Complexity    62911    62918       +7     
============================================
  Files          5176     5176              
  Lines        295133   295062      -71     
  Branches      42676    42534     -142     
============================================
- Hits         212029   211901     -128     
- Misses        65709    65806      +97     
+ Partials      17395    17355      -40


vikasvb90 · 2024-07-06T15:10:53Z

modules/reindex/src/main/java/org/opensearch/index/reindex/AbstractAsyncBulkByScrollAction.java

-                new ByteSizeValue(request.estimatedSizeInBytes())
-            );
-        }
+        logger.debug(


@akolarkunnu Thanks for raising this PR! From a readability and simplicity point of view your change definitely looks better, but without the if block each call will now create new string and runnable (lambda) objects. The lambda objects can be lightweight, depending on whether they reference variables outside their scope, but the string objects may or may not be lightweight, and they may or may not be created every time.
I do not strongly disagree with this change, but I would like to see what the JVM and CPU graphs look like in benchmarks before and after it.

Hi @vikasvb90, this change achieves the same lazy logging without having to check logger levels.
The Javadoc of "void debug(String message, Supplier<?>... paramSuppliers)" says: "Logs a message with parameters which are only to be constructed if the logging level is the DEBUG level."
Are you asking me to run a specific existing microbenchmark, or to write a new microbenchmark that covers the changed code and run that?

The Javadoc is clear that only the log message is constructed later, but the arguments you are now passing will still occupy memory, and most of that space will be taken by string objects. So you first need to check the total impact of this change on the heap.

I am referring to macro benchmarks, not microbenchmarks: https://github.com/opensearch-project/opensearch-benchmark

Hi @vikasvb90

I don't think we will see any quantifiable impact in benchmarking because:

The overhead of logging is considerably smaller than that of the indexing/search activity we would benchmark it against.

In our existing code we already check isDebugEnabled/isTraceEnabled in most places, so the code path is already short-circuited. The only thing we avoid is the extra boolean isDebugEnabled/isTraceEnabled check.

If the debug/trace logs had been written without the isDebugEnabled/isTraceEnabled checks, we might have seen some improvement, but in this case we will not. This refactoring is aimed at cleaner code and better coding practice; it does not contribute significantly to performance.

I am not talking about the overhead of logging itself but about the logging attempt that gets discarded later. I understand what the change is intended for. To reiterate: the boolean checks previously prevented construction of the objects passed to the log call, but now those objects will be constructed and will cost both memory and compute. Also, the question is not how big the impact is compared to search or indexing, but whether there is any impact at all, and I believe there is.
If you really want this change to be pushed, let's at least first correctly measure its impact on the heap.
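The two positions in this thread can be made concrete with a small sketch. `CountingLogger` below is a hypothetical stand-in (not Log4j's `Logger`) that mimics a `trace(String, Supplier<?>...)` method and counts what reaches it. It is an illustration of the call-site mechanics under those assumptions, not a measurement: whether the JIT removes these allocations in practice is exactly what the requested profiling would have to show.

```java
import java.util.function.Supplier;

class CallSiteCostSketch {
    // Hypothetical stand-in for a logger with a Supplier-varargs trace method.
    static final class CountingLogger {
        final boolean traceEnabled;
        int suppliersReceived; // Supplier objects built by the caller and handed over
        int suppliersInvoked;  // arguments actually computed

        CountingLogger(boolean traceEnabled) {
            this.traceEnabled = traceEnabled;
        }

        boolean isTraceEnabled() {
            return traceEnabled;
        }

        void trace(String message, Supplier<?>... paramSuppliers) {
            suppliersReceived += paramSuppliers.length;
            if (!traceEnabled) {
                return; // the level check happens only after the call was made
            }
            for (Supplier<?> s : paramSuppliers) {
                s.get();
                suppliersInvoked++;
            }
        }
    }

    // Lambda style: no guard, suppliers are always created and passed.
    static int[] lambdaStyle(boolean traceEnabled, String node, long freeBytes) {
        CountingLogger logger = new CountingLogger(traceEnabled);
        logger.trace("node {} has {} free", () -> node, () -> freeBytes + " bytes");
        return new int[] { logger.suppliersReceived, logger.suppliersInvoked };
    }

    // Guard style: nothing is created or passed when the level is disabled.
    static int[] guardStyle(boolean traceEnabled, String node, long freeBytes) {
        CountingLogger logger = new CountingLogger(traceEnabled);
        if (logger.isTraceEnabled()) {
            logger.trace("node {} has {} free", () -> node, () -> freeBytes + " bytes");
        }
        return new int[] { logger.suppliersReceived, logger.suppliersInvoked };
    }

    public static void main(String[] args) {
        int[] lambda = lambdaStyle(false, "node-1", 1024L);
        int[] guard = guardStyle(false, "node-1", 1024L);
        System.out.println("lambda style, trace off: received=" + lambda[0] + " invoked=" + lambda[1]);
        System.out.println("guard style,  trace off: received=" + guard[0] + " invoked=" + guard[1]);
    }
}
```

With trace disabled, the lambda style still hands two supplier objects (plus the varargs array) to the logger but invokes none of them, while the guard style passes nothing at all.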

@akolarkunnu I think some benchmarking results with profiling visualizations will help establish confidence in the changes, although this approach is recommended by log4j: https://logging.apache.org/log4j/2.x/manual/api.html#java-8-lambda-support-for-lazy-logging

We can probably run the geonames workload on a single-node cluster with these configurations:

no changes, debug & trace disabled

your changes, debug & trace disabled
AND

no changes, debug & trace enabled

your changes, debug & trace enabled

Alternatively, you can reuse the exercise I pointed to in #14723 (comment) as well; that should also get you CPU profiling. AFAIK, the ClusterState/Metadata classes have a considerable number of debug/trace logs, but please verify.

github-actions · 2024-07-17T18:16:06Z

✅ Gradle check result for 8b16658: SUCCESS

github-actions · 2024-07-19T02:59:26Z

❌ Gradle check result for f8790c5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

akolarkunnu · 2024-07-19T05:21:01Z

Why are these precommit tasks not executing? Is there an issue with CI?

sandeshkr419 · 2024-08-05T18:37:07Z

...er/src/main/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDecider.java

-                }
+                logger.debug(
+                    "less than the required {} free bytes threshold ({} free) on node {}, "
+                        + "but allowing allocation because primary has never been allocated",


Should the + on strings be avoided here? We could use a single string literal instead of concatenating two, if that is better.

Removing the + here would hurt readability: the line would become ~160 characters long.
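On the cost side of this question: in Java, concatenating two string literals with + is a compile-time constant expression, so the split form and the single-line form should compile to the same interned string with no runtime concatenation. A minimal sketch that checks this (the class and field names are made up for illustration; the message text is the one from the diff above):

```java
class LiteralConcatSketch {
    // Split across two literals for readability, as in the diff.
    static final String SPLIT = "less than the required {} free bytes threshold ({} free) on node {}, "
        + "but allowing allocation because primary has never been allocated";

    // The same message written as one long literal.
    static final String SINGLE =
        "less than the required {} free bytes threshold ({} free) on node {}, but allowing allocation because primary has never been allocated";

    // Constant expressions of type String are interned, so both names refer to one object.
    static boolean sameInstance() {
        return SPLIT == SINGLE;
    }

    public static void main(String[] args) {
        System.out.println("same interned constant: " + sameInstance());
    }
}
```

This only holds when every operand is a constant; concatenating a variable or a method result into the format string would be evaluated at run time.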

sandeshkr419 · 2024-08-05T18:41:21Z

server/src/main/java/org/opensearch/transport/TransportService.java

@@ -1318,14 +1323,11 @@ private void checkForTimeout(long requestId) {
            sourceNode = null;
        }
        // call tracer out of lock
-        if (tracerLog.isTraceEnabled() == false) {


Let's not remove this block. It is an early-termination condition, and we don't want this change to introduce a regression by removing early checks.

Added it back
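A sketch of why such a guard is different from the ones replaced elsewhere in this PR: when the level check protects work other than building the log arguments, a lambda cannot stand in for it. The example uses `java.util.logging` so that it runs without dependencies; `traceTimeout`, `resolveAction`, and the filter are hypothetical and do not reproduce the actual `TransportService` code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class EarlyExitSketch {
    static int lookups; // counts the non-logging work that exists only for tracing

    // Hypothetical stand-in for work done solely to decide what to trace.
    static String resolveAction(long requestId) {
        lookups++;
        return "action-" + requestId;
    }

    static void traceTimeout(Logger tracerLog, long requestId) {
        if (tracerLog.isLoggable(Level.FINEST) == false) {
            return; // early exit: nothing below runs when tracing is off
        }
        String action = resolveAction(requestId);
        if (action.startsWith("internal") == false) { // hypothetical filter
            tracerLog.finest(() -> "timeout for request " + requestId + " [" + action + "]");
        }
    }

    // Runs traceTimeout once at the given logger level and reports how many lookups happened.
    static int lookupsAt(Level level) {
        Logger tracerLog = Logger.getLogger("early.exit.sketch");
        tracerLog.setUseParentHandlers(false); // keep the demo silent on the console
        tracerLog.setLevel(level);
        lookups = 0;
        traceTimeout(tracerLog, 42L);
        return lookups;
    }

    public static void main(String[] args) {
        System.out.println("tracing off, lookups: " + lookupsAt(Level.INFO));
        System.out.println("tracing on, lookups:  " + lookupsAt(Level.FINEST));
    }
}
```

Dropping the guard here would make `resolveAction` run on every call, even though the lambda passed to `finest` would still not be invoked.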

sandeshkr419 · 2024-08-05T18:59:34Z

server/src/main/java/org/opensearch/indices/recovery/PeerRecoveryTargetService.java

-                    .append("\n");
-                logger.trace("{}", sb);
+                logger.trace(
+                    "[{}][{}] recovery completed from {}, took[{}]\n   phase1: recovered_files [{}] with total_size of [{}]"


Same note on string concatenation as above.

Same as above; here the line would be ~320 characters long.

sandeshkr419 · 2024-08-05T19:09:51Z

server/src/main/java/org/opensearch/index/shard/IndexShard.java

@@ -1192,50 +1192,45 @@ private Engine.IndexResult index(Engine engine, Engine.Index index) throws IOExc
        active.set(true);
        final Engine.IndexResult result;
        index = indexingOperationListeners.preIndex(shardId, index);
+        final Engine.Index finalIndex = index;


How about:

final Engine.Index finalIndex = indexingOperationListeners.preIndex(shardId, index);

and then use finalIndex throughout.

Also, take into account that this may break something: at L1234, indexingOperationListeners.postIndex(shardId, index, e); would then need to operate on finalIndex.
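For context on why the extra variable appears at all: a Java lambda may only capture local variables that are final or effectively final, and a parameter that is reassigned no longer qualifies. The sketch below contrasts the copy used in the diff with the assign-once form suggested in this comment. `preIndex` here is a hypothetical stand-in operating on strings, not the real listener API.

```java
import java.util.function.Supplier;

class EffectivelyFinalSketch {
    // Hypothetical stand-in for a hook that may return a replacement object.
    static String preIndex(String operation) {
        return operation.trim();
    }

    // Variant in the diff: reassign the parameter, then copy it for the lambda.
    static String withCopy(String index) {
        index = preIndex(index);         // reassignment: 'index' is no longer effectively final
        final String finalIndex = index; // a copy that a lambda is allowed to capture
        Supplier<String> message = () -> "index operation [" + finalIndex + "]";
        return message.get();
    }

    // Variant suggested in review: assign once and use the final variable from then on.
    static String assignOnce(final String index) {
        final String finalIndex = preIndex(index);
        Supplier<String> message = () -> "index operation [" + finalIndex + "]";
        return message.get();
    }

    public static void main(String[] args) {
        System.out.println(withCopy(" doc-1 "));
        System.out.println(assignOnce(" doc-1 "));
    }
}
```

Both variants produce the same message; the second avoids having two names for the same value, at the price of touching every later use of `index`, which is the risk raised above.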

sandeshkr419 · 2024-08-05T19:29:25Z

...r/src/main/java/org/opensearch/cluster/routing/allocation/allocator/LocalShardsBalancer.java

@@ -367,6 +365,8 @@ private void balanceByWeights() {
                final BalancedShardsAllocator.ModelNode maxNode = modelNodes[highIdx];
                advance_range: if (maxNode.numShards(index) > 0) {
                    final float delta = absDelta(weights[lowIdx], weights[highIdx]);
+                    final int finalHighIdx = highIdx;


Instead of creating new final variables, what about creating two private utility methods, like log_stop_balancing/balancing(final variables...), and just calling those instead? I don't want us to introduce noise changes in the code logic.

If I understood you correctly, you are suggesting a method like the one below for "lowIdx":

    final int getLowIndex(final int index) {
        return index;
    }

and invoking it from the lambda expression as "getLowIndex(lowIdx)".
This won't be possible, because "lowIdx" is still non-final and cannot be referenced from a lambda expression.
If you meant something else, please correct me.

No, abstracting out entire logging statement itself.

    private void log_stop_balancing(final int highIdx, final int lowIdx) {
        logger.trace(
            "Stop balancing index [{}] min_node [{}] weight: [{}]" + " max_node [{}] weight: [{}] delta: [{}]",
            () -> index,
            () -> maxNode.getNodeId(),
            () -> weights[highIdx],
            () -> minNode.getNodeId(),
            () -> weights[lowIdx],
            () -> delta
        );
    }

    private void log_balancing(final int highIdx, final int lowIdx) { ... }

and then just calling this utility method.
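A runnable sketch of this suggestion, under simplifying assumptions: the `trace` method below is a hypothetical stand-in that renders its suppliers into a list instead of calling a real logger, and the loop is a toy version of the balancing loop. The point it illustrates is that method parameters are never reassigned inside the helper, so the lambdas can capture them even though the caller mutates its own `lowIdx` and `highIdx`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LogHelperSketch {
    // Collects rendered messages so the sketch can be checked without a logging library.
    static final List<String> sink = new ArrayList<>();

    // Hypothetical stand-in for a Supplier-based trace call.
    static void trace(String prefix, Supplier<?>... params) {
        StringBuilder sb = new StringBuilder(prefix);
        for (Supplier<?> p : params) {
            sb.append(' ').append(p.get());
        }
        sink.add(sb.toString());
    }

    // The parameters are never reassigned here, so the lambdas may capture them directly.
    static void logStopBalancing(float[] weights, int lowIdx, int highIdx) {
        trace("stop balancing", () -> weights[lowIdx], () -> weights[highIdx]);
    }

    static List<String> balance(float[] weights) {
        sink.clear();
        int lowIdx = 0;
        int highIdx = weights.length - 1;
        while (lowIdx < highIdx) {
            // lowIdx and highIdx are mutated below, so a lambda written here could not capture them.
            logStopBalancing(weights, lowIdx, highIdx);
            lowIdx++;
            highIdx--;
        }
        return new ArrayList<>(sink);
    }

    public static void main(String[] args) {
        balance(new float[] { 1f, 2f, 3f, 4f }).forEach(System.out::println);
    }
}
```

The calling loop stays free of `final` copies, which is the "no noise changes" goal stated above.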

sandeshkr419 · 2024-08-05T19:30:50Z

> Why are these precommit tasks not executing? Is there an issue with CI?

@akolarkunnu I think it's probably because of merge conflicts. We can check CI again once you resolve the conflicts and address the code comments.


github-actions · 2024-08-07T16:11:50Z

✅ Gradle check result for 6fe1d2f: SUCCESS

github-actions · 2024-08-07T16:40:27Z

✅ Gradle check result for c57d7e4: SUCCESS

opensearch-trigger-bot · 2024-09-17T15:22:16Z

This PR is stalled because it has been open for 30 days with no activity.

opensearch-trigger-bot · 2024-11-01T15:22:23Z

This PR is stalled because it has been open for 30 days with no activity.

sandeshkr419 · 2024-12-10T18:46:26Z

@akolarkunnu Did you get a chance to do the profiling and benchmarking?
It would be really nice to get these changes in ahead of the 3.0 release.

akolarkunnu · 2024-12-11T15:15:52Z

> @akolarkunnu Did you get a chance to do the profiling and benchmarking? It would be really nice to get these changes in ahead of the 3.0 release.

I will resume this work soon.

akolarkunnu requested review from peternied, anasalkouz, andrross, ashking94, Bukhtawar, CEHENKLE, dblock, dbwiddis, gbbafna, kotwanikunal, mch2, msfroh, nknize, owaiskazi19, reta, Rishikesh1159, sachinpkale, saratvemulapalli, shwetathareja, sohami and VachaShah as code owners July 5, 2024 16:47

github-actions bot added the distributed framework, enhancement, and good first issue labels Jul 5, 2024

vikasvb90 reviewed Jul 6, 2024


akolarkunnu force-pushed the logging branch from 8b16658 to f8790c5 Compare July 19, 2024 02:05

sandeshkr419 reviewed Aug 5, 2024


akolarkunnu added 2 commits August 7, 2024 20:44

[Logging Improvement] Using lambda invocations instead of checking de…

82496f2

…bug/trace isEnabled explicitly Converted debug/trace/warn/info checks to lambda based logging APIs. Resolves opensearch-project#8646 Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>

[Logging Improvement] Using lambda invocations instead of checking de…

6fe1d2f

…bug/trace isEnabled explicitly Converted debug/trace/warn/info checks to lambda based logging APIs. Resolves opensearch-project#8646 Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>

akolarkunnu force-pushed the logging branch from f8790c5 to 6fe1d2f Compare August 7, 2024 15:15

akolarkunnu requested a review from jed326 as a code owner August 7, 2024 15:15

[Logging Improvement] Using lambda invocations instead of checking de…

c57d7e4

…bug/trace isEnabled explicitly Converted debug/trace/warn/info checks to lambda based logging APIs. Resolves opensearch-project#8646 Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>

sandeshkr419 mentioned this pull request Aug 14, 2024

[Development Guide] Including Profiling Guidelines #15251

Open

opensearch-ci-bot mentioned this pull request Sep 9, 2024

[AUTOCUT] Gradle Check Flaky Test Report for RemoteSplitIndexIT #14296

Open

opensearch-trigger-bot bot added and removed the stalled label Sep 17, 2024

opensearch-trigger-bot bot added and removed the stalled label Nov 1, 2024


