Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Atomic vars in multithreaded env. ++num and num++ operations aren… #16356

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dk2k
Copy link
Contributor

@dk2k dk2k commented Oct 16, 2024

Brief description of changes.
For volatile num:
++num transformed to num.incrementAndGet()
num++ transformed to num.getAndIncrement()

…'t atomic, can't use them with volatile vars

Signed-off-by: Dmitry Kryukov <[email protected]>
Copy link
Contributor

❌ Gradle check result for 20ea733: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change, @andrross double check with me?

Copy link
Contributor

❕ Gradle check result for 20ea733: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 72.10%. Comparing base (ec7b652) to head (20ea733).

Files with missing lines Patch % Lines
...eldcaps/TransportFieldCapabilitiesIndexAction.java 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16356      +/-   ##
============================================
+ Coverage     72.00%   72.10%   +0.10%     
- Complexity    64817    64880      +63     
============================================
  Files          5307     5307              
  Lines        302660   302663       +3     
  Branches      43724    43724              
============================================
+ Hits         217931   218238     +307     
+ Misses        66906    66578     -328     
- Partials      17823    17847      +24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// keyword in order to ensure visibility. Note that it is fine to use `volatile` for a counter in that case given
// that although nextOrNull might be called from different threads, it can never happen concurrently.
private volatile int index;
private AtomicInteger index = new AtomicInteger();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here explains why the original author thought a volatile primitive is acceptable. Why do you think that is incorrect?

Also, the primitive int will use 4 bytes, while an AtomicInteger object will use at least 16 bytes (if my understanding of how Java allocates Objects is correct...), so this change will result in at least a small increase in memory footprint. The difference is may be completely negligible but I don't know that for sure.

Copy link
Member

@dbwiddis dbwiddis Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the primitive int will use 4 bytes, while an AtomicInteger object will use at least 16 bytes (if my understanding of how Java allocates Objects is correct...)

It's worse than that. The 16 bytes is just for the object, but it needs to maintain the internal 4-byte value, plus objects are 8-byte aligned so it'll be 24 bytes plus an 8-byte memory address for a total of 32 bytes. Some guy spent way too long investigating this a decade ago.

Copy link
Contributor Author

@dk2k dk2k Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrross
"volatile" keyword means you want to see the actual value in multithreaded env - i.e. no caching of the value.
But operations like num++ and ++num aren't atomic. And the other thread can see the outdated value.
I think I need to pass 0 to the constructor of Atomic integer in line 48. How do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dk2k As you say, atomicity is required for incrementing, but that never happens in a multithreaded context (per the comment).

All that is required for reading is getting the latest value. If you catch it in the middle of an update you may get +1 or -1 but that's really no different than grabbing it right before or right after an (atomic) update as far as the other thread is concerned.

If you think the original author's claim that nextOrNull won't ever be called concurrently is wrong, please explain why.

@@ -28,7 +29,7 @@
@State(Scope.Benchmark)
@SuppressWarnings("unused") // invoked by benchmarking framework
public class NanoTimeVsCurrentTimeMillisBenchmark {
private volatile long var = 0;
private final AtomicLong var = new AtomicLong(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this particular change is an improvement. JMH runs a single thread per fork, so the volatile long is sufficient, and the extra overhead might impact the benchmark and might skew the results.

Since the accessing of a long variable is intended as an upper bound for the other calls, I'm somewhat skeptical it would retain that "upper bound" meaning with additional overhead vs. the other two method calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I think the intent of this benchmark is to calculate the difference between accessing Millis vs. Nanos, and remove the overhead of accessing a long (as an upper bound) for a fair comparison of the remainder of the actual code.

So this change needs to be removed.

// keyword in order to ensure visibility. Note that it is fine to use `volatile` for a counter in that case given
// that although nextOrNull might be called from different threads, it can never happen concurrently.
private volatile int index;
private AtomicInteger index = new AtomicInteger();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here seems to indicate that volatile is for visibility but the incrementing won't happen concurrently. This adds unneeded overhead.

@@ -277,7 +278,7 @@ private ShardRouting nextRoutingOrNull(Exception failure) {
}

private void moveToNextShard() {
++shardIndex;
shardIndex.incrementAndGet();
Copy link
Member

@dbwiddis dbwiddis Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the change is needed here, as this won't ever be accessed concurrently. It's only ever accessed on initialization and then after receiving a response to a transport message, so there's a forced network/transport delay between successive calls to this method.

…sultConsumer.java

Co-authored-by: Daniel Widdis <[email protected]>
Signed-off-by: Dmitry Kryukov <[email protected]>
Copy link
Contributor

❌ Gradle check result for 8b62369: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Member

@dbwiddis dbwiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewing after the last commit.

  • The change to QueryPhaseResultsComsumer is good and the latest commit improves it.
  • I think the changes to TransportFieldCapabilitiesIndexAction and PlainIterator are not needed, but would not veto an approval by other maintainers.
  • I think the change to the NanoTimeVsCurrentTimeMillisBenchmark invalidates the benchmark and I will not approve without this change being removed from the PR.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Nov 19, 2024
dk2k added 2 commits November 23, 2024 20:16
Signed-off-by: Dmitry Kryukov <[email protected]>
Signed-off-by: Dmitry Kryukov <[email protected]>
@dk2k
Copy link
Contributor Author

dk2k commented Nov 23, 2024

@dbwiddis I left only the good change

Copy link
Contributor

❌ Gradle check result for 3fad0c3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Nov 26, 2024
@@ -264,7 +265,7 @@ private class PendingMerges implements Releasable {
private final SearchPhaseController.TopDocsStats topDocsStats;
private volatile MergeResult mergeResult;
private volatile boolean hasPartialReduce;
private volatile int numReducePhases;
private final AtomicInteger numReducePhases = new AtomicInteger();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to change this to AtomicInteger?

If usage of an int hasn't been resulted in any issue so far as we know, then I think we might just add unnecessary overhead of using AtomicInteger over an int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a definite issue here. But no one digs that deeply and reports it, if faces.
++num and num++ operations aren't atomic, so you can get an out-of-sync value from time to time

Copy link
Contributor

@sandeshkr419 sandeshkr419 Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think your analysis is entirely accurate.

The increment operation (++numReducePhases) is performed within a thread-safe block. Specifically, tryExecuteNext() method begins by acquiring a synchronized lock on this.

We’d benefit from AtomicInteger instead of volatile int if numReducePhases were incremented outside of a synchronized block or lock-free code which is not the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandeshkr419 You are right. That was the only previously approved change and it's uneecessary. I'm going to close this PR

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants