HPCC-29880 Serialize index write's jhtree and disk io stats regularly #19385
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-29880
The IKeyBuilder should expose the stats, so that gatherActiveStats can update the serialized stats, covering both scenarios.
@shamser - have added a few comments. Main issue is that I was expecting the stats to be refactored so that they came from IKeyBuilder, I think the code here, in thindexwriteslave (and master at some point), would become a lot cleaner.
@@ -261,6 +267,7 @@ class IndexWriteSlaveActivity : public ProcessSlaveActivity, public ILookAheadSt
}
try
{
CriticalBlock b(builderFileCS);
not new: I think the mergeStats below, followed by close(), is in the wrong order (?). close() might write to the underlying file io, so close() should come before the final mergeStats, I think. Can you check?
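To illustrate the ordering concern, here is a minimal sketch using hypothetical stand-ins (`MockFileIO`, `MockBuilder`) rather than the real IFileIO and IKeyBuilder classes. It assumes that close() flushes one buffered node to the file io, which may or may not match the actual builder behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a file io object that counts its own writes.
struct MockFileIO
{
    uint64_t numWrites = 0;
    void write() { ++numWrites; }
};

// Hypothetical stand-in for a key builder that buffers one node until close().
struct MockBuilder
{
    MockFileIO &io;
    bool pending = true;
    explicit MockBuilder(MockFileIO &_io) : io(_io) {}
    void close()
    {
        if (pending)
        {
            io.write(); // assumption: flushing the last node hits the file io
            pending = false;
        }
    }
};

// Final stats captured before close(): the flush in close() is missed.
uint64_t mergeThenClose()
{
    MockFileIO io;
    MockBuilder builder(io);
    io.write();                       // one write during normal processing
    uint64_t captured = io.numWrites; // final stats taken too early
    builder.close();                  // this write is never captured
    return captured;
}

// close() first, then capture: the flush is included in the final stats.
uint64_t closeThenMerge()
{
    MockFileIO io;
    MockBuilder builder(io);
    io.write();
    builder.close();
    return io.numWrites;
}

void demoOrdering()
{
    assert(mergeThenClose() == 1); // misses the write done by close()
    assert(closeThenMerge() == 2); // includes it
}
```

If close() never touches the file io in the real code, the two orderings would be equivalent, which is why this needs checking against the actual implementation.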
CriticalBlock b(builderFileCS);
if (builderIFileIO)
{
mergeStats(activeStats, builderIFileIO, diskWriteRemoteStatistics);
from JIRA description:
The IKeyBuilder should expose the stats, so that gatherActiveStats can update the serialized stats, covering both scenarios.
I think all these stats should be coming from the builder (including the io stats), so that it is a single call:
mergeStats(activeStats, builder, indexWriteActivityStatistics);
There are other stats that are currently manually pulled out of the builder and serialized to the manager in a non-standardized way; those could also be gathered directly from the IKeyBuilder.
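As a rough sketch of the shape being suggested, the following uses hypothetical names throughout (`IDemoStatSource`, `DemoKeyBuilder`, `mergeDemoStats`, and the `Demo...` statistic kinds); none of these are the real jhtree or jlib types. It assumes the builder owns the io counters itself, so the caller needs only one source and one mapping.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical statistic kinds; the real StatisticKind enum is much larger.
enum DemoStatKind { DemoNumLeafAdds, DemoNumBranchAdds, DemoNumDiskWrites };

// Hypothetical minimal interface: anything that can report a statistic by kind.
struct IDemoStatSource
{
    virtual ~IDemoStatSource() = default;
    virtual uint64_t getStatistic(DemoStatKind kind) const = 0;
};

// Hypothetical builder that tracks its own io counters, so callers do not
// need a separately stashed file io pointer.
class DemoKeyBuilder : public IDemoStatSource
{
    uint64_t numLeaves = 0;
    uint64_t numBranches = 0;
    uint64_t numDiskWrites = 0;
public:
    void addLeaf() { ++numLeaves; ++numDiskWrites; }
    void addBranch() { ++numBranches; ++numDiskWrites; }
    uint64_t getStatistic(DemoStatKind kind) const override
    {
        switch (kind)
        {
        case DemoNumLeafAdds: return numLeaves;
        case DemoNumBranchAdds: return numBranches;
        case DemoNumDiskWrites: return numDiskWrites;
        }
        return 0;
    }
};

using DemoStats = std::map<DemoStatKind, uint64_t>;

// One call gathers every mapped statistic from a single source.
void mergeDemoStats(DemoStats &target, const IDemoStatSource &source,
                    const std::vector<DemoStatKind> &mapping)
{
    for (DemoStatKind kind : mapping)
        target[kind] = source.getStatistic(kind);
}

DemoStats gatherDemo()
{
    DemoKeyBuilder builder;
    builder.addLeaf();
    builder.addLeaf();
    builder.addBranch();
    DemoStats stats;
    mergeDemoStats(stats, builder, {DemoNumLeafAdds, DemoNumBranchAdds, DemoNumDiskWrites});
    assert(stats.size() == 3);
    return stats;
}
```

With this shape, adding another statistic means extending the mapping and the builder's switch, not the activity code that calls the merge.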
maxRecordSizeSeen = 0;
builder.setown(createKeyBuilder(out, flags, maxDiskRecordSize, nodeSize, helper->getKeyedSize(), isTlk ? 0 : totalCount, helper, defaultIndexCompression, !isTlk, isTlk));
{
CriticalBlock b(builderFileCS);
It may not be that important, but the locking scope is quite large here; e.g. it could block in createMultipleWrite, in i/o, etc.
See the other comment (and the original JIRA description): it would be cleaner if the stats came from the builder rather than partly from a stashed IFileIO and partly from the builder. The mutex protecting gatherActiveStats could then just be around the assignment/clearing of the builder.
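A minimal sketch of the narrower locking scope, assuming standard-library primitives (`std::mutex`, `std::unique_ptr`) in place of CriticalSection and Owned<>, and a hypothetical `DemoBuilder` with a single atomic counter. The lock is held only while the builder pointer is assigned, read for stats, or cleared.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical builder with one counter that may be read from another thread.
struct DemoBuilder
{
    std::atomic<uint64_t> numLeaves{0};
};

class DemoActivity
{
    std::mutex builderCS;                 // protects the builder pointer and finalLeaves
    std::unique_ptr<DemoBuilder> builder;
    uint64_t finalLeaves = 0;
public:
    void open()
    {
        // Potentially slow work (creating files, i/o) happens outside the lock.
        std::unique_ptr<DemoBuilder> created(new DemoBuilder);
        std::lock_guard<std::mutex> b(builderCS);
        builder = std::move(created);     // only the assignment is locked
    }
    // Called only by the processing thread, which is also the only thread that
    // assigns or clears the pointer, so it reads the pointer without the lock.
    void addLeaf() { builder->numLeaves++; }
    void close()
    {
        // Slow finalisation (flushing nodes etc.) would happen here, outside the lock.
        std::lock_guard<std::mutex> b(builderCS);
        if (builder)
        {
            finalLeaves = builder->numLeaves.load(); // keep the final value
            builder.reset();                         // only the clearing is locked
        }
    }
    // May be called from a stats-gathering thread at any time.
    uint64_t gatherActiveStats()
    {
        std::lock_guard<std::mutex> b(builderCS);
        return builder ? builder->numLeaves.load() : finalLeaves;
    }
};

uint64_t demoLockScope()
{
    DemoActivity activity;
    activity.open();
    activity.addLeaf();
    activity.addLeaf();
    assert(activity.gatherActiveStats() == 2); // read while the builder is live
    activity.close();
    return activity.gatherActiveStats();       // read after the builder is cleared
}
```

The design choice here is that the gathering thread can never wait on file creation or i/o, only on a pointer swap.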
@shamser - looks good. A couple of trivial comments.
You may want to address those first, but then go ahead and squash.
I also added some "future:" comments, re. moving other stats into regular stats. If you could look at those, open JIRA(s) and link them to this one. Thanks.
mb.append(numBranchNodes);
mb.append(inactiveStats.getStatisticValue(StNumLeafCacheAdds));
mb.append(inactiveStats.getStatisticValue(StNumBlobCacheAdds));
mb.append(inactiveStats.getStatisticValue(StNumNodeCacheAdds));
future: Is there a separate JIRA to remove these manual serializations? They will now be serialized as part of the activity stats, so the deserializing and manual setting of @numLeafNodes, @numBranchNodes, @numBlobNodes is superfluous.
I didn't want to change it in this PR as they are used in a few other places. Yes, I can remove these in a future JIRA.
duplicateKeyCount = tmpBuilder->getDuplicateCount();
offsetBranches = tmpBuilder->getOffsetBranches();
branchMemorySize = tmpBuilder->getBranchMemorySize();
leafMemorySize = tmpBuilder->getLeafMemorySize();
future: for the next JIRA, these should be added as stats to indexWriteActivityStatistics and collected via the builder in a common way; the manual serialization/deserialization here and in the master should then be removed.
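To show why keyed stats can make the hand-written serialization unnecessary, here is a small sketch. Everything in it is hypothetical (the `Demo...` kinds, the `Buffer` type, and the helper functions); it is not the MemoryBuffer or CRuntimeStatisticCollection API. It assumes sender and receiver share the same byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical kinds for the values currently serialized by hand.
enum DemoKind : uint32_t { DemoOffsetBranches = 1, DemoBranchMemorySize = 2, DemoLeafMemorySize = 3 };

using DemoStats = std::map<uint32_t, uint64_t>;
using Buffer = std::vector<unsigned char>;

// Append the raw bytes of a value to the buffer.
template <typename T>
void appendValue(Buffer &mb, T value)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(&value);
    mb.insert(mb.end(), p, p + sizeof(T));
}

// Read a value back from the buffer and advance the read position.
template <typename T>
T readValue(const Buffer &mb, size_t &pos)
{
    T value;
    std::memcpy(&value, mb.data() + pos, sizeof(T));
    pos += sizeof(T);
    return value;
}

// Writes a count followed by (kind, value) pairs, in one generic loop.
Buffer serializeStats(const DemoStats &stats)
{
    Buffer mb;
    appendValue<uint32_t>(mb, static_cast<uint32_t>(stats.size()));
    for (const auto &entry : stats)
    {
        appendValue<uint32_t>(mb, entry.first);
        appendValue<uint64_t>(mb, entry.second);
    }
    return mb;
}

// Reads the pairs back without knowing in advance which kinds are present.
DemoStats deserializeStats(const Buffer &mb)
{
    size_t pos = 0;
    DemoStats stats;
    uint32_t count = readValue<uint32_t>(mb, pos);
    for (uint32_t i = 0; i < count; i++)
    {
        uint32_t kind = readValue<uint32_t>(mb, pos);
        stats[kind] = readValue<uint64_t>(mb, pos);
    }
    return stats;
}
```

Because each value travels with its kind, adding a new statistic on the slave side would not require a matching positional read on the master side.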
return numLeaves;
case StNumNodeCacheAdds:
return numBranches;
case StNumBlobCacheAdds:
future: add the other stats (offsetBranches, branchMemorySize, leafMemorySize) to the stats mapping and collect these via gather stats too.
@shamser - looks good. Please squash.
Signed-off-by: Shamser Ahmed <[email protected]>