HPCC-32193 Fix some issues with spill stats in smart join activity #18866
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32193
Jirabot Action Result:
30cc34f to 2a67393
5aaba8c to 333932c
Broadly looks good. One question about thread safety.
@@ -3018,6 +3035,13 @@ class CLookupJoinActivityBase : public CInMemJoinBase<HTHELPER, IHThorHashJoinAr
activeStats.setStatistic(StNumSmartJoinDegradedToLocal, aggregateFailoversToLocal); // NB: is going to be same for all slaves.
activeStats.setStatistic(StNumSmartJoinSlavesDegradedToStd, aggregateFailoversToStandard);
}
if (overflowWriteFileIO)
Does there need to be a critical block in case the overflow file or the rhsSlaveRows are killed while this is being called?
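To make the concern concrete, here is a minimal sketch of the race being asked about, written with standard C++ rather than the Thor classes. Everything in it is hypothetical: `FakeFileIO`, `OverflowStatsHolder` and its members are illustration-only names, and `std::mutex` merely stands in for a CriticalSection. It only shows the general idea of taking the same lock in the stats reader and in the code that releases the file.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical stand-in for an open spill file that exposes an I/O counter.
struct FakeFileIO
{
    std::uint64_t bytesWritten = 0;
};

// Hypothetical holder: NOT the activity class from the PR.
class OverflowStatsHolder
{
    mutable std::mutex lock;            // stands in for a CriticalSection
    std::unique_ptr<FakeFileIO> fileIO; // may be released by another thread
    std::uint64_t savedBytes = 0;       // totals preserved from closed files

public:
    void open()
    {
        std::lock_guard<std::mutex> guard(lock);
        fileIO = std::make_unique<FakeFileIO>();
    }
    void write(std::uint64_t n)
    {
        std::lock_guard<std::mutex> guard(lock);
        if (fileIO)
            fileIO->bytesWritten += n;
    }
    // Release the file, folding its counter into the saved total first.
    void kill()
    {
        std::lock_guard<std::mutex> guard(lock);
        if (fileIO)
        {
            savedBytes += fileIO->bytesWritten;
            fileIO.reset();
        }
    }
    // The null check and the read happen under the same lock that kill()
    // takes, so the file cannot disappear between the two.
    std::uint64_t totalBytes() const
    {
        std::lock_guard<std::mutex> guard(lock);
        return savedBytes + (fileIO ? fileIO->bytesWritten : 0);
    }
};
```

Without the lock in `totalBytes()`, a concurrent `kill()` could reset the pointer between the check and the dereference. Whether the real code can hit that interleaving depends on which threads call the stats gathering, which this sketch does not model.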
@shamser - please see comments
thorlcr/thorutil/thmem.hpp
Outdated
@@ -413,7 +413,7 @@ class graph_decl CThorSpillableRowArray : private CThorExpandingRowArray, implem
mutable CriticalSection cs;
ICopyArrayOf<IWritePosCallback> writeCallbacks;
size32_t compBlkSz = 0; // means use default
CRuntimeStatisticCollection inactiveStats; // reset after each kill
They're not really "inactive" at this point afaics; they're the current stats of CThorSpillableRowArray (they get updated on every save()). I think it would be better to rename this to 'stats'.
overflowWriteStream.clear();
if (overflowWriteFileIO)
{
mergeRemappedStats(PARENT::inactiveStats, overflowWriteFileIO, diskToTempStatsMap);
note: I think this is the correct approach: the utility class uses standard IO stats types, and the activity remaps them to whatever it wants.
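As a rough illustration of that split, the sketch below keeps the "utility" side on generic disk counters and lets the caller translate them through a mapping table. It is a simplified stand-in, not the real `mergeRemappedStats` or `diskToTempStatsMap`: the `StatKind` values, `StatCollection`, `StatsMap` and `mergeRemapped` are all hypothetical names invented for this example.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical statistic kinds; the real enum in the codebase differs.
enum class StatKind
{
    DiskWrites,     // generic counter produced by a utility class
    SizeDiskWrite,  // generic counter produced by a utility class
    TempWrites,     // activity-level "temp file" counter
    SizeTempWrite   // activity-level "temp file" counter
};

using StatCollection = std::map<StatKind, std::uint64_t>;
using StatsMap = std::vector<std::pair<StatKind, StatKind>>; // from -> to

// Add each mapped source statistic into the target under its remapped kind.
// Source statistics that have no entry in the mapping are ignored.
void mergeRemapped(StatCollection &target, const StatCollection &source, const StatsMap &mapping)
{
    for (const auto &m : mapping)
    {
        auto found = source.find(m.first);
        if (found != source.end())
            target[m.second] += found->second;
    }
}
```

With this shape, the utility class never needs to know what the activity calls its counters; a different activity could pass a different mapping over the same source collection.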
@@ -2678,8 +2690,12 @@ class CLookupJoinActivityBase : public CInMemJoinBase<HTHELPER, IHThorHashJoinAr
IOutputMetaData *inputOutputMeta = rightITDL->queryFromActivity()->queryContainer().queryHelper()->queryOutputMeta();
// rows may either be in separate slave row arrays or in single rhs array, or split.
rowcount_t total = rightCollector ? rightCollector->numRows() : (getGlobalRHSTotal() + rhs.ordinality());
if (rightCollector && rightCollector->hasSpilt())
I think this should move above the if (!isOOMException ...) check, i.e. we still want to capture stats.
It would be cleaner if there was an Owned<IException> exception (declared before the try), and the code in the exception handler only differentiated the exception type, e.g.:
    catch (IException *e)
    {
        if (!isOOMException(e))
            exception.setown(e);
        else
        {
            IOutputMetaData *inputOutputMeta = rightITDL->queryFromActivity()->queryContainer().queryHelper()->queryOutputMeta();
            // rows may either be in separate slave row arrays or in single rhs array, or split.
            rowcount_t total = rightCollector ? rightCollector->numRows() : (getGlobalRHSTotal() + rhs.ordinality());
            exception.setown(checkAndCreateOOMContextException(this, e, "gathering RHS rows for lookup join", total, inputOutputMeta, NULL));
        }
    }
    if (rightCollector && rightCollector->hasSpilt())
        mergeStats(PARENT::inactiveStats, rightCollector);
    if (exception)
        throw exception.getClear();
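The same "hold the exception, capture stats, then rethrow" shape can be sketched in standard C++ for anyone who wants to try it outside the codebase. This is only an analogy: `std::exception_ptr` stands in for the owned exception, and `OutOfMemory` and `gatherThenCaptureStats` are hypothetical names, not anything from Thor.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for an out-of-memory error.
struct OutOfMemory : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Runs gather(), always runs captureStats() afterwards, then rethrows any
// failure. The handlers only decide WHICH exception to keep; the stats
// capture lives in one place, after the try block.
void gatherThenCaptureStats(const std::function<void()> &gather,
                            const std::function<void()> &captureStats)
{
    std::exception_ptr pending;
    try
    {
        gather();
    }
    catch (const OutOfMemory &e)
    {
        // OOM case: replace the exception with one carrying extra context.
        pending = std::make_exception_ptr(
            OutOfMemory(std::string("gathering RHS rows: ") + e.what()));
    }
    catch (...)
    {
        // Any other failure is kept unchanged.
        pending = std::current_exception();
    }
    captureStats(); // reached on success and on both failure paths
    if (pending)
        std::rethrow_exception(pending);
}
```

The benefit is the one the comment points at: the stats merge cannot be skipped by an early throw from either branch of the handler.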
thorlcr/thorutil/thmem.cpp
Outdated
@@ -234,6 +234,7 @@ class CSpillableStreamBase : public CSpillable
unsigned spillCompInfo;
CThorSpillableRowArray rows;
Owned<CFileOwner> spillFile;
CRuntimeStatisticCollection inactiveStats;
These are not really inactiveStats at this stage; I'd rename this to 'stats'.
Comment missed? I still think it's clearer if this is renamed.
thorlcr/thorutil/thmem.cpp
Outdated
{
throwOnOom = false;
}

CThorSpillableRowArray::CThorSpillableRowArray(CActivityBase &activity, IThorRowInterfaces *rowIf, EmptyRowSemantics emptyRowSemantics, StableSortFlag stableSort, rowidx_t initialSize, size32_t _commitDelta)
: CThorExpandingRowArray(activity, rowIf, ers_forbidden, stableSort, false, initialSize), commitDelta(_commitDelta)
: CThorExpandingRowArray(activity, rowIf, ers_forbidden, stableSort, false, initialSize), commitDelta(_commitDelta), inactiveStats(spillStatistics)
We have a mixture: some utility containers (incl. jfile objects) create standard stats (iCSmartRowBuffer creates regular IO stats), and others now create 'spillStatistics'.
It would be clearer and more consistent if we always let the utility classes create standard stats, and only remapped them in the activity code.
i.e. I don't think CThorSpillableRowArray should itself capture "spillStatistics" and do the remapping. The activities that use it should do the remapping.
@shamser - please see comments.
I think it would aid the review process to break this down into 2 or more separate PRs, if it can be made consistent.
e.g. "make CThorRowCollectorBase use of stats from CThorSpillableRowArray for more accurate and simpler temp stats tracking (for CThorRowCollectorBase)" (and any knock-on effects) could be its own PR?
@shamser - looks good, only one very minor comment.
thorlcr/thorutil/thmem.cpp
Outdated
@@ -1638,11 +1648,8 @@ class CThorRowCollectorBase : public CSpillable
Owned<CSharedSpillableRowSet> spillableRowSet;
unsigned options = 0;
unsigned spillCompInfo = 0;
RelaxedAtomic<unsigned> statOverflowCount{0};
RelaxedAtomic<offset_t> statSizeSpill{0};
minor: now also looks unused
@shamser - looks good. Please squash.
@shamser - reapproving.
@jakesmith are we still happy for this to go in 9.6.x? Confident it will not cause a regression?
@shamser one question
@@ -3001,12 +3022,13 @@ class CLookupJoinActivityBase : public CInMemJoinBase<HTHELPER, IHThorHashJoinAr
GetTempFilePath(tempFilename, "lookup_local");
ActPrintLog("Overflowing RHS broadcast rows to spill file: %s", tempFilename.str());
overflowWriteFile.setown(container.queryActivity()->createOwnedTempFile(tempFilename.str()));
overflowWriteStream.setown(createRowWriter(&(overflowWriteFile->queryIFile()), queryRowInterfaces(rightITDL), rwFlags));
overflowWriteFileIO.setown(overflowWriteFile->queryIFile().open(IFOcreate));
@shamser should this be cleared at some point? Otherwise I think it will stay open longer than you want.
@ghalliday I'm now clearing overflowWriteFileIO at the same time that overflowWriteStream is cleared (Line 2118).
@shamser I think you have accidentally included a vcpkg update. Please squash, but remove that extra update.
The current temp file statistics for smartjoin did not include all stats from all temp files:
1) temp files were closed before its sizes were recorded in the stats
2) stats from some types of temp files were not being tracked such as overflowWriteFile from RHS
3) stats from temp files that were closed in CSpillableStreamBase were not preserved
4) peak temp file size was not tracked in CThorSpillableRowArray
5) make CThorRowCollectorBase use of stats from CThorSpillableRowArray for more accurate and simpler temp stats tracking.

Signed-off-by: Shamser Ahmed <[email protected]>

HPCC-32193 Close overflowWriteFileIO when it's no longer needed

Signed-off-by: Shamser Ahmed <[email protected]>
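For readers unfamiliar with what "peak temp file size" (item 4 of the first commit message) means in practice, the following is a minimal bookkeeping sketch. It is not how CThorSpillableRowArray does it: `TempFileUsage` and its methods are hypothetical, and the sketch is single-threaded for brevity. It also shows why sizes have to be recorded before a file goes away (item 1): once the file is removed, only the accumulated counters remain.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical tracker: illustration only, not a class from the PR.
class TempFileUsage
{
    std::uint64_t active = 0;       // bytes held in temp files that still exist
    std::uint64_t peak = 0;         // highest value 'active' has reached
    std::uint64_t totalWritten = 0; // cumulative bytes ever written

public:
    // Call whenever a temp file grows, while the file is still open.
    void fileGrew(std::uint64_t bytes)
    {
        active += bytes;
        totalWritten += bytes;
        peak = std::max(peak, active);
    }
    // Call when a temp file is deleted; the peak and total are preserved.
    void fileRemoved(std::uint64_t bytes)
    {
        active -= std::min(bytes, active);
    }
    std::uint64_t activeBytes() const { return active; }
    std::uint64_t peakBytes() const { return peak; }
    std::uint64_t totalBytesWritten() const { return totalWritten; }
};
```

For example, growing by 100 and 50 bytes, removing the 100-byte file, then growing by 20 leaves 70 bytes active, a peak of 150 and 170 bytes written in total.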