HPCC-32138 Generic remapping merge function to remap disk stats to spill stats #18819

Merged: 1 commit into hpcc-systems:candidate-9.6.x on Jul 26, 2024

Conversation

Contributor

@shamser shamser commented Jun 27, 2024

…ent a generic mergeStats function that will remap to the stat names before setting them

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32138

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR

@shamser shamser requested a review from jakesmith June 27, 2024 15:36
@shamser shamser marked this pull request as draft June 28, 2024 10:37
@shamser shamser marked this pull request as ready for review June 28, 2024 11:39
@shamser shamser force-pushed the issue32138 branch 2 times, most recently from 776416d to 0878095 Compare July 2, 2024 13:28
@shamser shamser changed the base branch from candidate-9.6.x to candidate-9.8.x July 2, 2024 13:28
Member

@jakesmith jakesmith left a comment

@shamser - please see comments.

thorlcr/thorutil/thormisc.cpp (outdated, resolved)
thorlcr/activities/hashdistrib/thhashdistribslave.cpp (outdated, resolved)
thorlcr/thorutil/thbuf.cpp (resolved)
if (likely(iFileIO))
v = iFileIO->getStatistic(useKind);
v += inactiveStats.getStatisticValue(useKind);
v = iFileIO->getStatistic(kind);
Member

This is potentially thread unsafe. iFileIO could be null at this point, i.e. closeWriter could get in between the test and this line.
Also, shouldn't this still be v += ? i.e. inactive + current IFileIO stats?

Member

Re. the thread safety: an alternative is not to examine the potentially changing, thread-unsafe iFileIO here at all,
and instead to merge the stats into a stats container (e.g. inactive) when they are written, i.e. at the end of writeRowsFromInput().

Member

@shamser - this potentially thread-unsafe implementation seems to have been reintroduced in the latest version.
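
The alternative suggested in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `FakeFile` and `SpillWriter` are hypothetical stand-ins for IFileIO and the write-ahead class, and a single atomic counter stands in for a statistics container. The writer publishes its numbers at the end of each write, so the reporting thread never dereferences a file pointer that a concurrent close could reset.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the file object; not the real IFileIO interface.
struct FakeFile
{
    std::uint64_t bytesWritten = 0;
};

// Hypothetical writer: the file pointer is owned and touched by the writer
// thread only, while readers see nothing but the atomic counter.
class SpillWriter
{
    std::unique_ptr<FakeFile> file;               // writer thread only
    std::atomic<std::uint64_t> publishedBytes{0}; // readable from any thread
public:
    void open() { file = std::make_unique<FakeFile>(); }

    // Writer thread: do the write, then publish only the newly written bytes.
    void writeRows(std::uint64_t n)
    {
        file->bytesWritten += n;
        publishedBytes.fetch_add(n, std::memory_order_relaxed);
    }

    // Writer thread: closing the file does not lose any published value.
    void close() { file.reset(); }

    // Any thread: no test-then-use on the file pointer, so no race with close().
    std::uint64_t getStatistic() const
    {
        return publishedBytes.load(std::memory_order_relaxed);
    }
};
```

Because the counter survives `close()`, the reported value also carries across successive files without a separate "inactive plus current" sum.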

thorlcr/thorutil/thbuf.cpp (outdated, resolved)
system/jlib/jstats.h (outdated, resolved)
thorlcr/thorutil/thormisc.cpp (resolved)
@shamser shamser requested a review from jakesmith July 11, 2024 12:47
Member

@jakesmith jakesmith left a comment

@shamser - looks good. Please squash.

Member

@jakesmith jakesmith left a comment

@shamser - looks good.

@jakesmith
Member

@shamser - can you retarget this to 9.6 ?

@shamser shamser changed the base branch from candidate-9.8.x to candidate-9.6.x July 16, 2024 09:59
@shamser shamser requested a review from jakesmith July 16, 2024 09:59
Member

@jakesmith jakesmith left a comment

@shamser - looks good.

jakesmith previously approved these changes Jul 16, 2024
Member

@jakesmith jakesmith left a comment

approved.
@ghalliday - please merge.

Member

@ghalliday ghalliday left a comment

@shamser I think this is double counting. Have we got tests to verify this isn't happening (e.g. set watchdog frequency to 1s to make it obvious)?

@@ -99,6 +99,11 @@ const StatisticsMapping hashDistribActivityStatistics({StNumLocalRows, StNumRemo
const StatisticsMapping nsplitterActivityStatistics({}, spillStatistics, basicActivityStatistics);
const StatisticsMapping spillingWriteAheadStatistics({}, spillStatistics);

const StatKindMap diskToTempStatsMap
Member

Should ideally have a comment to indicate which is the source and which is the target
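
A minimal sketch of what such a comment might look like, together with a remapping merge in the spirit of the PR title. The `Kind` enum, `Stats` alias, `diskToSpillMap` and `mergeRemapped` are hypothetical stand-ins invented for illustration; they are not the jlib `StatKindMap` or `mergeStats` definitions.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-ins for statistic kinds; not the real jlib enum.
enum class Kind { DiskWriteSize, DiskWriteCount, SpillWriteSize, SpillWriteCount };

using Stats = std::map<Kind, std::uint64_t>;

// Remap table.
//   key   = SOURCE kind, as reported by the file being written
//   value = TARGET kind, as recorded against the activity
const std::map<Kind, Kind> diskToSpillMap = {
    {Kind::DiskWriteSize,  Kind::SpillWriteSize},
    {Kind::DiskWriteCount, Kind::SpillWriteCount},
};

// Add each source value to the target under its remapped kind.
// Kinds with no entry in the table are skipped in this sketch.
void mergeRemapped(Stats &target, const Stats &source, const std::map<Kind, Kind> &remap)
{
    for (const auto &[kind, value] : source)
    {
        auto it = remap.find(kind);
        if (it != remap.end())
            target[it->second] += value;
    }
}
```

Stating the direction next to the table means a reader does not have to infer it from the name alone.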

@@ -2549,6 +2538,7 @@ class CSharedFullSpillingWriteAhead : public CInterfaceOf<ISharedRowStreamReader
outputStream->flush();
totalInputRowsRead.fetch_add(newRowsWritten);
tempFileOwner->noteSize(iFileIO->getStatistic(StSizeDiskWrite));
::mergeStats(inactiveStats, iFileIO);
Member

I think this is going to double count the stats. Each time it is called it will add the stats from the previous call as well as the rows just written.
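
The concern can be illustrated with a small sketch, assuming the file's statistic is cumulative over the life of the file. `FakeFile`, `NaiveAccumulator` and `DeltaAccumulator` are hypothetical names used only for this example.

```cpp
#include <cstdint>

// Hypothetical stand-in for a file whose statistic is cumulative.
struct FakeFile
{
    std::uint64_t bytesWritten = 0;
    void write(std::uint64_t n) { bytesWritten += n; }
    std::uint64_t getStatistic() const { return bytesWritten; }
};

// Merges the cumulative value on every call, so earlier writes are re-added.
struct NaiveAccumulator
{
    std::uint64_t total = 0;
    void noteWrite(const FakeFile &f) { total += f.getStatistic(); }
};

// Remembers the previous reading and adds only the difference.
struct DeltaAccumulator
{
    std::uint64_t total = 0;
    std::uint64_t previous = 0;
    void noteWrite(const FakeFile &f)
    {
        std::uint64_t current = f.getStatistic();
        total += current - previous;
        previous = current;
    }
};
```

After writes of 100 and then 50 bytes, the naive version reports 250 (100 + 150) while the delta version reports 150. In this sketch `previous` would need resetting to zero whenever a new file is opened.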

@shamser shamser requested a review from ghalliday July 17, 2024 16:21
system/jlib/jstats.cpp (outdated, resolved)
@jakesmith
Member

jakesmith commented Jul 18, 2024

Have we got tests to verify this isn't happening (e.g. set watchdog frequency to 1s to make it obvious)?

good idea.
@ghalliday @shamser - would a performance suite run with this (and a low globalMemorySize) be a good candidate to run and examine, to flush out issues with stats?

Member

@jakesmith jakesmith left a comment

@shamser - 1 issue.
@ghalliday - could merge as is and revisit if you prefer.

thorlcr/thorutil/thbuf.cpp (resolved)
Member

@ghalliday ghalliday left a comment

@shamser
Squashing these commits made it very hard to understand what changes had been made.

NOTE this comment from Jake before:

This is potentially thread unsafe. iFileIO could be null at this point, i.e. closeWriter could get in between the test and this line.
Also, shouldn't this still be v += ? i.e. inactive + current IFileIO stats?

Member
@jakesmith jakesmith last week
Re. the thread safety: an alternative is not to examine the potentially changing, thread-unsafe iFileIO here at all,
and instead to merge the stats into a stats container (e.g. inactive) when they are written, i.e. at the end of writeRowsFromInput().

@jakesmith jakesmith dismissed their stale review July 18, 2024 15:45

getStatistic() is thread unsafe again

Member

@jakesmith jakesmith left a comment

@shamser - please see comments re. getStatistic/iFileIO thread safety issue.

@shamser shamser requested a review from jakesmith July 19, 2024 10:13
Member

@jakesmith jakesmith left a comment

I think there's an issue with when/where the remapping is done in the splitter/CSharedFullSpillingWriteAhead code.

Has this been tested?
I ran a quick test and did not see the stats I was expecting (the Size*SpillFile stats were missing).

@@ -2549,6 +2543,9 @@ class CSharedFullSpillingWriteAhead : public CInterfaceOf<ISharedRowStreamReader
outputStream->flush();
totalInputRowsRead.fetch_add(newRowsWritten);
tempFileOwner->noteSize(iFileIO->getStatistic(StSizeDiskWrite));
CRuntimeStatisticCollection currentFileStats(inactiveStats.queryMapping());
Member

minor: I think "inactiveStats" could do with renaming now, as it's active, "stats" is probably clearer.

Member

is it worth having some functions like the following in jstats::

template <class INTERFACE>
void updateStatsDelta(CRuntimeStatisticCollection & fullStats, CRuntimeStatisticCollection & deltaStats, INTERFACE * source)
{
    CRuntimeStatisticCollection curStats(deltaStats.queryMapping());
    mergeStats(curStats, source);
    fullStats.updateDelta(deltaStats, curStats);
}

then this would be

updateStatsDelta(inactiveStats, previousFileStats, iFileIO);

@@ -2549,6 +2543,9 @@ class CSharedFullSpillingWriteAhead : public CInterfaceOf<ISharedRowStreamReader
outputStream->flush();
totalInputRowsRead.fetch_add(newRowsWritten);
tempFileOwner->noteSize(iFileIO->getStatistic(StSizeDiskWrite));
CRuntimeStatisticCollection currentFileStats(inactiveStats.queryMapping());
::mergeStats(currentFileStats, iFileIO);
Member

I'm not sure this is right.
currentFileStats has a spillingWriteAheadStatistics mapping; those stats are not within iFileIO.

I think you either need to store as a mapping based on diskLocalStatistic, and allow the caller to remap (that's what the splitter code is doing), or, remap here and have no remapping in the caller.
It is probably better to accrue the stats in this container unmapped, and only remap in gatherStatistics.

Member

I agree with that - accumulate in the natural form, and map when it is added to an activity.

@ghalliday ghalliday self-requested a review July 23, 2024 14:50
Member

@ghalliday ghalliday left a comment

@jakesmith a couple of comments. I agree with your comments.

Member

@ghalliday ghalliday left a comment

@jakesmith I think this is good enough to merge. We should separately review all the mappings and make them consistent and clearly structured.

@@ -2562,7 +2555,7 @@ class CSharedFullSpillingWriteAhead : public CInterfaceOf<ISharedRowStreamReader
explicit CSharedFullSpillingWriteAhead(CActivityBase *_activity, unsigned _numOutputs, IRowStream *_input, bool _inputGrouped, const SharedRowStreamReaderOptions &_options, IThorRowInterfaces *rowIf, const char *_baseTmpFilename, ICompressHandler *_compressHandler)
: activity(*_activity), numOutputs(_numOutputs), input(_input), inputGrouped(_inputGrouped), options(_options), compressHandler(_compressHandler), baseTmpFilename(_baseTmpFilename),
meta(rowIf->queryRowMetaData()), serializer(rowIf->queryRowSerializer()), allocator(rowIf->queryRowAllocator()), deserializer(rowIf->queryRowDeserializer()),
inactiveStats(spillingWriteAheadStatistics)
inactiveStats(spillingWriteAheadStatistics), previousFileStats(spillingWriteAheadStatistics)
Member

This isn't quite the right mapping - it shouldn't include StSizePeakTempDisk, but it is good enough for this PR.

Member

I think this has changed in the other pending PR, which will therefore need rebasing once this is merged.

…ill stats

Signed-off-by: Shamser Ahmed <[email protected]>
Signed-off-by: Jake Smith <[email protected]>
@ghalliday ghalliday merged commit b32d940 into hpcc-systems:candidate-9.6.x Jul 26, 2024
22 of 23 checks passed
3 participants