-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-32031 Generate summary information in workunit to speed up file list operations #18785
HPCC-32031 Generate summary information in workunit to speed up file list operations #18785
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32031 Jirabot Action Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardkchapman looks good.
common/workunit/workunit.cpp
Outdated
IPropertyTree *list = p->queryPropTree(xpath); | ||
if (!list) | ||
return false; | ||
Owned<IPropertyTreeIterator> r = list->getElements("v"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be much better to store all the (key,value) pairs as a single serialized string value.
It is likely to be more efficient to store and retrieve. @jakesmith how do you think the performance/storage requirements would compare.
ecl/hqlcpp/hqlres.cpp
Outdated
@@ -648,7 +648,8 @@ bool ResourceManager::flush(StringBuffer &filename, const char *basename, bool f | |||
} | |||
fwrite(s.data.get(), 1, s.data.length(), bin); | |||
fclose(bin); | |||
fprintf(f, " .size %s,%u\n", label.str(), (unsigned)s.data.length()); | |||
if (!generateClang) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like an unrelated bug. If so it will need to target an earlier build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it's in general a change that is needed - but I needed it to be able to compile ANY workunit on my current mac system
2a11740
to
c6b6a42
Compare
Couple of things to think about:
|
common/workunit/workunit.cpp
Outdated
ForEach(*o) | ||
{ | ||
const char *name = o->query().queryProp("."); | ||
if (map.find(name) != map.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should say == map.end() I think?
c6b6a42
to
342ff2b
Compare
@ghallidat rebased to 9.8.x and marked as ready. I have made the stored format more compact, and made teh table case-insensitive |
From regression test: |
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32031 Jirabot Action Result: |
a32640e
to
bddfe0c
Compare
…list operations Track whether references are used in conjunction with OPT and signed code. Signed-off-by: Richard Chapman <[email protected]>
bddfe0c
to
bcdbc49
Compare
@jakesmith is it worth recording spills? They should be ignored for roxie targets (that can be in a later PR), but is there any benefit in having a summary list for hthor/thor jobs? |
if (map.find(name) == map.end()) | ||
map[name] = flags; | ||
else | ||
map[name] = map[name] & flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can avoid re-searching. I will release an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardkchapman - looks good to me, couple of trivial comments (1 question re. SpillFile/JobTemp's)
ForEach(*graphs) | ||
{ | ||
Owned <IPropertyTree> xgmml = graphs->query().getXGMMLTree(false, false); | ||
Owned<IPropertyTreeIterator> iter = xgmml->getElements("//node[att/@name='_*ileName']"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'_*ileName' looks odd, is it because it can be _fileName, or spillFileName or similar? could do with a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but I now see this is just moved code, not new code)
case SummaryType::WriteIndex: return "WriteIndex"; | ||
case SummaryType::PersistFile: return "PersistFile"; | ||
case SummaryType::SpillFile: return "SpillFile"; | ||
case SummaryType::JobTemp: return "JobTemp"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure of the usefulness/intended purposes of recording SpillTemp/JobTemp files
if (map.find(name) == map.end()) | ||
map[name] = flags; | ||
else | ||
map[name] = map[name] & flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: could be:
auto it = map.find(name);
if (it == map.end())
map[name] = flags;
else
it->second &= flags;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I will fix in a follow on PR (along with the similar code below)
Type of change:
Checklist:
Smoketest:
Testing: