HPCC-31290 Fix Sasha Thor QMon switching issues #18302
https://track.hpccsystems.com/browse/HPCC-31290
@ghalliday - even though this affects 7.12 onwards, I am targeting master (to go into 9.6), because it is not a trivial fix and it only came to light after a failed attempt to use it in cloud.
120a3ab to b4e2009
A few questions/comments. Looks like a sensible set of fixes.
- xpath.appendf("Server[@queue=\"%s.thor\"]/WorkUnit",qname);
- Owned<IPropertyTreeIterator> iter = conn->queryRoot()->getElements(xpath.str());
+ getClusterThorQueueName(thorQName, qname);
+ Owned<IPropertyTreeIterator> iter = conn->queryRoot()->getElements("Server[@queue]");
@queue looks a bit strange. Does that check that it has an entry?
Yes, it's a qualifier without a result. It returns all "Server" nodes that have a @queue attribute.
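To make the "has a @queue attribute" behaviour concrete, here is a minimal standalone sketch of the same filtering idea. It does not use the Dali property tree API; `ServerNode` and `withAttribute` are hypothetical names invented for illustration only.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a "Server" node: just a bag of attributes.
struct ServerNode
{
    std::map<std::string, std::string> attrs;
};

// Mimics the effect of the qualifier in "Server[@queue]" in this sketch:
// keep every node that carries the named attribute, whatever its value.
std::vector<const ServerNode *> withAttribute(const std::vector<ServerNode> &nodes,
                                              const std::string &name)
{
    std::vector<const ServerNode *> matched;
    for (const ServerNode &node : nodes)
    {
        if (node.attrs.count(name) != 0)
            matched.push_back(&node);
    }
    return matched;
}
```

With three nodes of which only two carry a `queue` attribute, `withAttribute(nodes, "queue")` returns those two, in their original order.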
dali/sasha/saqmon.cpp
Outdated
queueList.appendList(queues, ",");
if (!queueList.contains(thorQName))
    continue;
wuids.append(server.queryProp("WorkUnit"));
trivial: wuids.append(wuid);
will change.
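For readers unfamiliar with the snippet under review, the check appears to split a comma-separated queue list and test for an exact entry. A rough equivalent in standard C++ follows, assuming the `queues` value is such a list; `splitQueues` and `listensOn` are hypothetical helpers, not functions from the HPCC code base.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated list such as "thor1.thor,thor2.thor" into entries,
// skipping empty ones.
std::vector<std::string> splitQueues(const std::string &queues)
{
    std::vector<std::string> out;
    std::stringstream ss(queues);
    std::string item;
    while (std::getline(ss, item, ','))
    {
        if (!item.empty())
            out.push_back(item);
    }
    return out;
}

// True only when thorQName is exactly one of the listed queues
// (no substring or prefix matching).
bool listensOn(const std::string &queues, const std::string &thorQName)
{
    for (const std::string &q : splitQueues(queues))
    {
        if (q == thorQName)
            return true;
    }
    return false;
}
```

A whole-entry comparison matters here: a test such as "the list text contains the name" would wrongly accept `thor1.thor` against a list that only holds `mythor1.thor`.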
ecl/wutest/wujobqtest2.cpp
Outdated
@@ -56,6 +56,7 @@ bool switchWorkunitQueue(const char *wuid, const char *cluster)
      Owned<IWorkUnit> wu = factory->updateWorkUnit(wuid);
      if (!wu)
          return false;
-     return wu->switchThorQueue(cluster, &switcher);
+     VStringBuffer item("*/%s/*", wuid);
+     return wu->switchThorQueue(cluster, &switcher, item);
Should pass nullptr as item - otherwise it will only match 1 item (although without parallel workflow I doubt there is ever >1)
yes, that's a mistake, will change.
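As a generic illustration of the "optional filter" convention discussed above, where a null pattern means "do not filter", here is a small self-contained sketch. It is not the `switchThorQueue` implementation; `globMatch`, `selectItems`, and the item strings used with them are assumptions made up for this example.

```cpp
#include <string>
#include <vector>

// Minimal glob: '*' matches any run of characters; all else is literal.
bool globMatch(const char *pattern, const char *text)
{
    if (*pattern == '\0')
        return *text == '\0';
    if (*pattern == '*')
    {
        // Let '*' absorb zero or more characters of text.
        for (const char *t = text;; ++t)
        {
            if (globMatch(pattern + 1, t))
                return true;
            if (*t == '\0')
                return false;
        }
    }
    return *text != '\0' && *pattern == *text && globMatch(pattern + 1, text + 1);
}

// Hypothetical selector: a null pattern selects every queued item,
// otherwise only the items matching the pattern are selected.
std::vector<std::string> selectItems(const std::vector<std::string> &queued,
                                     const char *pattern)
{
    std::vector<std::string> selected;
    for (const std::string &item : queued)
    {
        if (!pattern || globMatch(pattern, item.c_str()))
            selected.push_back(item);
    }
    return selected;
}
```

Passing `nullptr` returns the full list, while a pattern narrows the selection to whatever it happens to match, which is why an unintended pattern can silently skip items.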
@ghalliday - please see review changes.
Please squash
3f05fd3 to 336d003
@ghalliday - squashed.
@jakesmith I am going to be annoying(!)
Automatic and manual queue switching by the Sasha QMon service has not worked since 7.12. Changes to the format of the queued Thor job items meant it did not find any queued items to swap. As a result, workunits submitted with 'allowedclusters' and 'allowautoqueueswitch', which should have switched automatically to an idle Thor queue in the 'allowedclusters' set when the queue they were submitted to was busy, did not.

Also fix the qmon tracing, which was supposed to trace the workunits currently in flight on Thor instances. That appears not to have worked well before 7.12.

Signed-off-by: Jake Smith <[email protected]>
336d003 to a5baecd
@ghalliday - revised commit title + message.
Still not 100% convinced, but good enough to merge!