HPCC-31893 Fix conflict between multiple thor components on same node #18687
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31893
@mckellyln This is a cleaner way to kill slaves on local and remote machines. It uses kill_process from hpcc_setenv, which reads the pid from the pidfile passed in and kills that pid, instead of matching a partial process name the way killall does. This fixes the issue where multiple thors on the same cluster have similar names and get matched incorrectly, so that slaves from the wrong thor component are killed.

The env file in the related Jira has a mythor1 and a mythor2 on a local node for testing. Both Dan Camper and I tested these changes.

The grep for "${slavename}_" includes the _ after the slavename so that we don't match both "mythor2" and "mythor28", etc. The thorslave pid files are named ${slavename}_[num].pid, where slavename is something like thorslave_mythor2.
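The effect of the trailing underscore can be sketched in a few lines of bash. This is only an illustration of the matching rule described above, not the code in this PR: the helper name `match_slave_pidfiles` and the sample file names are hypothetical, and kill_process itself is not reproduced here.

```shell
#!/bin/bash
# Hypothetical helper (not from the PR): reads pid file names on stdin,
# one per line, and prints those containing "<slavename>_".
match_slave_pidfiles() {
    local slavename="$1"
    # -F treats the pattern as a fixed string, so nothing in the
    # slave name is interpreted as a regular expression.
    grep -F "${slavename}_"
}

# Sample names (assumed for illustration): mythor2 and mythor28 share a prefix.
printf '%s\n' \
    thorslave_mythor1_1.pid \
    thorslave_mythor2_1.pid \
    thorslave_mythor2_2.pid \
    thorslave_mythor28_1.pid |
    match_slave_pidfiles thorslave_mythor2
```

With the underscore in the pattern, only the two thorslave_mythor2_ files are printed; without it, thorslave_mythor28_1.pid would be matched as well.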
If I look now on a Thor node that has multiple separate thors on it, I see:

    -rw-rw-r-- 1 hpcc hpcc 7 May 20 15:26 thorslave_thor400_112_5.pid
    -rw-rw-r-- 1 hpcc hpcc 7 May 20 23:27 thorslave_thor400_112_4_5.pid

Will the grep of ${slavename}_ be good enough here to isolate the single thor chosen?
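The concern can be reproduced with the two file names from the listing. The sketch below assumes, purely for illustration, that one thor's slave name is thorslave_thor400_112 and the other's is thorslave_thor400_112_4; the thread does not confirm this. The functions `loose_match` and `strict_match` are hypothetical, and the anchored pattern is one possible tightening rather than what the PR does.

```shell
#!/bin/bash
# The two pid file names quoted in the comment above.
pidfiles='thorslave_thor400_112_5.pid
thorslave_thor400_112_4_5.pid'

# Hypothetical: substring match on "<slavename>_", as discussed in the thread.
loose_match() {
    printf '%s\n' "$pidfiles" | grep -F "${1}_"
}

# Hypothetical alternative: require exactly "<slavename>_<digits>.pid".
strict_match() {
    printf '%s\n' "$pidfiles" | grep -E "^${1}_[0-9]+\.pid$"
}

echo "loose:"
loose_match thorslave_thor400_112
echo "strict:"
strict_match thorslave_thor400_112
```

Under that naming assumption, the loose match returns both files for thorslave_thor400_112, while the anchored match returns only thorslave_thor400_112_5.pid.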
Force-pushed from 9a56920 to 1334972.
I have two comments.
And what was the original problem this is solving? Was it that the process name was more than 15 characters long and the wrong pid was getting killed?
@mckellyln Similar. Dan had two mythors on the same box (mythor1 and mythor2), and every time the second one was started, it would kill the slaves for the first. The Jira contains an environment xml that I tested this against.
Signed-off-by: Michael Gardner <[email protected]>
Force-pushed from 1334972 to 3c28b03.
Approved.
@ghalliday ready for merge