Failed to start a parallel pool #2

Open
soichih opened this issue Nov 3, 2018 · 1 comment

Comments

@soichih
Contributor

soichih commented Nov 3, 2018

We are seeing a lot of failed jobs due to "Failed to start a parallel pool".

A couple of things we could try:

Right now, this App uses tempname() to generate the temp path for JobStorageLocation. I believe it uses /tmp as the parent directory.

I think we should create it under the current working directory instead, in case the use of /tmp is somehow causing the issue.

% use a separate profile directory per job so that concurrent jobs do not share one and crash
profile_dir = fullfile(pwd, 'profile');  % absolute path under the current working directory
if ~exist(profile_dir, 'dir')
    mkdir(profile_dir);
end
c = parcluster();
c.JobStorageLocation = profile_dir;
pool = parpool(c, config.workers);

Right now, this App skips setting JobStorageLocation if mkdir(tmpdir) fails.

% check and set cachedir location
if OK
    % set local storage for parpool
    clust.JobStorageLocation = tmpdir;
end

I suggest removing this check and letting the App fail if it cannot create tmpdir (or at least adding a log message inside the block so we know that JobStorageLocation is being set).

I have seen a similar parpool startup failure / random MATLAB crash before. I've worked around this by simply rerunning the code a few times when it fails.

https://github.com/brain-life/app-dp-modelfit/blob/master/fit_model.sh#L39

It's ugly, but it's a very simple thing to try, and for the DP App this has cured the occasional hiccups.
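For reference, the rerun idea could look roughly like the sketch below. This is not copied from fit_model.sh; retry_run, the attempt count, and the delay are all names and values I made up for illustration, and the command you pass in would be whatever the App actually uses to launch MATLAB.

```shell
#!/bin/bash
# Hypothetical helper (not from fit_model.sh): run a command, and rerun it
# if it exits with a non-zero status, up to max_attempts times in total.
retry_run() {
    local max_attempts=$1    # total number of tries, e.g. 3
    local delay_seconds=$2   # pause between tries, e.g. 10
    shift 2                  # the remaining arguments are the command to run
    local attempt=1
    while true; do
        if "$@"; then
            return 0         # command succeeded, stop retrying
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay_seconds}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
}

# Usage (placeholder command; substitute the App's real launch command):
# retry_run 3 10 ./run_app.sh
```

This only helps if the launch command returns a non-zero exit status when the pool fails to start, so the MATLAB side would need to exit with an error code rather than hang or exit cleanly.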

@bcmcpher
Owner

bcmcpher commented Nov 5, 2018 via email
