add ORTE daemon error in troubleshooting for multi_job_submission
lbarraga committed Jul 18, 2024
1 parent f81c166 commit 343ae44
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions mkdocs/docs/HPC/multi_job_submission.md
@@ -455,6 +455,20 @@ specified:
# command
</code></pre>

## Troubleshooting

### Error: An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun

When submitting a Worker job, you might encounter the following error:
`An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun`.
This error can occur when a version of the `worker` module built with the foss toolchain is loaded. Instead, try loading a version of `worker` built with the iimpi toolchain.

To check which versions of `worker` are available, use the following command:

```bash
$ module avail worker
```
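To narrow that listing down to iimpi-based builds, a small filter can help. The following is a minimal sketch, not part of the Worker tooling: the function names `filter_iimpi` and `list_iimpi_worker` are hypothetical, and it assumes that module names on your system include the toolchain name (for example `worker/<version>-iimpi-<toolchain version>`).

```bash
# Hypothetical helper: keep only worker module names that mention "iimpi".
# Assumption: the toolchain name is part of the module name.
filter_iimpi() {
    grep -o 'worker/[^[:space:]]*iimpi[^[:space:]]*'
}

# Hypothetical helper: list the iimpi-based worker modules.
# `module avail` prints to stderr on many systems, hence the 2>&1.
list_iimpi_worker() {
    module avail worker 2>&1 | filter_iimpi
}

# Usage (run on a login node):
#   list_iimpi_worker
#   module load <one of the names printed above>
```

If the filter prints nothing, no iimpi-based version of `worker` is visible under the currently loaded cluster modules.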


[^1]: MapReduce: 'Map' refers to the map pattern in which every item in
a collection is mapped onto a new value by applying a given