Merge pull request #651 from lbarraga/document-fix-openmpi-daemon-failed
add ORTE daemon error in troubleshooting for multi_job_submission
hajgato authored Jul 18, 2024
2 parents 5f2bd98 + 343ae44 commit 1fdcada
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions mkdocs/docs/HPC/multi_job_submission.md
@@ -455,6 +455,20 @@ specified:
# command
</code></pre>

## Troubleshooting

### Error: An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun

When submitting a Worker job, you might encounter the following error:
`An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun`.
This error can occur when a Worker module built with the `foss` toolchain, which uses Open MPI, is loaded. Load a Worker module built with the `iimpi` toolchain, which uses Intel MPI, instead.

To check which versions of Worker are available, use the following command:

```bash
$ module avail worker
```
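If the listing is long, a small filter can pick out the `iimpi` builds. The function below is a hypothetical helper, not part of Worker or the module system. It assumes module names of the form `worker/<version>-<toolchain>-<year>` and that `module avail` writes its listing to stderr, as Lmod does, so the output has to be redirected with `2>&1` before piping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `module avail` output on stdin and print each
# Worker module name containing "iimpi", one per line.
list_iimpi_workers() {
  # -o prints only the matched module name, not the whole line;
  # [^[:space:]]* stops at column padding and markers such as "(D)".
  grep -o 'worker/[^[:space:]]*iimpi[^[:space:]]*'
}
```

For example, run `module avail worker 2>&1 | list_iimpi_workers`, then pass one of the printed names to `module load` in your job script.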


[^1]: MapReduce: 'Map' refers to the map pattern in which every item in
a collection is mapped onto a new value by applying a given
