WeeklyTelcon_20200210
Geoffrey Paulsen edited this page Feb 12, 2020
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoffrey Paulsen (IBM)
- Jeff Squyres (Cisco)
- Brian Barrett (AWS)
- Artem Polyakov (Mellanox)
- Todd Kordenbrock (Sandia)
- Brendan Cunningham (Intel)
- Austen Lauria (IBM)
- Edgar Gabriel (UH)
- Joseph Schuchart
- Josh Hursey (IBM)
- Ralph Castain (Intel)
- Nathan Hjelm (Google)
- Michael Heinz (Intel)
- William Zhang (AWS)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Noah Evans (Sandia)
- Joshua Ladd (Mellanox)
- Thomas Naughton (ORNL)
- Charles Shereda (LLNL)
- David Bernhold (ORNL)
- George Bosilca (UTK)
- Matthew Dosanjh (Sandia)
- Brandon Yates (Intel)
- Erik Zeiske
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Xin Zhao (Mellanox)
- mohan (AWS)
- Akshay Venkatesh (NVIDIA)
- Jeff is going to register everyone for the Face to Face after the call.
- If you're coming to the Face to Face, please add yourself to the wiki now.
- MTT
- If you change your MTT to start up PRRTE at the beginning of the session and just use prun, you can see run times cut in half or more.
- This is good, but we also need to test the mpirun wrapper.
- Cisco is converting half of its MPI installs to use prrte/prun.
- AWS, where they can scale out horizontally, will continue to do both.
- PRRTE Transition:
- ORTE is gone, PRRTE is in its place. Expect some hiccups.
- A bunch of MTT failures, because people need to update their command lines for single-dash (-) vs double-dash (--) options.
- A number of Fortran failures that don't make much sense.
- IBM MTT is hitting an IOF issue, where a file descriptor shuts down and libevent spins hard.
- PRRTE - Josh turned on CI.
- Auto labeller is not there yet. Experimenting.
- Would like to get the OMPI side running the prte option.
- Whenever we move PMIX or PRTE submodule pointer, it'll label the PR.
- Anyone can click the override-merge button.
- Hasn't been an issue, but remember this won't trigger PR based hooks.
- Still 1+ month of effort before Open MPI v5.0 could be ready with this.
- see: https://github.com/openpmix/prrte/issues/298 for additional mpirun launch items
- OMPI master submodule pointers are set up to track PMIx and PRRTE master.
- Hopefully long term, master can track release branches.
- But still ensure there's some regression tracking of master/master/master.
- But once things settle down, might not want everyone's masters tracking each other.
- But if we don't have master/master/master, then new features that span repos will be challenging.
- Ompi v5.0 might want to trigger a major revision of other dependencies (PMIx and PRRTE)?
- Singleton comm-spawn... how do we make this work? - PMIx understands it.
- Do we need to support singleton comm-spawn starting the PRRTEs?
- Now that we will support a persistent infrastructure, maybe we just require users to start it first.
- Address comm-spawn issues that have been raised.
Blockers All Open Blockers
Review v3.0.x Milestones v3.0.6
Review v3.1.x Milestones v3.1.6
- Jeff filed 7361, a compilation issue.
Review v4.0.x Milestones v4.0.3
- v4.0.3 in the works.
- Put out an rc end of this week once PMIx 3.1.5 releases.
- Schedule: Feb 21.
- Schedule: No real schedule yet.
- No release managers selected yet.
- IBM (Geoff with Austen's help) is interested.
- Portland Oregon, Feb 17, 2020.
- Please register on Wiki page, since Jeff has to register you.
- Date looks good. Feb 17th right before MPI Forum
- 2pm Monday, and maybe most of Tuesday.
- Cisco has a Portland facility and is happy to host.
- About a 20-30 min drive from MPI Forum, will probably need a car.
Review Master Master Pull Requests
- PMIx v3.1.5 rc2 posted this week. Release should be Friday.
- CI testing only tests the build and whether it ran, but doesn't test HOW it ran.
- Environment setup can be a bit different.
- For example, no permissions in /tmp. Might pass on one machine, and fail on another without /tmp permissions.
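
One way to surface the /tmp difference described above is a small preflight check that a test harness runs before launching anything. The following is only a minimal sketch in Python, not part of any Open MPI, PRRTE, or MTT tooling; the names `dir_is_writable` and `preflight` are hypothetical and chosen for illustration.

```python
import tempfile


def dir_is_writable(path):
    """Return True if a scratch file can actually be created in `path`."""
    try:
        # Creating (and auto-deleting) a real file exercises the same
        # permissions a launcher would need for its session files.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # Covers a missing directory, a non-directory, and denied permission.
        return False


def preflight(paths):
    """Map each candidate scratch directory to whether it is usable."""
    return {p: dir_is_writable(p) for p in paths}


if __name__ == "__main__":
    for path, ok in preflight(["/tmp", tempfile.gettempdir()]).items():
        print(f"{path}: {'writable' if ok else 'NOT writable'}")
```

Recording this result next to each test run would make it easier to tell an environment difference between machines from a real regression.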