WeeklyTelcon_20190219
Geoffrey Paulsen edited this page Mar 12, 2019
·
2 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Geoffroy Vallee
- Howard Pritchard
- Ralph Castain
- Todd Kordenbrock
- Xin Zhao
- Brian Barrett
- Josh Hursey
- Joshua Ladd
- Matias Cabral
- Thomas Naughton
- David Bernholdt
- Matthew Dosanjh
- George
- Akshay Venkatesh
- Edgar Gabriel
- Aravind Gopalakrishnan (Intel)
- Nathan Hjelm
- Dan Topa (LANL)
- Akshay Venkatesh (nVidia)
- Arm (UTK)
- Peter Gottesman (Cisco)
- mohan
- The HostGator web site (open-mpi.org) is coming up for renewal. We need to decide what we are going to do about it
- Expires in summer (July 27th); start working on it in May.
- Need to move domain names. (Who owns that?)
- It'd be nice to move to AWS.
- DNS should be owned by SPI. Still need to transfer that.
- Topic for April.
- Nathan Hjelm's day job will no longer involve Open MPI, so if you want him to review something, please check with him first.
- Next face-to-face is April 23 to April 25 at Cisco, San Jose.
Review All Open Blockers
Review v3.0.x Milestones v3.0.3
- Merging PRs this morning
- Merged in a bunch of changes, and MTT still looks good.
- Consider disabling pmix-new-shmem mca param. (see PMIx Issue 1114)
- Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
Review v3.1.x Milestones v3.1.0
- Merging PRs this morning
- Merged in a bunch of changes, and MTT still looks good.
- Consider disabling pmix-new-shmem mca param. (see PMIx Issue 1114)
- Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
Review v4.0.x Milestones v4.0.1
- Schedule: waiting for Issue6278 fix
- v4.0.1
- Consider disabling pmix-new-shmem mca param. (see PMIx Issue 1114)
- We have one report, on an older machine: a segfault due to shared-memory lock creation.
- IBM's using that component heavily, and no issues.
- UoH has same architecture machine we could try to reproduce there.
- There is an MCA param to disable it if a user hits this.
- Consensus says leave it enabled.
- Adding OSHMEM API - bugfix. Need to rev .so versions correctly
- Serious issue https://github.com/open-mpi/ompi/issues/6198, but won't hold v4.0.1
- OFI/RML - was removed on master, but in v4.0.x the configury was broken.
- We could claim that removing ofi/rml is a bugfix.
- It was never intended to be in a production release. Must explicitly activate.
- Removing it is easiest. Don't suspect anyone is actually using this.
- Schedule: Delaying post Summer ***
- Discussion of schedule depends on scope discussion
- Do we want to separate ORTE out for that? It would be a bit past summer.
- Giles has a prototype of PRTE replacing ORTE
- Want to open up release-manager elections.
- Now that we're delaying, will decide at face2face.
- Is anyone pushing for a Summer of 2019 schedule?
- It seems too aggressive to everyone on the call
- One driver was to remove things to break ABI.
- Not a bad idea to DO v5.0, but summer timing is bad.
- Delaying would allow for switching to PRTE.
- PMIx Tools support
- A v4.1 release from master is now a possibility.
- If we instead do a v4.1, some things we'd need fixed on master.
- Will discuss more at face-to-face.
- Good job: Ralph fixed the 100% Cisco MTT failure.
- Cisco now has 70,000+ good runs. Still some static build issues.
- New alert on the PMIx side: PMIx Issue 1114, a wrong answer in the shared memory component.
- Ralph fixed a bug over the weekend:
- If you hit a process with SIGTERM while it is in a fence, the PMIx server can sometimes get into a code path that causes a segfault.
- Howard is still working on Open MPI calling PMIx directly.
- Take a look at Giles's PRTE work. He may have done some of that. He should have done it all in the PRTE layer, so maybe just some MPI-layer work remains.
- PR6339 - seems to be working.
- 2000 files? That's because it removes ORTE.
- Howard will review PR6339, and ensure that whatever Giles did will survive that.
- Did he keep the framework, but keep it static?
- That's a better approach, so we can easily bring in an external component.
- IBM still has 10% failure rate and build issue. Please fix.
- PMIX direct call / PRTE replacement for ORTE.
- Howard has been changing the places in OMPI or OPAL that call the PMIx framework to use PMIx data structures directly in the code.
- Doesn't look like Howard would step on Ralph's toes.
- March 4th is next MPI Forum (then June)
- We have a new open-mpi Slack channel for Open MPI developers.
- Not for users, just developers...
- Email Jeff if you're interested in being added.
- How do we get more participation and make MTT more meaningful?
Review Master Pull Requests
- didn't discuss today.
Review Master MTT testing
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon
- Cisco, ORNL, UTK, NVIDIA