WeeklyTelcon_20160119
Jeff Squyres edited this page Nov 18, 2016
- Dialup Info: (Do not post to public mailing list or public wiki)
- Attendees:
- Geoff Paulsen
- Jeff Squyres
- Brad Benton
- Edgar Gabriel
- Howard
- Joshua Hursey
- Joshua Ladd
- Nathan Hjelm
- Ralph
- Sylvain Jeaugey
- Todd Kordenbrock
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.2
- Need to verify that library versions are still correct.
- Cisco's weekend MTT tests didn't look good.
- There was also a build failure.
- usNIC was unable to connect; may be a cluster issue.
- The autogen --force option was not brought over to 1.10, so it should be removed from Cisco's MTT configuration.
- Ralph will try to replicate the MPI_Abort failure using the abort test itself.
- 1.10 C Strided mutex lock issue. Nathan would not be surprised if it turns out to be a bug. One failure, with a specific build configuration.
- The --enable-memchecker build could be affecting timing. Nathan will take a look; should be simple.
- Jeff will look at MTT things after call.
- High CPU utilization on the async progress thread. Ralph will take a look. From -GE.
- Once all of these issues are resolved or addressed, we can ship 1.10.2.
- Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
- Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- Nathan's progression decay function progress?
- Did Mellanox's UCX Modex stuff get merged in?
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- Last week we discussed OMPIO on Lustre being slow on the 2.0.0 (and master) branches, and making ROMIO the default for OMPI on Lustre (only).
- Last week we discussed group comms not working for communicators whose size is a power of 2. Nathan found a massive memory issue.
- Pull Requests - several need review from Jeff, Ralph, or Howard.
- PR 896 - not going to help us avoid the Lustre issue. Reduce the priority of Lustre below ROMIO.
- Edgar tested on Cray.
- PRs 894, 890, 900, 901 - Jeff and Howard are good with these. Jeff will merge them in.
- Travis is now being run on 2.0 branch.
- Issue 1299 - hang - want to get a fix into 2.0.0. Nathan, can you look at it?
- Issue 1301 - check the max CQ size before creating a CQ. Joshua Ladd will assign it to someone.
- Should start marking these as 2.0.0 blockers.
- Issue 1252 - Performance - Nathan is going to write a decay function for progression. He will create a pull request and Geoff Paulsen will test it. This is the last big one, and fairly important.
- HWThreads - Ralph has no interest in going backwards to support physical CPUs. Switching on whether IDs are physical or virtual is a real mess.
- What is the desire? Recent OSes and BIOSes seem to get it right. AMD and Intel seem to differ, and the question keeps coming up. It has generated a TON of confusion among users.
- Perhaps Mike has a use case that really demands it. Ralph will talk with him.
- Edgar's PR into master (tries to work around the Lustre issue by switching over to ROMIO).
- Not sure whether the issues he's seeing are on Cray or on his cluster. They could be related, but he needs to get his cluster running again.
- Wanted to see whether Jenkins reported any warnings.
- But running that portion of the code on Edgar's cluster hits many issues.
- With BTL flags = 305, performance got horrible (it used to be better).
- Did something else change in configure? Hitting one issue after another, independent of OMPIO.
- OMPIO is not finding PVFS2 correctly during configure. Jeff can screen-share with Edgar.
- The issues only show up at 96 processes, which makes debugging more difficult.
- Cisco - having some timeouts.
- LANL - Nathan - Not much; trying to find the cause of the progress slowdown. Continuing to iterate on the RDMA work to look for any remaining bugs.
- Howard - reviewing a PR on 2.0.0. Backlog of things for Edgar.
- Houston - New component he has developed over the last few weeks. Now competitive on Cray, but too late for 2.0.0. s dynamic gen 2 - a number of new features are unimplemented, but there is room to grow.
- HLRS - no update.
- IBM - Hired Joshua Hursey.
- Deciding internally whether to use GitHub Enterprise or a GitLab-based approach.
- Working with David Solt on a first PR, getting the process set up for other developers.
- Writing up RFC proposals.
- Status update rotation:
- LANL, Houston, HLRS, IBM
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel