Release 2018.09
Visible Improvements
hpcrun
- Significantly enhanced robustness of HPCToolkit's measurement infrastructure to better support profiling of highly multithreaded applications.
- Overhauled initialization to support profiling of applications that create threads in constructors that must synchronize before main is entered.
- Improved call stack unwinding on all platforms.
- Improved support for collecting call path samples that include frames in the Linux kernel.
- Refined handling for precise hardware events in the Linux perf sample source.
- Refined scripts to avoid interactions with Darshan that can cause deadlock.
- Note the job ID in jsrun jobs.
hpcstruct
- Improved program structure recovery of loops, inlined code, and outlined code using binary analysis of highly-optimized code.
- Improved attribution of loops to source lines.
- Made hpcstruct's output deterministic for irreducible loops.
- Improved attribution for PLT stubs.
- Improved name demangling.
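The determinism item for irreducible loops can be illustrated with a small sketch. An irreducible loop has more than one entry block, so there is no unique header to report; whichever entry a tool picks, the pick should not depend on the order in which the entries were discovered. The Python below is only an illustration and not hpcstruct's code: the name pick_loop_header and the lowest-address rule are assumptions made for this example.

```python
def pick_loop_header(entry_blocks):
    """Choose one representative entry block for a multi-entry loop.

    Hypothetical rule for illustration only: take the lowest start
    address, so the result is independent of discovery order.
    """
    entries = list(entry_blocks)
    if not entries:
        raise ValueError("a loop needs at least one entry block")
    return min(entries)


# The same loop, with its entries discovered in two different orders,
# yields the same header address both times.
first_run = pick_loop_header([0x4011A0, 0x401180])
second_run = pick_loop_header([0x401180, 0x4011A0])
print(hex(first_run), hex(second_run))
```

Any total order on the entries would serve equally well; the lowest address is used here only because it is easy to check by hand.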
hpcprof/hpcprof-mpi
- Significantly enhanced robustness of hpcprof-mpi.
- Emit a warning and proceed with analysis if measurement data is salvageable rather than aborting with a fatal error.
- Tolerate missing load modules. Generate a placeholder if necessary and emit a warning rather than triggering a fatal error.
- Tolerate cases where some ranks in hpcprof-mpi are not assigned any profiles to analyze.
- Avoid unnecessary per-rank duplication of informational messages.
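The case of ranks with no assigned profiles arises whenever a job is launched with more MPI ranks than there are profile files. The following is a minimal sketch of that situation, written in Python rather than taken from hpcprof-mpi. The names assign_profiles and summarize_rank, the block partitioning scheme, and the file names are all assumptions made for illustration.

```python
def assign_profiles(profiles, num_ranks):
    """Split a list of profile files into contiguous blocks, one per rank.

    Hypothetical scheme: when there are more ranks than profiles, the
    trailing ranks receive an empty list.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    base, extra = divmod(len(profiles), num_ranks)
    assignments, start = [], 0
    for rank in range(num_ranks):
        # The first `extra` ranks take one additional profile each.
        count = base + (1 if rank < extra else 0)
        assignments.append(profiles[start:start + count])
        start += count
    return assignments


def summarize_rank(rank, assigned):
    """Treat an empty assignment as a normal outcome, not an error."""
    return {"rank": rank, "profiles": len(assigned)}


# Hypothetical file names: two profiles spread over four ranks.
profiles = ["app-000000.hpcrun", "app-000001.hpcrun"]
for rank, assigned in enumerate(assign_profiles(profiles, 4)):
    print(summarize_rank(rank, assigned))
```

In this sketch ranks 2 and 3 receive empty lists and still return a valid summary, which is the behavior the tolerance item describes.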
hpcviewer
- Calculate costs for inlined functions in the bottom-up and flat views as if they were actual functions.
Documentation
- Updated man pages and manuals.
Streamline user view
Migrated developer-centric functionality out of HPCToolkit's bin directory.
- Migrated hpcsummary to libexec/hpctoolkit.
- Removed support for creating DOT files from hpcstruct. Created a separate executable for developer use in libexec/hpctoolkit.
- Updated hpcproftt to remove stale command-line options. Migrated hpcproftt to libexec/hpctoolkit.
Bug Fixes
- Updated build system to automake 1.15.1 to handle newer Linux software stacks.
- Fixed latex2man script for perl 5.26.1.
- Fixed configuration to skip kernel sampling and disable support for BLOCKTIME for older Linux kernels.
- Fixed bugs related to handling Linux perf_events at runtime.
- Fixed race conditions that arise where samples arrive after shutting down a sample source or when monitoring ends while processing a sample.
- Corrected handling in HPCToolkit's measurement infrastructure for dlclose, which is frequently used by OpenMPI.
- Corrected support for libunwind to properly terminate unwinds on ARM when compilers put DWARF FDEs in .debug_frame rather than .eh_frame sections.
- Adjusted unwinder support for Power architectures to avoid libunwind.
- Adjusted support for integrating libunwind and binary analysis on x86_64 architectures.
- When measuring an execution, if hpcfnbounds quits, wait for it to finish to avoid leaving zombie processes.
- Corrected hpcprof-mpi to handle the corner case where an MPI rank is assigned no profiles to analyze.
- Added comprehensive error handling in hpcprof-mpi when writing files, especially to handle disk full or quota exceeded errors.
- Fixed selection of an alternate output directory in hpcprof-mpi.
- Report an error if hpcstruct is run on anything other than an ELF binary.
- Corrected handling for pseudo-roots such as in hpcviewer's flat view.
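The ELF check mentioned in the list above can be sketched in a few lines. Every ELF file begins with the four bytes 0x7F, 'E', 'L', 'F', so a tool can reject other inputs before attempting any analysis. This is an illustrative Python sketch and not hpcstruct's implementation; the names has_elf_magic and require_elf and the wording of the error message are assumptions.

```python
ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def has_elf_magic(header):
    """Return True if the given leading bytes start with the ELF magic."""
    return header[:4] == ELF_MAGIC


def require_elf(path):
    """Raise ValueError unless the file at `path` starts with the ELF magic.

    Hypothetical helper: a real tool would go on to validate the rest of
    the ELF header as well.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    if not has_elf_magic(header):
        raise ValueError(f"{path}: not an ELF binary")


# A shell script is rejected; bytes that begin like an ELF header pass.
print(has_elf_magic(b"#!/bin/sh\n"))
print(has_elf_magic(b"\x7fELF\x02\x01\x01"))
```

Checking the magic number first keeps the failure message specific, rather than letting a later parsing step fail in a less obvious way.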