Topic/monitoring (open-mpi#3109)
Add a monitoring PML, OSC and IO. They track all data exchanges between processes,
with the capability to include or exclude collective traffic. The monitoring infrastructure is
driven using MPI_T, and can be turned off and on at any time on any communicators/files/windows.
Documentation and examples have been added, as well as a shared library that can be
used with LD_PRELOAD and that allows the monitoring of any application.

Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: Clement Foyer <[email protected]>


* Add the ability to query pml monitoring results with the MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Signed-off-by: George Bosilca <[email protected]>

* Fix a conversion problem and add a comment about the lack of component
retain in the new component infrastructure.

Signed-off-by: George Bosilca <[email protected]>

* Allow the pvar to be written by invoking the associated callback.

Signed-off-by: George Bosilca <[email protected]>

* Various fixes for the monitoring.
Allocate all counting arrays in a single allocation
Don't delay the initialization (do it at the first add_proc as we
know the number of processes in MPI_COMM_WORLD)

Add a choice: with or without MPI_T (default).

Signed-off-by: George Bosilca <[email protected]>
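
The single-allocation change above can be illustrated with a minimal C sketch. The struct, field and function names here are hypothetical and are not taken from the component; the sketch only assumes that several per-peer counter arrays of equal length are needed once the number of processes in MPI_COMM_WORLD is known.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container: the real component's names and layout may differ. */
typedef struct {
    size_t *sent_count;      /* messages sent to each peer */
    size_t *sent_size;       /* bytes sent to each peer */
    size_t *filtered_count;  /* same, for filtered (e.g. internal) traffic */
    size_t *filtered_size;
    size_t  nprocs;
} counters_t;

/* Allocate the four per-peer arrays with a single calloc and slice the
 * block into equal parts. Returns 0 on success, -1 on failure. */
int counters_init(counters_t *c, size_t nprocs)
{
    size_t *block;

    if (0 == nprocs) {
        return -1;
    }
    block = calloc(4 * nprocs, sizeof(size_t));
    if (NULL == block) {
        return -1;
    }
    c->sent_count     = block;
    c->sent_size      = block + nprocs;
    c->filtered_count = block + 2 * nprocs;
    c->filtered_size  = block + 3 * nprocs;
    c->nprocs         = nprocs;
    return 0;
}

/* Only the base pointer was returned by calloc, so only it is freed. */
void counters_fini(counters_t *c)
{
    free(c->sent_count);
    c->sent_count = c->sent_size = NULL;
    c->filtered_count = c->filtered_size = NULL;
    c->nprocs = 0;
}
```

One allocation means one failure path to check and one pointer to free, and resetting every counter can be done with a single memset over the block.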

* Cleanup for the monitoring module.
Fix a few bugs, and reshape the operations to prepare for
global or communicator-based monitoring. Start integrating
support for MPI_T as well as MCA monitoring.

Signed-off-by: George Bosilca <[email protected]>

* Adding documentation about how to use pml_monitoring component.

The document presents usage with and without MPI_T.
It may not reflect exactly how it works right now, but should reflect
how it should work in the end.

Signed-off-by: Clement Foyer <[email protected]>

* Change rank into MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c.
Change mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.

Signed-off-by: George Bosilca <[email protected]>

* Improve monitoring support (including integration with MPI_T)

Use mca_pml_monitoring_enable to check status state. Set mca_pml_monitoring_current_filename if and only if the parameter is set
Allow 3 modes for pml_monitoring_enable_output: 1: stdout; 2: stderr; 3: filename
Fix test: 1 for differentiated messages, >1 for not differentiated. Fix output.
Add documentation for pml_monitoring_enable_output parameter. Remove useless parameter in example
Set filename only if using mpi tools
Adding missing parameters for fprintf in monitoring_flush (for output in std's cases)
Fix expected output/results for example header
Fix example when using MPI_Tools: a null pointer can't be passed directly. It needs to be a pointer to a null pointer
Base whether to output or not on message count, in order to print something if only empty messages are exchanged
Add a new example on how to access performance variables from within the code
Allocate arrays regarding value returned by binding

Signed-off-by: Clement Foyer <[email protected]>
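
The three pml_monitoring_enable_output modes listed above (1: stdout, 2: stderr, 3: filename) could be dispatched roughly as in the following sketch. The helper names are hypothetical, and treating any other value, or a missing or empty filename, as "no output" is an assumption made for illustration rather than documented behavior.

```c
#include <stdio.h>

/* Hypothetical helper: map an output mode to a stream.
 * 1 -> stdout, 2 -> stderr, 3 -> a file opened for writing.
 * Anything else, or mode 3 without a usable filename, yields NULL
 * (assumed here to mean "do not output"). */
FILE *monitoring_open_output(int mode, const char *filename)
{
    switch (mode) {
    case 1:
        return stdout;
    case 2:
        return stderr;
    case 3:
        if (NULL == filename || '\0' == filename[0]) {
            return NULL;
        }
        return fopen(filename, "w");
    default:
        return NULL;
    }
}

/* Close only streams this helper opened itself, never stdout/stderr. */
void monitoring_close_output(FILE *out)
{
    if (NULL != out && stdout != out && stderr != out) {
        fclose(out);
    }
}
```

Keeping the open and close logic in one pair of helpers avoids closing a standard stream by accident when the mode changes between flushes.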

* Add overhead benchmark, with script to use data and create graphs out of the results
Signed-off-by: Clement Foyer <[email protected]>

* Fix segfault error at end when not loading pml
Signed-off-by: Clement Foyer <[email protected]>

* Start create common monitoring module. Factorise version numbering
Signed-off-by: Clement Foyer <[email protected]>

* Fix microbenchmarks script
Signed-off-by: Clement Foyer <[email protected]>

* Improve readability of code

NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.

Signed-off-by: Clement Foyer <[email protected]>

* Add osc monitoring component

Signed-off-by: Clement Foyer <[email protected]>

* Add error checking if running out of memory in osc_monitoring

Signed-off-by: Clement Foyer <[email protected]>

* Resolve brutal segfault when double freeing filename
Signed-off-by: Clement Foyer <[email protected]>

* Moving to ompi/mca/common the proper parts of the monitoring system
Using common functions instead of pml specific one. Removing pml ones.

Signed-off-by: Clement Foyer <[email protected]>

* Add calls to record monitored data from osc. Use common function to translate ranks.

Signed-off-by: Clement Foyer <[email protected]>

* Fix test_overhead benchmark script distribution

Signed-off-by: Clement Foyer <[email protected]>

* Fix linking library with mca/common

Signed-off-by: Clement Foyer <[email protected]>

* Add passive operations in monitoring_test

Signed-off-by: Clement Foyer <[email protected]>

* Fix from rank calculation. Add more detailed error messages

Signed-off-by: Clement Foyer <[email protected]>

* Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines

Signed-off-by: Clement Foyer <[email protected]>

* Fix osc_monitoring mget_message_count function call

Signed-off-by: Clement Foyer <[email protected]>

* Change common_monitoring function names to respect the naming convention. Move to common_finalize the common parts of finalization. Add some comments.

Signed-off-by: Clement Foyer <[email protected]>

* Add monitoring common output system

Signed-off-by: Clement Foyer <[email protected]>

* Add an error message when trying to flush to a file and the open fails. Remove the erroneous info message when flushing while the monitoring is already disabled.

Signed-off-by: Clement Foyer <[email protected]>

* Consistent output file name (with and without MPI_T).

Signed-off-by: Clement Foyer <[email protected]>

* Always output to a file when flushing at pvar_stop(flush).

Signed-off-by: Clement Foyer <[email protected]>

* Update the monitoring documentation.
Complete the information from the HowTo. Fix a few mistakes and typos.

Signed-off-by: Clement Foyer <[email protected]>

* Use the world_rank for printf's.
Fix name generation for output files when using MPI_T. Minor changes in benchmarks starting script

Signed-off-by: Clement Foyer <[email protected]>

* Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments.

Signed-off-by: Clement Foyer <[email protected]>

* Add security check for unique initialization for osc monitoring

Signed-off-by: Clement Foyer <[email protected]>

* Clean up the amount of symbols available outside mca/common/monitoring

Signed-off-by: Clement Foyer <[email protected]>

* Remove use of __sync_* built-ins. Use opal_atomic_* instead.

Signed-off-by: Clement Foyer <[email protected]>

* Allocate the hashtable on common/monitoring component initialization. Define symbols to set the values for error/warning/info verbose output. Use opal_atomic instead of built-in function in osc/monitoring template initialization.

Signed-off-by: Clement Foyer <[email protected]>

* Deleting now useless file : moved to common/monitoring

Signed-off-by: Clement Foyer <[email protected]>

* Add histogram distribution of message sizes

Signed-off-by: Clement Foyer <[email protected]>

* Add histogram array of base-2 log of message sizes. Use simple call to reset/allocate arrays in common_monitoring.c

Signed-off-by: Clement Foyer <[email protected]>
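
A histogram indexed by the base-2 logarithm of the message size can be sketched as follows. This is an illustration only: the names are hypothetical, the bucket layout (bucket 0 for empty messages, bucket k for sizes in [2^(k-1), 2^k)) is an assumption, and the index is computed with integer shifts rather than a floating-point logarithm. The final clamp guards against the kind of out-of-range access mentioned in later fixes.

```c
#include <stddef.h>

/* One bucket for empty messages plus one per possible bit position. */
#define HIST_BUCKETS (sizeof(size_t) * 8 + 1)

/* Map a message size to its bucket:
 *   0 -> bucket 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, and so on. */
size_t size_to_bucket(size_t size)
{
    size_t bucket = 0;

    if (0 == size) {
        return 0;
    }
    while (size > 1) {          /* computes floor(log2(size)) */
        size >>= 1;
        ++bucket;
    }
    ++bucket;                   /* shift by one: bucket 0 is reserved */
    return bucket < HIST_BUCKETS ? bucket : HIST_BUCKETS - 1;
}

/* Record one message of the given size in the histogram. */
void hist_record(size_t hist[HIST_BUCKETS], size_t size)
{
    hist[size_to_bucket(size)]++;
}
```

For example, a 1023-byte message lands in bucket 10 and a 1024-byte message in bucket 11, so each bucket covers one power-of-two range.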

* Add information to the dump file. Separate monitored data per category (pt2pt/osc/coll (to come))

Signed-off-by: Clement Foyer <[email protected]>

* Add coll component for collective communications monitoring

Signed-off-by: Clement Foyer <[email protected]>

* Fix warning messages : use c_name as the magic id is not always defined. Moreover, there was a % missing. Add call to release underlying modules. Add debug info messages. Add warning which may lead to further analysis.

Signed-off-by: Clement Foyer <[email protected]>

* Fix log10_2 constant initialization. Fix index calculation for histogram array.

Signed-off-by: Clement Foyer <[email protected]>

* Add debug info messages to follow more easily initialization steps.

Signed-off-by: Clement Foyer <[email protected]>

* Group all the var/pvar definitions to common_monitoring. Separate initial filename from the current one, to ease its lifetime management. Add verifications to ensure common is initialized once only. Move state variable management to common_monitoring.
monitoring_filter only indicates if filtering is activated.
Fix out of range access in histogram.
List is not used with the struct mca_monitoring_coll_data_t, so inherit only from opal_object_t.
Remove useless dead code.

Signed-off-by: Clement Foyer <[email protected]>

* Fix invalid memory allocation. Initialize initial_filename to empty string to avoid invalid read in mca_base_var_register.

Signed-off-by: Clement Foyer <[email protected]>

* Don't install the test scripts.

Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: Clement Foyer <[email protected]>

* Fix missing procs in hashtable. Cache coll monitoring data.
    * Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
    * Cache monitoring data relative to collectives operations on creation.
    * Remove double caching.
    * Use same proc name definition for hash table when inserting and
      when retrieving.

Signed-off-by: Clement Foyer <[email protected]>

* Use intermediate variable to avoid invalid write while retrieving ranks in hashtable.

Signed-off-by: Clement Foyer <[email protected]>

* Add missing release of the last element in flush_all. Add release of the hashtable in finalize.

Signed-off-by: Clement Foyer <[email protected]>

* Use a linked list instead of a hashtable to keep track of communicator data. Add release of the structure at finalize time.

Signed-off-by: Clement Foyer <[email protected]>

* Set world_rank from hashtable only if found

Signed-off-by: Clement Foyer <[email protected]>

* Use predefined symbol from opal system to print int

Signed-off-by: Clement Foyer <[email protected]>

* Move collective monitoring data to a hashtable. Add pvar to access the monitoring_coll_data. Move functions header to a private file only to be used in ompi/mca/common/monitoring

Signed-off-by: Clement Foyer <[email protected]>

* Fix pvar registration. Use OMPI_ERROR instead of -1 as returned error value. Fix releasing of coll_data_t objects. Assign the value only if data is found in the hashtable.

Signed-off-by: Clement Foyer <[email protected]>

* Add automated check (with MPI_Tools) of monitoring.

Signed-off-by: Clement Foyer <[email protected]>

* Fix procs list caching in common_monitoring_coll_data_t

    * Fix monitoring_coll_data type definition.
    * Use size(COMM_WORLD)-1 to determine max number of digits.

Signed-off-by: Clement Foyer <[email protected]>
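
The reason for using size(COMM_WORLD)-1 rather than the size itself is that the largest rank is one less than the communicator size, which matters at exact powers of ten. A small sketch, with a hypothetical function name:

```c
/* Number of decimal digits needed to print the largest rank of a
 * communicator with world_size processes (ranks 0 .. world_size-1). */
int max_rank_digits(int world_size)
{
    int max_rank = world_size - 1;
    int digits = 1;

    while (max_rank >= 10) {
        max_rank /= 10;
        ++digits;
    }
    return digits;
}
```

With 100 processes the largest rank is 99, so two digits are enough; counting the digits of 100 itself would reserve one character too many per rank when sizing a buffer for the proc list.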

* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof

Signed-off-by: Clement Foyer <[email protected]>

* Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation

Signed-off-by: Clement Foyer <[email protected]>

* Fix coll operations monitoring. Update check_monitoring according to the added pvar. Fix monitoring array allocation.

Signed-off-by: Clement Foyer <[email protected]>

* Documentation update.
Update and then move the latex and README documentation to a more logical place

Signed-off-by: Clement Foyer <[email protected]>

* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.

Signed-off-by: Clement Foyer <[email protected]>

* Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather).

Signed-off-by: Clement Foyer <[email protected]>

* Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof. Unify OSC related outputs.

Signed-off-by: Clement Foyer <[email protected]>

* Add the use of a machine file for overhead benchmark

Signed-off-by: Clement Foyer <[email protected]>

* Check for out-of-bound write in histogram

Signed-off-by: Clement Foyer <[email protected]>

* Fix common_monitoring_cache object init for MPI_COMM_WORLD

Signed-off-by: Clement Foyer <[email protected]>

* Add RDMA benchmarks to test_overhead
Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2).

Signed-off-by: Clement Foyer <[email protected]>

* Add computation of average and median of overheads. Add comments and copyrights to the test_overhead script

Signed-off-by: Clement Foyer <[email protected]>
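
The statistics mentioned in the two test_overhead changes above (one-way time as pingpong / 2, then average and median over the samples) amount to the arithmetic below. This is a C sketch of the calculation only, with hypothetical function names; it is not the benchmark script itself, and it makes no assumption about how each overhead sample is obtained.

```c
#include <stddef.h>
#include <stdlib.h>

/* A ping-pong measures a round trip; one complete send is half of it. */
double one_way_time(double pingpong_time)
{
    return pingpong_time / 2.0;
}

/* Comparison callback for qsort on doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n samples (0.0 when there are none). */
double overhead_average(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (0 == n) {
        return 0.0;
    }
    for (i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum / (double) n;
}

/* Median of n samples; sorts the array in place. For an even count
 * the two middle values are averaged. */
double overhead_median(double *v, size_t n)
{
    if (0 == n) {
        return 0.0;
    }
    qsort(v, n, sizeof(double), cmp_double);
    if (n % 2 == 1) {
        return v[n / 2];
    }
    return (v[n / 2 - 1] + v[n / 2]) / 2.0;
}
```

Reporting both values is useful because a few slow iterations can pull the average up noticeably while leaving the median almost unchanged.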

* Add technical documentation

Signed-off-by: Clement Foyer <[email protected]>

* Adapt to the new definition of communicators

Signed-off-by: Clement Foyer <[email protected]>

* Update expected output in test/monitoring/monitoring_test.c

Signed-off-by: Clement Foyer <[email protected]>

* Add dumping histogram in edge case

Signed-off-by: Clement Foyer <[email protected]>

* Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example

Signed-off-by: Clement Foyer <[email protected]>

* Add consistency in header inclusion.
Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
Add sanity check before emptying hashtable.
Fix typos in documentation.

Signed-off-by: Clement Foyer <[email protected]>

* misc monitoring fixes

* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am
and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)

Signed-off-by: Gilles Gouaillardet <[email protected]>

* Cleanups.

Signed-off-by: George Bosilca <[email protected]>

* Changing int64_t to size_t.
Keep the size_t used across all monitoring components.
Adapt the documentation.
Remove useless MPI_Request and MPI_Status from monitoring_test.c.

Signed-off-by: Clement Foyer <[email protected]>

* Add parameter for RMA test case

Signed-off-by: Clement Foyer <[email protected]>

* Clean the maximum bound computation for proc list dump.
Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0d.

Signed-off-by: Clement Foyer <[email protected]>

* Add communicator-specific monitored collective data reset

Signed-off-by: Clement Foyer <[email protected]>

* Add monitoring scripts to the 'make dist'
Also install them in the build and the install directories.

Signed-off-by: George Bosilca <[email protected]>
bosilca authored Jun 26, 2017
1 parent b1e639e commit d55b666
Showing 65 changed files with 8,216 additions and 684 deletions.
4 changes: 4 additions & 0 deletions configure.ac
@@ -1409,6 +1409,10 @@ AC_CONFIG_FILES([
 test/util/Makefile
 ])
 m4_ifdef([project_ompi], [AC_CONFIG_FILES([test/monitoring/Makefile])])
+m4_ifdef([project_ompi], [
+m4_ifdef([MCA_BUILD_ompi_pml_monitoring_DSO_TRUE],
+[AC_CONFIG_LINKS(test/monitoring/profile2mat.pl:test/monitoring/profile2mat.pl
+test/monitoring/aggregate_profile.pl:test/monitoring/aggregate_profile.pl)])])
 AC_CONFIG_FILES([contrib/dist/mofed/debian/rules],
 [chmod +x contrib/dist/mofed/debian/rules])
49 changes: 19 additions & 30 deletions ompi/mca/coll/base/coll_base_find_available.c
@@ -2,7 +2,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  * University Research and Technology
  * Corporation. All rights reserved.
- * Copyright (c) 2004-2005 The University of Tennessee and The University
+ * Copyright (c) 2004-2017 The University of Tennessee and The University
  * of Tennessee Research Foundation. All rights
  * reserved.
  * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
@@ -46,9 +46,6 @@
 static int init_query(const mca_base_component_t * ls,
                       bool enable_progress_threads,
                       bool enable_mpi_threads);
-static int init_query_2_0_0(const mca_base_component_t * ls,
-                            bool enable_progress_threads,
-                            bool enable_mpi_threads);

 /*
  * Scan down the list of successfully opened components and query each of
@@ -105,6 +102,20 @@ int mca_coll_base_find_available(bool enable_progress_threads,
 }


+/*
+ * Query a specific component, coll v2.0.0
+ */
+static inline int
+init_query_2_0_0(const mca_base_component_t * component,
+                 bool enable_progress_threads,
+                 bool enable_mpi_threads)
+{
+    mca_coll_base_component_2_0_0_t *coll =
+        (mca_coll_base_component_2_0_0_t *) component;
+
+    return coll->collm_init_query(enable_progress_threads,
+                                  enable_mpi_threads);
+}
 /*
  * Query a component, see if it wants to run at all. If it does, save
  * some information. If it doesn't, close it.
@@ -138,33 +149,11 @@ static int init_query(const mca_base_component_t * component,
     }

     /* Query done -- look at the return value to see what happened */
-
-    if (OMPI_SUCCESS != ret) {
-        opal_output_verbose(10, ompi_coll_base_framework.framework_output,
-                            "coll:find_available: coll component %s is not available",
-                            component->mca_component_name);
-    } else {
-        opal_output_verbose(10, ompi_coll_base_framework.framework_output,
-                            "coll:find_available: coll component %s is available",
-                            component->mca_component_name);
-    }
-
-    /* All done */
+    opal_output_verbose(10, ompi_coll_base_framework.framework_output,
+                        "coll:find_available: coll component %s is %savailable",
+                        component->mca_component_name,
+                        (OMPI_SUCCESS == ret) ? "": "not ");

     return ret;
 }

-
-/*
- * Query a specific component, coll v2.0.0
- */
-static int init_query_2_0_0(const mca_base_component_t * component,
-                            bool enable_progress_threads,
-                            bool enable_mpi_threads)
-{
-    mca_coll_base_component_2_0_0_t *coll =
-        (mca_coll_base_component_2_0_0_t *) component;
-
-    return coll->collm_init_query(enable_progress_threads,
-                                  enable_mpi_threads);
-}
53 changes: 53 additions & 0 deletions ompi/mca/coll/monitoring/Makefile.am
@@ -0,0 +1,53 @@
+#
+# Copyright (c) 2016 Inria. All rights reserved.
+# $COPYRIGHT$
+#
+# Additional copyrights may follow
+#
+# $HEADER$
+#
+
+monitoring_sources = \
+    coll_monitoring.h \
+    coll_monitoring_allgather.c \
+    coll_monitoring_allgatherv.c \
+    coll_monitoring_allreduce.c \
+    coll_monitoring_alltoall.c \
+    coll_monitoring_alltoallv.c \
+    coll_monitoring_alltoallw.c \
+    coll_monitoring_barrier.c \
+    coll_monitoring_bcast.c \
+    coll_monitoring_component.c \
+    coll_monitoring_exscan.c \
+    coll_monitoring_gather.c \
+    coll_monitoring_gatherv.c \
+    coll_monitoring_neighbor_allgather.c \
+    coll_monitoring_neighbor_allgatherv.c \
+    coll_monitoring_neighbor_alltoall.c \
+    coll_monitoring_neighbor_alltoallv.c \
+    coll_monitoring_neighbor_alltoallw.c \
+    coll_monitoring_reduce.c \
+    coll_monitoring_reduce_scatter.c \
+    coll_monitoring_reduce_scatter_block.c \
+    coll_monitoring_scan.c \
+    coll_monitoring_scatter.c \
+    coll_monitoring_scatterv.c
+
+if MCA_BUILD_ompi_coll_monitoring_DSO
+component_noinst =
+component_install = mca_coll_monitoring.la
+else
+component_noinst = libmca_coll_monitoring.la
+component_install =
+endif
+
+mcacomponentdir = $(ompilibdir)
+mcacomponent_LTLIBRARIES = $(component_install)
+mca_coll_monitoring_la_SOURCES = $(monitoring_sources)
+mca_coll_monitoring_la_LDFLAGS = -module -avoid-version
+mca_coll_monitoring_la_LIBADD = \
+    $(OMPI_TOP_BUILDDIR)/ompi/mca/common/monitoring/libmca_common_monitoring.la
+
+noinst_LTLIBRARIES = $(component_noinst)
+libmca_coll_monitoring_la_SOURCES = $(monitoring_sources)
+libmca_coll_monitoring_la_LDFLAGS = -module -avoid-version