forked from open-mpi/ompi
orte: adjust rml:ofi configury to pickup CPPFLAGS #3
Open: naughtont3 wants to merge 1 commit into main from tjn-orte-rmlofi-configury
naughtont3 pushed a commit that referenced this pull request Oct 26, 2017
Add a monitoring PML, OSC and IO. They track all data exchanges between processes, with the capability to include or exclude collective traffic. The monitoring infrastructure is driven using MPI_T, and can be turned off and on at any time on any communicators/files/windows. Documentation and examples have been added, as well as a shared library that can be used with LD_PRELOAD and that allows the monitoring of any application.
Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: Clement Foyer <[email protected]>
* Add the ability to query PML monitoring results with the MPI Tools interface using the performance variables "pml_monitoring_messages_count" and "pml_monitoring_messages_size".
  Signed-off-by: George Bosilca <[email protected]>
* Fix a conversion problem and add a comment about the lack of component retain in the new component infrastructure.
  Signed-off-by: George Bosilca <[email protected]>
* Allow the pvar to be written by invoking the associated callback.
  Signed-off-by: George Bosilca <[email protected]>
* Various fixes for the monitoring.
  Allocate all counting arrays in a single allocation.
  Don't delay the initialization (do it at the first add_proc, as we know the number of processes in MPI_COMM_WORLD).
  Add a choice: with or without MPI_T (default).
  Signed-off-by: George Bosilca <[email protected]>
* Cleanup for the monitoring module.
  Fixed a few bugs and reshaped the operations to prepare for global or communicator-based monitoring.
  Start integrating support for MPI_T as well as MCA monitoring.
  Signed-off-by: George Bosilca <[email protected]>
* Add documentation about how to use the pml_monitoring component.
  The document presents the use with and without MPI_T. It may not reflect exactly how it works right now, but should reflect how it should work in the end.
  Signed-off-by: Clement Foyer <[email protected]>
* Change rank in MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c.
  Change the mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.
  Signed-off-by: George Bosilca <[email protected]>
* Improve monitoring support (including integration with MPI_T).
  Use mca_pml_monitoring_enable to check status state.
  Set mca_pml_monitoring_current_filename if and only if the parameter is set.
  Allow 3 modes for pml_monitoring_enable_output: 1: stdout; 2: stderr; 3: filename.
  Fix test: 1 for differentiated messages, >1 for not differentiated.
  Fix output.
  Add documentation for the pml_monitoring_enable_output parameter.
  Remove useless parameter in example.
  Set filename only if using MPI tools.
  Add missing parameters for fprintf in monitoring_flush (for output in the stdout/stderr cases).
  Fix expected output/results for example header.
  Fix example when using MPI_Tools: a null pointer can't be passed directly. It needs to be a pointer to a null pointer.
  Base whether to output or not on message count, in order to print something if only empty messages are exchanged.
  Add a new example on how to access performance variables from within the code.
  Allocate arrays regarding value returned by binding.
  Signed-off-by: Clement Foyer <[email protected]>
* Add overhead benchmark, with a script to use the data and create graphs out of the results.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix segfault error at end when not loading pml.
  Signed-off-by: Clement Foyer <[email protected]>
* Start creating a common monitoring module. Factorise version numbering.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix microbenchmarks script.
  Signed-off-by: Clement Foyer <[email protected]>
* Improve readability of code.
  NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.
  Signed-off-by: Clement Foyer <[email protected]>
* Add osc monitoring component.
  Signed-off-by: Clement Foyer <[email protected]>
* Add error checking if running out of memory in osc_monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Resolve brutal segfault when double freeing filename.
  Signed-off-by: Clement Foyer <[email protected]>
* Move the proper parts of the monitoring system to ompi/mca/common.
  Use common functions instead of pml-specific ones. Remove the pml ones.
  Signed-off-by: Clement Foyer <[email protected]>
* Add calls to record monitored data from osc. Use common function to translate ranks.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix test_overhead benchmark script distribution.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix linking library with mca/common.
  Signed-off-by: Clement Foyer <[email protected]>
* Add passive operations in monitoring_test.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix from rank calculation. Add more detailed error messages.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix osc_monitoring mget_message_count function call.
  Signed-off-by: Clement Foyer <[email protected]>
* Change common_monitoring function names to respect the naming convention.
  Move the common parts of finalization to common_finalize. Add some comments.
  Signed-off-by: Clement Foyer <[email protected]>
* Add monitoring common output system.
  Signed-off-by: Clement Foyer <[email protected]>
* Add error message when trying to flush to a file and the open fails.
  Remove erroneous info message when flushing whereas the monitoring is already disabled.
  Signed-off-by: Clement Foyer <[email protected]>
* Consistent output file name (with and without MPI_T).
  Signed-off-by: Clement Foyer <[email protected]>
* Always output to a file when flushing at pvar_stop(flush).
  Signed-off-by: Clement Foyer <[email protected]>
* Update the monitoring documentation.
  Complete information from HowTo. Fix a few mistakes and typos.
  Signed-off-by: Clement Foyer <[email protected]>
* Use the world_rank for printf's. Fix name generation for output files when using MPI_T. Minor changes in benchmarks starting script.
  Signed-off-by: Clement Foyer <[email protected]>
* Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments.
  Signed-off-by: Clement Foyer <[email protected]>
* Add security check for unique initialization for osc monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Clean up the number of symbols available outside mca/common/monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Remove use of __sync_* built-ins. Use opal_atomic_* instead.
  Signed-off-by: Clement Foyer <[email protected]>
* Allocate the hashtable on common/monitoring component initialization.
  Define symbols to set the values for error/warning/info verbose output.
  Use opal_atomic instead of built-in function in osc/monitoring template initialization.
  Signed-off-by: Clement Foyer <[email protected]>
* Delete now useless file: moved to common/monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Add histogram distribution of message sizes.
  Signed-off-by: Clement Foyer <[email protected]>
* Add histogram array of 2-based log of message sizes. Use simple call to reset/allocate arrays in common_monitoring.c.
  Signed-off-by: Clement Foyer <[email protected]>
* Add information in dumping file. Separate monitored data per category (pt2pt/osc/coll (to come)).
  Signed-off-by: Clement Foyer <[email protected]>
* Add coll component for collective communications monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix warning messages: use c_name as the magic id is not always defined. Moreover, there was a % missing.
  Add call to release underlying modules.
  Add debug info messages.
  Add warning which may lead to further analysis.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix log10_2 constant initialization. Fix index calculation for histogram array.
  Signed-off-by: Clement Foyer <[email protected]>
* Add debug info messages to follow initialization steps more easily.
  Signed-off-by: Clement Foyer <[email protected]>
* Group all the var/pvar definitions in common_monitoring.
  Separate the initial filename from the current one, to ease its lifetime management.
  Add verifications to ensure common is initialized only once.
  Move state variable management to common_monitoring.
  monitoring_filter only indicates whether filtering is activated.
  Fix out of range access in histogram.
  List is not used with the struct mca_monitoring_coll_data_t, so inherit only from opal_object_t.
  Remove useless dead code.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix invalid memory allocation. Initialize initial_filename to an empty string to avoid an invalid read in mca_base_var_register.
  Signed-off-by: Clement Foyer <[email protected]>
* Don't install the test scripts.
  Signed-off-by: George Bosilca <[email protected]>
  Signed-off-by: Clement Foyer <[email protected]>
* Fix missing procs in hashtable. Cache coll monitoring data.
* Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
* Cache monitoring data relative to collective operations on creation.
* Remove double caching.
* Use the same proc name definition for the hash table when inserting and when retrieving.
  Signed-off-by: Clement Foyer <[email protected]>
* Use an intermediate variable to avoid an invalid write while retrieving ranks in the hashtable.
  Signed-off-by: Clement Foyer <[email protected]>
* Add missing release of the last element in flush_all. Add release of the hashtable in finalize.
  Signed-off-by: Clement Foyer <[email protected]>
* Use a linked list instead of a hashtable to keep track of communicator data. Add release of the structure at finalize time.
  Signed-off-by: Clement Foyer <[email protected]>
* Set world_rank from the hashtable only if found.
  Signed-off-by: Clement Foyer <[email protected]>
* Use predefined symbol from the opal system to print int.
  Signed-off-by: Clement Foyer <[email protected]>
* Move collective monitoring data to a hashtable.
  Add pvar to access the monitoring_coll_data.
  Move function headers to a private file only to be used in ompi/mca/common/monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix pvar registration. Use OMPI_ERROR instead of -1 as the returned error value. Fix releasing of coll_data_t objects. Assign the value only if data is found in the hashtable.
  Signed-off-by: Clement Foyer <[email protected]>
* Add automated check (with MPI_Tools) of monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix procs list caching in common_monitoring_coll_data_t.
* Fix monitoring_coll_data type definition.
* Use size(COMM_WORLD)-1 to determine the max number of digits.
  Signed-off-by: Clement Foyer <[email protected]>
* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof.
  Signed-off-by: Clement Foyer <[email protected]>
* Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix coll operations monitoring. Update check_monitoring according to the added pvar. Fix monitoring array allocation.
  Signed-off-by: Clement Foyer <[email protected]>
* Documentation update. Update and then move the LaTeX and README documentation to a more logical place.
  Signed-off-by: Clement Foyer <[email protected]>
* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather).
  Signed-off-by: Clement Foyer <[email protected]>
* Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof.
  Unify OSC related outputs.
  Signed-off-by: Clement Foyer <[email protected]>
* Add the use of a machine file for the overhead benchmark.
  Signed-off-by: Clement Foyer <[email protected]>
* Check for out-of-bound write in histogram.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix common_monitoring_cache object init for MPI_COMM_WORLD.
  Signed-off-by: Clement Foyer <[email protected]>
* Add RDMA benchmarks to test_overhead.
  Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2).
  Signed-off-by: Clement Foyer <[email protected]>
* Add computation of average and median of overheads. Add comments and copyrights to the test_overhead script.
  Signed-off-by: Clement Foyer <[email protected]>
* Add technical documentation.
  Signed-off-by: Clement Foyer <[email protected]>
* Adapt to the new definition of communicators.
  Signed-off-by: Clement Foyer <[email protected]>
* Update expected output in test/monitoring/monitoring_test.c.
  Signed-off-by: Clement Foyer <[email protected]>
* Add dumping histogram in edge case.
  Signed-off-by: Clement Foyer <[email protected]>
* Add a reduce(pml_monitoring_messages_count, MPI_MAX) example.
  Signed-off-by: Clement Foyer <[email protected]>
* Add consistency in header inclusion.
  Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
  Add sanity check before emptying hashtable.
  Fix typos in documentation.
  Signed-off-by: Clement Foyer <[email protected]>
* misc monitoring fixes
* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)
  Signed-off-by: Gilles Gouaillardet <[email protected]>
* Cleanups.
  Signed-off-by: George Bosilca <[email protected]>
* Change int64_t to size_t.
  Keep size_t used across all monitoring components.
  Adapt the documentation.
  Remove useless MPI_Request and MPI_Status from monitoring_test.c.
  Signed-off-by: Clement Foyer <[email protected]>
* Add parameter for RMA test case.
  Signed-off-by: Clement Foyer <[email protected]>
* Clean the maximum bound computation for proc list dump.
  Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0d.
  Signed-off-by: Clement Foyer <[email protected]>
* Add communicator-specific monitored collective data reset.
  Signed-off-by: Clement Foyer <[email protected]>
* Add monitoring scripts to the 'make dist'.
  Also install them in the build and the install directories.
  Signed-off-by: George Bosilca <[email protected]>
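
The commit message mentions a histogram indexed by the base-2 logarithm of message sizes, along with later fixes for index calculation and out-of-bound writes. A minimal sketch of that idea follows, written in Python for brevity rather than in the C of the component itself. The names `histogram_bucket` and `record`, the bucket count, and the choice to reserve bucket 0 for empty messages are assumptions made for illustration; they are not taken from the Open MPI sources.

```python
def histogram_bucket(size: int, num_buckets: int) -> int:
    """Return the histogram bucket for a message of `size` bytes.

    Assumed layout (hypothetical, for illustration only):
      bucket 0      -> empty messages (size == 0)
      bucket k + 1  -> sizes in [2**k, 2**(k + 1))
    The last bucket absorbs every larger size, which guards
    against writing past the end of the array.
    """
    if size < 0:
        raise ValueError("message size must be non-negative")
    if num_buckets < 2:
        raise ValueError("need at least two buckets")
    if size == 0:
        return 0
    # For size >= 1, bit_length() equals floor(log2(size)) + 1.
    index = size.bit_length()
    return min(index, num_buckets - 1)


def record(histogram: list, size: int) -> None:
    """Count one message of `size` bytes in `histogram`."""
    histogram[histogram_bucket(size, len(histogram))] += 1


hist = [0] * 8
for msg_size in (0, 1, 3, 4, 1024, 1 << 40):
    record(hist, msg_size)
print(hist)
```

Using integer bit length avoids the floating-point rounding that a `log10(size) / log10(2)` computation can introduce near exact powers of two, and the clamp keeps oversized messages in the final bucket.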
No description provided.