d55b666834
Add a monitoring PML, OSC and IO. They track all data exchanges between processes, with the capability to include or exclude collective traffic. The monitoring infrastructure is driven using MPI_T, and can be turned off and on at any time on any communicators/files/windows. Documentation and examples have been added, as well as a shared library that can be used with LD_PRELOAD and that allows the monitoring of any application. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * add ability to query pml monitoring results with MPI Tools interface using performance variables "pml_monitoring_messages_count" and "pml_monitoring_messages_size" Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Fix a conversion problem and add a comment about the lack of component retain in the new component infrastructure. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Allow the pvar to be written by invoking the associated callback. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Various fixes for the monitoring. Allocate all counting arrays in a single allocation Don't delay the initialization (do it at the first add_proc as we know the number of processes in MPI_COMM_WORLD) Add a choice: with or without MPI_T (default). Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Cleanup for the monitoring module. Fixed a few bugs, and reshaped the operations to prepare for global or communicator-based monitoring. Start integrating support for MPI_T as well as MCA monitoring. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Adding documentation about how to use pml_monitoring component. The document presents the use with and without MPI_T. May not reflect exactly how it works right now, but should reflect how it should work in the end. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Change rank into MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c. 
Change mca_pml_monitoring_flush() signature so we don't need the size and rank parameters. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Improve monitoring support (including integration with MPI_T) Use mca_pml_monitoring_enable to check status state. Set mca_pml_monitoring_current_filename iff parameter is set Allow 3 modes for pml_monitoring_enable_output: - 1 : stdout; - 2 : stderr; - 3 : filename Fix test : 1 for differentiated messages, >1 for not differentiated. Fix output. Add documentation for pml_monitoring_enable_output parameter. Remove useless parameter in example Set filename only if using mpi tools Adding missing parameters for fprintf in monitoring_flush (for output in std's cases) Fix expected output/results for example header Fix example when using MPI_Tools : a null-pointer can't be passed directly. It needs to be a pointer to a null-pointer Base whether to output or not on message count, in order to print something if only empty messages are exchanged Add a new example on how to access performance variables from within the code Allocate arrays regarding value returned by binding Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add overhead benchmark, with script to use data and create graphs out of the results Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix segfault error at end when not loading pml Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Start create common monitoring module. Factorise version numbering Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix microbenchmarks script Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Improve readability of code NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string. 
Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add osc monitoring component Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add error checking if running out of memory in osc_monitoring Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Resolve brutal segfault when double freeing filename Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Moving to ompi/mca/common the proper parts of the monitoring system Using common functions instead of pml specific one. Removing pml ones. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add calls to record monitored data from osc. Use common function to translate ranks. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix test_overhead benchmark script distribution Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix linking library with mca/common Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add passive operations in monitoring_test Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix from rank calculation. Add more detailed error messages Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix osc_monitoring mget_message_count function call Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Change common_monitoring function names to respect the naming convention. Move to common_finalize the common parts of finalization. Add some comments. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add monitoring common output system Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add error message when trying to flush to a file, and open fails. Remove erroneous info message when flushing whereas the monitoring is already disabled. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Consistent output file name (with and without MPI_T). 
Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Always output to a file when flushing at pvar_stop(flush). Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Update the monitoring documentation. Complete information from HowTo. Fix a few mistakes and typos. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Use the world_rank for printf's. Fix name generation for output files when using MPI_T. Minor changes in benchmarks starting script Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add security check for unique initialization for osc monitoring Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Clean the amount of symbols available outside mca/common/monitoring Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Remove use of __sync_* built-ins. Use opal_atomic_* instead. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Allocate the hashtable on common/monitoring component initialization. Define symbols to set the values for error/warning/info verbose output. Use opal_atomic instead of built-in function in osc/monitoring template initialization. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Deleting now useless file : moved to common/monitoring Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add histogram distribution of message sizes Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add histogram array of 2-based log of message sizes. Use simple call to reset/allocate arrays in common_monitoring.c Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add information in dumping file. 
Separate per category (pt2pt/osc/coll (to come)) monitored data Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add coll component for collectives communications monitoring Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix warning messages : use c_name as the magic id is not always defined. Moreover, there was a % missing. Add call to release underlying modules. Add debug info messages. Add warning which may lead to further analysis. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix log10_2 constant initialization. Fix index calculation for histogram array. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add debug info messages to follow more easily initialization steps. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Group all the var/pvar definitions to common_monitoring. Separate initial filename from the current one, to ease its lifetime management. Add verifications to ensure common is initialized once only. Move state variable management to common_monitoring. monitoring_filter only indicates if filtering is activated. Fix out of range access in histogram. List is not used with the struct mca_monitoring_coll_data_t, so inherit only from opal_object_t. Remove useless dead code. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix invalid memory allocation. Initialize initial_filename to empty string to avoid invalid read in mca_base_var_register. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Don't install the test scripts. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix missing procs in hashtable. Cache coll monitoring data. * Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer. * Cache monitoring data relative to collectives operations on creation. * Remove double caching. * Use same proc name definition for hash table when inserting and when retrieving. 
Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Use intermediate variable to avoid invalid write while retrieving ranks in hashtable. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add missing release of the last element in flush_all. Add release of the hashtable in finalize. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Use a linked list instead of a hashtable to keep track of communicator data. Add release of the structure at finalize time. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Set world_rank from hashtable only if found Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Use predefined symbol from opal system to print int Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Move collective monitoring data to a hashtable. Add pvar to access the monitoring_coll_data. Move functions header to a private file only to be used in ompi/mca/common/monitoring Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix pvar registration. Use OMPI_ERROR instead of -1 as returned error value. Fix releasing of coll_data_t objects. Affect value only if data is found in the hashtable. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add automated check (with MPI_Tools) of monitoring. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix procs list caching in common_monitoring_coll_data_t * Fix monitoring_coll_data type definition. * Use size(COMM_WORLD)-1 to determine max number of digits. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix coll operations monitoring. Update check_monitoring accordingly to the added pvar. Fix monitoring array allocation. 
Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Documentation update. Update and then move the latex and README documentation to a more logical place Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather). Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof. Unify OSC related outputs. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add the use of a machine file for overhead benchmark Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Check for out-of-bound write in histogram Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Fix common_monitoring_cache object init for MPI_COMM_WORLD Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add RDMA benchmarks to test_overhead Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2). Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add computation of average and median of overheads. Add comments and copyrights to the test_overhead script Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add technical documentation Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Adapt to the new definition of communicators Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Update expected output in test/monitoring/monitoring_test.c Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add dumping histogram in edge case Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add consistency in header inclusion. Include ompi/mpi/fortran/mpif-h/bindings.h only if needed. 
Add sanity check before emptying hashtable. Fix typos in documentation. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * misc monitoring fixes * test/monitoring: fix test when weak symbols are not available * monitoring: fix a typo and add a missing file in Makefile.am and have monitoring_common.h and monitoring_common_coll.h included in the distro * test/monitoring: cleanup all tests and make distclean a happy panda * test/monitoring: use gettimeofday() if clock_gettime() is unavailable * monitoring: silence misc warnings (#3) Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> * Cleanups. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Changing int64_t to size_t. Keep the size_t used across all monitoring components. Adapt the documentation. Remove useless MPI_Request and MPI_Status from monitoring_test.c. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add parameter for RMA test case Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Clean the maximum bound computation for proc list dump. Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0dbe5d261bd9d2cc61d5b305b4ef6a2dda6. Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add communicator-specific monitored collective data reset Signed-off-by: Clement Foyer <clement.foyer@inria.fr> * Add monitoring scripts to the 'make dist' Also install them in the build and the install directories. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
379 lines
14 KiB
C
/*
 * Copyright (c) 2013-2015 The University of Tennessee and The University
 *                         of Tennessee Research Foundation.  All rights
 *                         reserved.
 * Copyright (c) 2013-2017 Inria.  All rights reserved.
 * Copyright (c) 2015      Cisco Systems, Inc.  All rights reserved.
 * Copyright (c) 2016      Intel, Inc. All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */

/*
pml monitoring tester.

Designed by George Bosilca <bosilca@icl.utk.edu>, Emmanuel Jeannot <emmanuel.jeannot@inria.fr> and Clément Foyer <clement.foyer@inria.fr>
Contact the authors for questions.

Two options are available for this test: with/without MPI_Tools, and with/without RMA operations. The default mode is without MPI_Tools, and with RMA operations.
To enable the MPI_Tools use, add "--with-mpit" as an application parameter.
To disable the RMA operations testing, add "--without-rma" as an application parameter.

To be run as (without using MPI_Tools):

mpirun -np 4 --mca pml_monitoring_enable 2 --mca pml_monitoring_enable_output 3 --mca pml_monitoring_filename prof/output ./monitoring_test

with the results being, as an example:
output.1.prof
# POINT TO POINT
E 1 2 104 bytes 26 msgs sent 0,0,0,26,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
E 1 3 208 bytes 52 msgs sent 8,0,0,65,1,5,2,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
I 1 0 140 bytes 27 msgs sent
I 1 2 2068 bytes 1 msgs sent
I 1 3 2256 bytes 31 msgs sent
# OSC
S 1 0 0 bytes 1 msgs sent
R 1 0 40960 bytes 1 msgs sent
S 1 2 40960 bytes 1 msgs sent
# COLLECTIVES
C 1 0 140 bytes 27 msgs sent
C 1 2 140 bytes 27 msgs sent
C 1 3 140 bytes 27 msgs sent
D MPI COMMUNICATOR 4 DUP FROM 0 procs: 0,1,2,3
O2A 1 0 bytes 0 msgs sent
A2O 1 0 bytes 0 msgs sent
A2A 1 276 bytes 15 msgs sent
D MPI_COMM_WORLD procs: 0,1,2,3
O2A 1 0 bytes 0 msgs sent
A2O 1 0 bytes 0 msgs sent
A2A 1 96 bytes 9 msgs sent
D MPI COMMUNICATOR 5 SPLIT_TYPE FROM 4 procs: 0,1,2,3
O2A 1 0 bytes 0 msgs sent
A2O 1 0 bytes 0 msgs sent
A2A 1 48 bytes 3 msgs sent
D MPI COMMUNICATOR 3 SPLIT FROM 0 procs: 1,3
O2A 1 0 bytes 0 msgs sent
A2O 1 0 bytes 0 msgs sent
A2A 1 0 bytes 0 msgs sent

*/


#include "mpi.h"
#include <stdio.h>
#include <string.h>

static MPI_T_pvar_handle flush_handle;
static const char flush_pvar_name[] = "pml_monitoring_flush";
static const void*nullbuf = NULL;
static int flush_pvar_idx;
static int with_mpit = 0;
static int with_rma = 1;

int main(int argc, char* argv[])
{
    int rank, size, n, to, from, tagno, MPIT_result, provided, count, world_rank;
    MPI_T_pvar_session session;
    MPI_Comm newcomm;
    char filename[1024];

    for ( int arg_it = 1; argc > 1 && arg_it < argc; ++arg_it ) {
        if( 0 == strcmp(argv[arg_it], "--with-mpit") ) {
            with_mpit = 1;
            printf("enable MPIT support\n");
        } else if( 0 == strcmp(argv[arg_it], "--without-rma") ) {
            with_rma = 0;
            printf("disable RMA testing\n");
        }
    }

    /* first phase: circulate a token in MPI_COMM_WORLD */
    n = -1;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    rank = world_rank;
    to = (rank + 1) % size;
    /* Add size before the modulo: in C, (rank - 1) % size is -1 for rank 0 */
    from = (rank + size - 1) % size;
    tagno = 201;

    if( with_mpit ) {
        MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
        if (MPIT_result != MPI_SUCCESS)
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);

        MPIT_result = MPI_T_pvar_get_index(flush_pvar_name, MPI_T_PVAR_CLASS_GENERIC, &flush_pvar_idx);
        if (MPIT_result != MPI_SUCCESS) {
            printf("cannot find monitoring MPI_T \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }

        MPIT_result = MPI_T_pvar_session_create(&session);
        if (MPIT_result != MPI_SUCCESS) {
            printf("cannot create a session for \"%s\" pvar\n", flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }

        /* Allocating a new PVAR in a session will reset the counters */
        MPIT_result = MPI_T_pvar_handle_alloc(session, flush_pvar_idx,
                                              MPI_COMM_WORLD, &flush_handle, &count);
        if (MPIT_result != MPI_SUCCESS) {
            printf("failed to allocate handle on \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }

        MPIT_result = MPI_T_pvar_start(session, flush_handle);
        if (MPIT_result != MPI_SUCCESS) {
            printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }
    }

    if (rank == 0) {
        n = 25;
        MPI_Send(&n,1,MPI_INT,to,tagno,MPI_COMM_WORLD);
    }
    while (1) {
        MPI_Recv(&n, 1, MPI_INT, from, tagno, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rank == 0) {n--;tagno++;}
        MPI_Send(&n, 1, MPI_INT, to, tagno, MPI_COMM_WORLD);
        if (rank != 0) {n--;tagno++;}
        if (n<0){
            break;
        }
    }

    if( with_mpit ) {
        /* Build one file per process.
           Everything that has been monitored by each
           process since the last flush will be output in filename */
        /*
          Requires directory prof to be created.
          Filename format should display the phase number
          and the process rank for ease of parsing with
          aggregate_profile.pl script
        */
        sprintf(filename, "prof/phase_1");

        if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, filename) ) {
            fprintf(stderr, "Process %d cannot save monitoring in %s.%d.prof\n",
                    world_rank, filename, world_rank);
        }
        /* Force the writing of the monitoring data */
        MPIT_result = MPI_T_pvar_stop(session, flush_handle);
        if (MPIT_result != MPI_SUCCESS) {
            printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }

        MPIT_result = MPI_T_pvar_start(session, flush_handle);
        if (MPIT_result != MPI_SUCCESS) {
            printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }
        /* Don't set a filename. If we stop the session before setting it, then no output file
         * will be generated.
         */
        if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, (void*)&nullbuf) ) {
            fprintf(stderr, "Process %d cannot save monitoring in %s\n", world_rank, filename);
        }
    }

    /*
      Second phase. Work with different communicators.
      Odd ranks will circulate a token
      while even ranks will perform an all_to_all
    */
    MPI_Comm_split(MPI_COMM_WORLD, rank%2, rank, &newcomm);

    if(rank%2){ /* odd ranks (in COMM_WORLD) circulate a token */
        MPI_Comm_rank(newcomm, &rank);
        MPI_Comm_size(newcomm, &size);
        if( size > 1 ) {
            to = (rank + 1) % size;
            /* Add size before the modulo so that rank 0 receives from the last rank */
            from = (rank + size - 1) % size;
            tagno = 201;
            if (rank == 0){
                n = 50;
                MPI_Send(&n, 1, MPI_INT, to, tagno, newcomm);
            }
            while (1){
                MPI_Recv(&n, 1, MPI_INT, from, tagno, newcomm, MPI_STATUS_IGNORE);
                if (rank == 0) {n--; tagno++;}
                MPI_Send(&n, 1, MPI_INT, to, tagno, newcomm);
                if (rank != 0) {n--; tagno++;}
                if (n<0){
                    break;
                }
            }
        }
    } else { /* even ranks (in COMM_WORLD) will perform an all_to_all and a barrier */
        int send_buff[10240];
        int recv_buff[10240];
        MPI_Comm newcomm2;
        MPI_Comm_rank(newcomm, &rank);
        MPI_Comm_size(newcomm, &size);
        MPI_Alltoall(send_buff, 10240/size, MPI_INT, recv_buff, 10240/size, MPI_INT, newcomm);
        MPI_Comm_split(newcomm, rank%2, rank, &newcomm2);
        MPI_Barrier(newcomm2);
        MPI_Comm_free(&newcomm2);
    }

    if( with_mpit ) {
        /* Build one file per process.
           Everything that has been monitored by each
           process since the last flush will be output in filename */
        /*
          Requires directory prof to be created.
          Filename format should display the phase number
          and the process rank for ease of parsing with
          aggregate_profile.pl script
        */
        sprintf(filename, "prof/phase_2");

        if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, filename) ) {
            fprintf(stderr, "Process %d cannot save monitoring in %s.%d.prof\n",
                    world_rank, filename, world_rank);
        }

        /* Force the writing of the monitoring data */
        MPIT_result = MPI_T_pvar_stop(session, flush_handle);
        if (MPIT_result != MPI_SUCCESS) {
            printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }

        MPIT_result = MPI_T_pvar_start(session, flush_handle);
        if (MPIT_result != MPI_SUCCESS) {
            printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }
        /* Don't set a filename. If we stop the session before setting it, then no output
         * will be generated.
         */
        if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, (void*)&nullbuf ) ) {
            fprintf(stderr, "Process %d cannot save monitoring in %s\n", world_rank, filename);
        }
    }

    if( with_rma ) {
        MPI_Win win;
        int rs_buff[10240];
        int win_buff[10240];
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        to = (rank + 1) % size;
        from = (rank + size - 1) % size;
        for( int v = 0; v < 10240; ++v )
            rs_buff[v] = win_buff[v] = rank;

        MPI_Win_create(win_buff, 10240 * sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence(MPI_MODE_NOPRECEDE, win);
        if( rank%2 ) {
            MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT, win);
            MPI_Get(rs_buff, 10240, MPI_INT, from, 0, 10240, MPI_INT, win);
        } else {
            MPI_Put(rs_buff, 10240, MPI_INT, to, 0, 10240, MPI_INT, win);
            MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT, win);
        }
        MPI_Win_fence(MPI_MODE_NOSUCCEED, win);

        for( int v = 0; v < 10240; ++v )
            if( rs_buff[v] != win_buff[v] && ((rank%2 && rs_buff[v] != from) || (!(rank%2) && rs_buff[v] != rank)) ) {
                printf("Error on checking exchanged values: %s_buff[%d] == %d instead of %d\n",
                       rank%2 ? "rs" : "win", v, rs_buff[v], rank%2 ? from : rank);
                MPI_Abort(MPI_COMM_WORLD, -1);
            }

        MPI_Group world_group, newcomm_group, distant_group;
        MPI_Comm_group(MPI_COMM_WORLD, &world_group);
        MPI_Comm_group(newcomm, &newcomm_group);
        MPI_Group_difference(world_group, newcomm_group, &distant_group);
        if( rank%2 ) {
            MPI_Win_post(distant_group, 0, win);
            MPI_Win_wait(win);
            /* Check received values */
            for( int v = 0; v < 10240; ++v )
                if( from != win_buff[v] ) {
                    printf("Error on checking exchanged values: win_buff[%d] == %d instead of %d\n",
                           v, win_buff[v], from);
                    MPI_Abort(MPI_COMM_WORLD, -1);
                }
        } else {
            MPI_Win_start(distant_group, 0, win);
            MPI_Put(rs_buff, 10240, MPI_INT, to, 0, 10240, MPI_INT, win);
            MPI_Win_complete(win);
        }
        MPI_Group_free(&world_group);
        MPI_Group_free(&newcomm_group);
        MPI_Group_free(&distant_group);
        MPI_Barrier(MPI_COMM_WORLD);

        for( int v = 0; v < 10240; ++v ) rs_buff[v] = rank;

        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, to, 0, win);
        MPI_Put(rs_buff, 10240, MPI_INT, to, 0, 10240, MPI_INT, win);
        MPI_Win_unlock(to, win);

        MPI_Barrier(MPI_COMM_WORLD);

        /* Check received values */
        for( int v = 0; v < 10240; ++v )
            if( from != win_buff[v] ) {
                printf("Error on checking exchanged values: win_buff[%d] == %d instead of %d\n",
                       v, win_buff[v], from);
                MPI_Abort(MPI_COMM_WORLD, -1);
            }

        MPI_Win_free(&win);
    }

    if( with_mpit ) {
        /* the filename for flushing monitoring now uses 3 as phase number! */
        sprintf(filename, "prof/phase_3");

        if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, filename) ) {
            fprintf(stderr, "Process %d cannot save monitoring in %s.%d.prof\n",
                    world_rank, filename, world_rank);
        }

        MPIT_result = MPI_T_pvar_stop(session, flush_handle);
        if (MPIT_result != MPI_SUCCESS) {
            printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }

        MPIT_result = MPI_T_pvar_handle_free(session, &flush_handle);
        if (MPIT_result != MPI_SUCCESS) {
            printf("failed to free handle on \"%s\" pvar, check that you have monitoring pml\n",
                   flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }

        MPIT_result = MPI_T_pvar_session_free(&session);
        if (MPIT_result != MPI_SUCCESS) {
            printf("cannot close a session for \"%s\" pvar\n", flush_pvar_name);
            MPI_Abort(MPI_COMM_WORLD, MPIT_result);
        }

        (void)MPI_T_finalize();
    }

    MPI_Comm_free(&newcomm);
    /* Now, in MPI_Finalize(), the pml_monitoring library outputs, in
       STDERR, the aggregated recorded monitoring of all the phases */
    MPI_Finalize();
    return 0;
}