openmpi/test/monitoring/check_monitoring.c
bosilca d55b666834 Topic/monitoring (#3109)
Add a monitoring PML, OSC and IO. They track all data exchanges between processes,
with the capability to include or exclude collective traffic. The monitoring
infrastructure is driven using MPI_T, and can be turned off and on at any time on
any communicators/files/windows. Documentation and examples have been added, as
well as a shared library that can be used with LD_PRELOAD and that allows the
monitoring of any application.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
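
As an illustration of the LD_PRELOAD usage (the library name and path are not given
in this change, so treat them as placeholders), an unmodified application could be
monitored along the lines of:

    mpirun -np 4 -x LD_PRELOAD=<path-to>/monitoring_prof.so \
           --mca pml_monitoring_enable 1 ./app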


* add the ability to query pml monitoring results with the MPI Tools interface
using the performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
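
A minimal sketch of such a query (error handling elided; the same pattern, with
full checking, appears in check_monitoring.c below):

    MPI_T_pvar_session session;
    MPI_T_pvar_handle  handle;
    int idx, count, provided;
    size_t values[4];                      /* one counter per peer; 4 = np here */

    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_T_pvar_session_create(&session);
    MPI_T_pvar_get_index("pml_monitoring_messages_count", MPI_T_PVAR_CLASS_SIZE, &idx);
    MPI_T_pvar_handle_alloc(session, idx, MPI_COMM_WORLD, &handle, &count);
    MPI_T_pvar_start(session, handle);
    /* ... communication to be monitored ... */
    MPI_T_pvar_stop(session, handle);
    MPI_T_pvar_read(session, handle, values);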

* Fix a conversion problem and add a comment about the lack of component
retain in the new component infrastructure.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Allow the pvar to be written by invoking the associated callback.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Various fixes for the monitoring.
Allocate all counting arrays in a single allocation.
Don't delay the initialization (do it at the first add_proc, as we
know the number of processes in MPI_COMM_WORLD).

Add a choice: with or without MPI_T (default).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Cleanup for the monitoring module.
Fixed a few bugs, and reshaped the operations to prepare for
global or communicator-based monitoring. Started integrating
support for MPI_T as well as MCA monitoring.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Add documentation about how to use the pml_monitoring component.

The document presents the use with and without MPI_T.
It may not reflect exactly how things work right now, but it should
reflect how they will work in the end.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change the rank in MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global variables in pml_monitoring.c.
Change the mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Improve monitoring support (including integration with MPI_T)

Use mca_pml_monitoring_enable to check the status state. Set mca_pml_monitoring_current_filename only if the parameter is set.
Allow 3 modes for pml_monitoring_enable_output: 1: stdout; 2: stderr; 3: filename.
Fix test: 1 for differentiated messages, >1 for non-differentiated. Fix output.
Add documentation for the pml_monitoring_enable_output parameter. Remove a useless parameter in the example.
Set the filename only if using MPI tools.
Add missing parameters for fprintf in monitoring_flush (for output in the stdout/stderr cases).
Fix the expected output/results for the example header.
Fix the example when using MPI_Tools: a null pointer can't be passed directly; it needs to be a pointer to a null pointer.
Base whether to output or not on the message count, in order to print something even if only empty messages are exchanged.
Add a new example of how to access performance variables from within the code.
Allocate arrays according to the value returned by the binding.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
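
To illustrate the three output modes (the MCA parameter name is the one given above):

    --mca pml_monitoring_enable_output 1    (dump to stdout)
    --mca pml_monitoring_enable_output 2    (dump to stderr)
    --mca pml_monitoring_enable_output 3    (dump to the configured filename)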

* Add overhead benchmark, with script to use data and create graphs out of the results
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix a segfault at the end when the pml is not loaded
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Start creating a common monitoring module. Factorize version numbering
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix microbenchmarks script
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Improve the readability of the code

NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add osc monitoring component

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add out-of-memory error checking in osc_monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix a hard segfault when double-freeing the filename
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Move the proper parts of the monitoring system to ompi/mca/common.
Use common functions instead of the pml-specific ones. Remove the pml ones.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add calls to record monitored data from osc. Use common function to translate ranks.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix test_overhead benchmark script distribution

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix linking the library with mca/common

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add passive operations in monitoring_test

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix the from-rank calculation. Add more detailed error messages

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix alignments. Fix the common_monitoring_get_world_rank function. Remove useless trailing newlines

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix osc_monitoring mget_message_count function call

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Change the common_monitoring function names to respect the naming convention. Move the common parts of finalization to common_finalize. Add some comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring common output system

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add an error message when flushing to a file and the open fails. Remove an erroneous info message emitted when flushing while monitoring is already disabled.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Consistent output file name (with and without MPI_T).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Always output to a file when flushing at pvar_stop(flush).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update the monitoring documentation.
Complete the information from the HowTo. Fix a few mistakes and typos.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use the world_rank for printf's.
Fix the name generation for output files when using MPI_T. Minor changes in the benchmark starting script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean potential previous runs, but keep the results at the end so the data can be reprocessed. Add comments.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add a sanity check to ensure unique initialization of osc monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Reduce the number of symbols exposed outside mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Remove use of __sync_* built-ins. Use opal_atomic_* instead.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Allocate the hashtable on common/monitoring component initialization. Define symbols to set the values for error/warning/info verbose output. Use opal_atomic instead of built-in function in osc/monitoring template initialization.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Delete a now-useless file: moved to common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add histogram distribution of message sizes

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add a histogram array of base-2 logarithms of message sizes. Use a simple call to reset/allocate arrays in common_monitoring.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
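
A sketch of the implied bucket computation (the exact constant handling lives in
common_monitoring.c; the bounds and bucket-0 handling here are assumptions):

    #include <math.h>
    static const double log10_2 = 0.3010299956639812;        /* log10(2) */
    static inline int size_to_bucket(size_t size, int max_bucket)
    {
        if( 0 == size ) return 0;                 /* empty messages land in bucket 0 */
        int idx = 1 + (int)(log10((double)size) / log10_2);  /* 1 + floor(log2(size)) */
        return idx > max_bucket ? max_bucket : idx;          /* out-of-range guard */
    }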

* Add information to the dump file. Separate monitored data per category (pt2pt/osc/coll (to come))

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add a coll component for monitoring collective communications

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix warning messages: use c_name, as the magic id is not always defined; moreover, a % was missing. Add a call to release the underlying modules. Add debug info messages. Add a warning which may lead to further analysis.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix log10_2 constant initialization. Fix index calculation for histogram array.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add debug info messages to follow more easily initialization steps.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Group all the var/pvar definitions in common_monitoring. Separate the initial filename from the current one, to ease its lifetime management. Add verifications to ensure common is initialized only once. Move state variable management to common_monitoring.
monitoring_filter now only indicates whether filtering is activated.
Fix an out-of-range access in the histogram.
The list is not used with the struct mca_monitoring_coll_data_t, so inherit only from opal_object_t.
Remove useless dead code.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix invalid memory allocation. Initialize initial_filename to an empty string to avoid an invalid read in mca_base_var_register.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Don't install the test scripts.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix missing procs in hashtable. Cache coll monitoring data.
    * Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
    * Cache monitoring data relative to collectives operations on creation.
    * Remove double caching.
    * Use same proc name definition for hash table when inserting and
      when retrieving.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use an intermediate variable to avoid an invalid write while retrieving ranks from the hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add the missing release of the last element in flush_all. Add the release of the hashtable in finalize.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use a linked list instead of a hashtable to keep track of communicator data. Add the release of the structure at finalize time.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Set world_rank from the hashtable only if it is found

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Use the predefined symbol from the opal system to print ints

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Move collective monitoring data to a hashtable. Add a pvar to access the monitoring_coll_data. Move the function headers to a private file only to be used in ompi/mca/common/monitoring

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix pvar registration. Use OMPI_ERROR instead of -1 as the returned error value. Fix the releasing of coll_data_t objects. Assign the value only if data is found in the hashtable.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add automated check (with MPI_Tools) of monitoring.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix procs list caching in common_monitoring_coll_data_t

    * Fix monitoring_coll_data type definition.
    * Use size(COMM_WORLD)-1 to determine max number of digits.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add PVAR handles. Clean up the code (visibility, add comments...). Start updating the documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix coll operations monitoring. Update check_monitoring according to the added pvars. Fix the monitoring array allocation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Documentation update.
Update the LaTeX and README documentation and move it to a more logical place

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix monitoring_prof (a bad variable.vector was used, and the wrong array in PMPI_Gather).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory footprint of monitoring_prof. Unify OSC related outputs.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add the use of a machine file for the overhead benchmark

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Check for out-of-bound write in histogram

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Fix common_monitoring_cache object init for MPI_COMM_WORLD

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add RDMA benchmarks to test_overhead
Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for a complete send (pingpong / 2).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add computation of the average and median of overheads. Add comments and copyrights to the test_overhead script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add technical documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adapt to the new definition of communicators

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Update expected output in test/monitoring/monitoring_test.c

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add dumping of the histogram in an edge case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
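
A sketch of what such a reduction can look like (the size_t-to-MPI-datatype mapping
is platform-dependent; MPI_UNSIGNED_LONG is assumed here, and 4 = np is illustrative):

    /* values[] was read with MPI_T_pvar_read from pml_monitoring_messages_count */
    size_t values[4], max_values[4];
    MPI_Reduce(values, max_values, 4, MPI_UNSIGNED_LONG, MPI_MAX, 0, MPI_COMM_WORLD);
    /* rank 0 now holds, per peer, the maximum message count over all ranks */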

* Add consistency in header inclusion.
Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
Add sanity check before emptying hashtable.
Fix typos in documentation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* misc monitoring fixes

* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am
and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

* Cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

* Changing int64_t to size_t.
Keep size_t used across all monitoring components.
Adapt the documentation.
Remove the useless MPI_Request and MPI_Status from monitoring_test.c.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add parameter for RMA test case

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Clean up the maximum bound computation for the proc list dump.
Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0dbe5.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add communicator-specific monitored collective data reset

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>

* Add monitoring scripts to the 'make dist'
Also install them in the build and the install directories.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-06-26 18:21:39 +02:00


/*
 * Copyright (c) 2016-2017 Inria. All rights reserved.
 * Copyright (c) 2017      The University of Tennessee and The University
 *                         of Tennessee Research Foundation. All rights
 *                         reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */
/*
 * Check that the Open MPI monitoring component works as expected.
 * To be run as:
 *   mpirun -np 4 --mca pml_monitoring_enable 2 ./check_monitoring
 */
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define PVAR_GENERATE_VARIABLES(pvar_prefix, pvar_name, pvar_class) \
    /* Variables */ \
    static MPI_T_pvar_handle pvar_prefix ## _handle; \
    static const char pvar_prefix ## _pvar_name[] = pvar_name; \
    static int pvar_prefix ## _pvar_idx; \
    /* Functions */ \
    static inline int pvar_prefix ## _start(MPI_T_pvar_session session) \
    { \
        int MPIT_result; \
        MPIT_result = MPI_T_pvar_start(session, pvar_prefix ## _handle); \
        if( MPI_SUCCESS != MPIT_result ) { \
            fprintf(stderr, "Failed to start handle on \"%s\" pvar, check that you have " \
                    "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \
            MPI_Abort(MPI_COMM_WORLD, MPIT_result); \
        } \
        return MPIT_result; \
    } \
    static inline int pvar_prefix ## _init(MPI_T_pvar_session session) \
    { \
        int MPIT_result; \
        /* Get index */ \
        MPIT_result = MPI_T_pvar_get_index(pvar_prefix ## _pvar_name, \
                                           pvar_class, \
                                           &(pvar_prefix ## _pvar_idx)); \
        if( MPI_SUCCESS != MPIT_result ) { \
            fprintf(stderr, "Cannot find monitoring MPI_Tool \"%s\" pvar, check that you have " \
                    "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \
            MPI_Abort(MPI_COMM_WORLD, MPIT_result); \
            return MPIT_result; \
        } \
        /* Allocate handle */ \
        /* Allocating a new PVAR in a session will reset the counters */ \
        int count; \
        MPIT_result = MPI_T_pvar_handle_alloc(session, pvar_prefix ## _pvar_idx, \
                                              MPI_COMM_WORLD, &(pvar_prefix ## _handle), \
                                              &count); \
        if( MPI_SUCCESS != MPIT_result ) { \
            fprintf(stderr, "Failed to allocate handle on \"%s\" pvar, check that you have " \
                    "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \
            MPI_Abort(MPI_COMM_WORLD, MPIT_result); \
            return MPIT_result; \
        } \
        /* Start PVAR */ \
        return pvar_prefix ## _start(session); \
    } \
    static inline int pvar_prefix ## _stop(MPI_T_pvar_session session) \
    { \
        int MPIT_result; \
        MPIT_result = MPI_T_pvar_stop(session, pvar_prefix ## _handle); \
        if( MPI_SUCCESS != MPIT_result ) { \
            fprintf(stderr, "Failed to stop handle on \"%s\" pvar, check that you have " \
                    "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \
            MPI_Abort(MPI_COMM_WORLD, MPIT_result); \
        } \
        return MPIT_result; \
    } \
    static inline int pvar_prefix ## _finalize(MPI_T_pvar_session session) \
    { \
        int MPIT_result; \
        /* Stop PVAR */ \
        MPIT_result = pvar_prefix ## _stop(session); \
        /* Free handle */ \
        MPIT_result = MPI_T_pvar_handle_free(session, &(pvar_prefix ## _handle)); \
        if( MPI_SUCCESS != MPIT_result ) { \
fprintf(stderr, "Failed to allocate handle on \"%s\" pvar, check that you have " \
"enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \
            MPI_Abort(MPI_COMM_WORLD, MPIT_result); \
            return MPIT_result; \
        } \
        return MPIT_result; \
    } \
    static inline int pvar_prefix ## _read(MPI_T_pvar_session session, void *values) \
    { \
        int MPIT_result; \
        /* Stop pvar */ \
        MPIT_result = pvar_prefix ## _stop(session); \
        /* Read values */ \
        MPIT_result = MPI_T_pvar_read(session, pvar_prefix ## _handle, values); \
        if( MPI_SUCCESS != MPIT_result ) { \
            fprintf(stderr, "Failed to read handle on \"%s\" pvar, check that you have " \
                    "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \
            MPI_Abort(MPI_COMM_WORLD, MPIT_result); \
        } \
        /* Start and return */ \
        return pvar_prefix ## _start(session); \
    }
#define GENERATE_CS(prefix, pvar_name_prefix, pvar_class_c, pvar_class_s) \
    PVAR_GENERATE_VARIABLES(prefix ## _count, pvar_name_prefix "_count", pvar_class_c) \
    PVAR_GENERATE_VARIABLES(prefix ## _size, pvar_name_prefix "_size", pvar_class_s) \
    static inline int pvar_ ## prefix ## _init(MPI_T_pvar_session session) \
    { \
        prefix ## _count_init(session); \
        return prefix ## _size_init(session); \
    } \
    static inline int pvar_ ## prefix ## _finalize(MPI_T_pvar_session session) \
    { \
        prefix ## _count_finalize(session); \
        return prefix ## _size_finalize(session); \
    } \
    static inline void pvar_ ## prefix ## _read(MPI_T_pvar_session session, \
                                                size_t *cvalues, size_t *svalues) \
    { \
        /* Read count values */ \
        prefix ## _count_read(session, cvalues); \
        /* Read size values */ \
        prefix ## _size_read(session, svalues); \
    }
GENERATE_CS(pml, "pml_monitoring_messages", MPI_T_PVAR_CLASS_SIZE, MPI_T_PVAR_CLASS_SIZE)
GENERATE_CS(osc_s, "osc_monitoring_messages_sent", MPI_T_PVAR_CLASS_SIZE, MPI_T_PVAR_CLASS_SIZE)
GENERATE_CS(osc_r, "osc_monitoring_messages_recv", MPI_T_PVAR_CLASS_SIZE, MPI_T_PVAR_CLASS_SIZE)
GENERATE_CS(coll, "coll_monitoring_messages", MPI_T_PVAR_CLASS_SIZE, MPI_T_PVAR_CLASS_SIZE)
GENERATE_CS(o2a, "coll_monitoring_o2a", MPI_T_PVAR_CLASS_COUNTER, MPI_T_PVAR_CLASS_AGGREGATE)
GENERATE_CS(a2o, "coll_monitoring_a2o", MPI_T_PVAR_CLASS_COUNTER, MPI_T_PVAR_CLASS_AGGREGATE)
GENERATE_CS(a2a, "coll_monitoring_a2a", MPI_T_PVAR_CLASS_COUNTER, MPI_T_PVAR_CLASS_AGGREGATE)
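/* For example, GENERATE_CS(pml, "pml_monitoring_messages", ...) expands into the
 * per-pvar helpers pml_count_init/start/stop/finalize/read and
 * pml_size_init/start/stop/finalize/read, plus the combined wrappers
 * pvar_pml_init(), pvar_pml_finalize() and pvar_pml_read() used below. */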
static size_t *old_cvalues, *old_svalues;
static inline void pvar_all_init(MPI_T_pvar_session *session, int world_size)
{
    int MPIT_result, provided;
    MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    if (MPIT_result != MPI_SUCCESS) {
        fprintf(stderr, "Failed to initialize the MPI_Tools sub-system.\n");
        MPI_Abort(MPI_COMM_WORLD, MPIT_result);
    }
    MPIT_result = MPI_T_pvar_session_create(session);
    if (MPIT_result != MPI_SUCCESS) {
        printf("Failed to create a session for PVARs.\n");
        MPI_Abort(MPI_COMM_WORLD, MPIT_result);
    }
    old_cvalues = malloc(2 * world_size * sizeof(size_t));
    old_svalues = old_cvalues + world_size;
    pvar_pml_init(*session);
    pvar_osc_s_init(*session);
    pvar_osc_r_init(*session);
    pvar_coll_init(*session);
    pvar_o2a_init(*session);
    pvar_a2o_init(*session);
    pvar_a2a_init(*session);
}
static inline void pvar_all_finalize(MPI_T_pvar_session *session)
{
    int MPIT_result;
    pvar_pml_finalize(*session);
    pvar_osc_s_finalize(*session);
    pvar_osc_r_finalize(*session);
    pvar_coll_finalize(*session);
    pvar_o2a_finalize(*session);
    pvar_a2o_finalize(*session);
    pvar_a2a_finalize(*session);
    free(old_cvalues);
    MPIT_result = MPI_T_pvar_session_free(session);
    if (MPIT_result != MPI_SUCCESS) {
        printf("Failed to close a session for PVARs.\n");
        MPI_Abort(MPI_COMM_WORLD, MPIT_result);
    }
    (void)MPI_T_finalize();
}
static inline int pvar_pml_check(MPI_T_pvar_session session, int world_size, int world_rank)
{
    int i, ret = MPI_SUCCESS;
    size_t *cvalues, *svalues;
    cvalues = malloc(2 * world_size * sizeof(size_t));
    svalues = cvalues + world_size;
    /* Get values */
    pvar_pml_read(session, cvalues, svalues);
    for( i = 0; i < world_size && MPI_SUCCESS == ret; ++i ) {
        /* Check count values */
        if( i == world_rank && (cvalues[i] - old_cvalues[i]) != (size_t) 0 ) {
            fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be equal to %zu.\n",
                    __func__, i, cvalues[i] - old_cvalues[i], (size_t) 0);
            ret = -1;
        } else if ( i != world_rank && (cvalues[i] - old_cvalues[i]) < (size_t) world_size ) {
            fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be >= %zu.\n",
                    __func__, i, cvalues[i] - old_cvalues[i], (size_t) world_size);
            ret = -1;
        }
        /* Check size values */
        if( i == world_rank && (svalues[i] - old_svalues[i]) != (size_t) 0 ) {
            fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be equal to %zu.\n",
                    __func__, i, svalues[i] - old_svalues[i], (size_t) 0);
            ret = -1;
        } else if ( i != world_rank && (svalues[i] - old_svalues[i]) < (size_t) (world_size * 13 * sizeof(char)) ) {
            fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be >= %zu.\n",
                    __func__, i, svalues[i] - old_svalues[i], (size_t) (world_size * 13 * sizeof(char)));
            ret = -1;
        }
    }
    if( MPI_SUCCESS == ret ) {
        fprintf(stdout, "Check PML...[ OK ]\n");
    } else {
        fprintf(stdout, "Check PML...[FAIL]\n");
    }
    /* Keep old PML values */
    memcpy(old_cvalues, cvalues, 2 * world_size * sizeof(size_t));
    /* Free arrays */
    free(cvalues);
    return ret;
}
static inline int pvar_osc_check(MPI_T_pvar_session session, int world_size, int world_rank)
{
    int i, ret = MPI_SUCCESS;
    size_t *cvalues, *svalues;
    cvalues = malloc(2 * world_size * sizeof(size_t));
    svalues = cvalues + world_size;
    /* Get OSC values */
    memset(cvalues, 0, 2 * world_size * sizeof(size_t));
    /* Check OSC sent values */
    pvar_osc_s_read(session, cvalues, svalues);
    for( i = 0; i < world_size && MPI_SUCCESS == ret; ++i ) {
        /* Check count values */
        if( cvalues[i] < (size_t) world_size ) {
            fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be >= %zu.\n",
                    __func__, i, cvalues[i], (size_t) world_size);
            ret = -1;
        }
        /* Check size values */
        if( svalues[i] < (size_t) (world_size * 13 * sizeof(char)) ) {
            fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be >= %zu.\n",
                    __func__, i, svalues[i], (size_t) (world_size * 13 * sizeof(char)));
            ret = -1;
        }
    }
    /* Check OSC received values */
    pvar_osc_r_read(session, cvalues, svalues);
    for( i = 0; i < world_size && MPI_SUCCESS == ret; ++i ) {
        /* Check count values */
        if( cvalues[i] < (size_t) world_size ) {
            fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be >= %zu.\n",
                    __func__, i, cvalues[i], (size_t) world_size);
            ret = -1;
        }
        /* Check size values */
        if( svalues[i] < (size_t) (world_size * 13 * sizeof(char)) ) {
            fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be >= %zu.\n",
                    __func__, i, svalues[i], (size_t) (world_size * 13 * sizeof(char)));
            ret = -1;
        }
    }
    if( MPI_SUCCESS == ret ) {
        fprintf(stdout, "Check OSC...[ OK ]\n");
    } else {
        fprintf(stdout, "Check OSC...[FAIL]\n");
    }
    /* Keep the values as a baseline (for symmetry with the other checks; not reused afterwards) */
    memcpy(old_cvalues, cvalues, 2 * world_size * sizeof(size_t));
    /* Free arrays */
    free(cvalues);
    return ret;
}
static inline int pvar_coll_check(MPI_T_pvar_session session, int world_size, int world_rank)
{
    int i, ret = MPI_SUCCESS;
    size_t count, size;
    size_t *cvalues, *svalues;
    cvalues = malloc(2 * world_size * sizeof(size_t));
    svalues = cvalues + world_size;
    /* Get COLL values */
    pvar_coll_read(session, cvalues, svalues);
    for( i = 0; i < world_size && MPI_SUCCESS == ret; ++i ) {
        /* Check count values */
        if( i == world_rank && cvalues[i] != (size_t) 0 ) {
            fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be equal to %zu.\n",
                    __func__, i, cvalues[i], (size_t) 0);
            ret = -1;
        } else if ( i != world_rank && cvalues[i] < (size_t) (world_size + 1) * 4 ) {
            fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be >= %zu.\n",
                    __func__, i, cvalues[i], (size_t) (world_size + 1) * 4);
            ret = -1;
        }
        /* Check size values */
        if( i == world_rank && svalues[i] != (size_t) 0 ) {
            fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be equal to %zu.\n",
                    __func__, i, svalues[i], (size_t) 0);
            ret = -1;
        } else if ( i != world_rank && svalues[i] < (size_t) (world_size * (2 * 13 * sizeof(char) + sizeof(int)) + 13 * 3 * sizeof(char) + sizeof(int)) ) {
            fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be >= %zu.\n",
                    __func__, i, svalues[i], (size_t) (world_size * (2 * 13 * sizeof(char) + sizeof(int)) + 13 * 3 * sizeof(char) + sizeof(int)));
            ret = -1;
        }
    }
    /* Check one-to-all COLL values */
    pvar_o2a_read(session, &count, &size);
    if( count < (size_t) 2 ) {
        fprintf(stderr, "Error in %s: count_o2a=%zu, and should be >= %zu.\n",
                __func__, count, (size_t) 2);
        ret = -1;
    }
    if( size < (size_t) ((world_size - 1) * 13 * 2 * sizeof(char)) ) {
        fprintf(stderr, "Error in %s: size_o2a=%zu, and should be >= %zu.\n",
                __func__, size, (size_t) ((world_size - 1) * 13 * 2 * sizeof(char)));
        ret = -1;
    }
    /* Check all-to-one COLL values */
    pvar_a2o_read(session, &count, &size);
    if( count < (size_t) 2 ) {
        fprintf(stderr, "Error in %s: count_a2o=%zu, and should be >= %zu.\n",
                __func__, count, (size_t) 2);
        ret = -1;
    }
    if( size < (size_t) ((world_size - 1) * (13 * sizeof(char) + sizeof(int))) ) {
        fprintf(stderr, "Error in %s: size_a2o=%zu, and should be >= %zu.\n",
                __func__, size,
                (size_t) ((world_size - 1) * (13 * sizeof(char) + sizeof(int))));
        ret = -1;
    }
    /* Check all-to-all COLL values */
    pvar_a2a_read(session, &count, &size);
    if( count < (size_t) (world_size * 4) ) {
        fprintf(stderr, "Error in %s: count_a2a=%zu, and should be >= %zu.\n",
                __func__, count, (size_t) (world_size * 4));
        ret = -1;
    }
    if( size < (size_t) (world_size * (world_size - 1) * (2 * 13 * sizeof(char) + sizeof(int))) ) {
        fprintf(stderr, "Error in %s: size_a2a=%zu, and should be >= %zu.\n",
                __func__, size,
                (size_t) (world_size * (world_size - 1) * (2 * 13 * sizeof(char) + sizeof(int))));
        ret = -1;
    }
    if( MPI_SUCCESS == ret ) {
        fprintf(stdout, "Check COLL...[ OK ]\n");
    } else {
        fprintf(stdout, "Check COLL...[FAIL]\n");
    }
    /* Keep old PML values */
    pvar_pml_read(session, old_cvalues, old_svalues);
    /* Free arrays */
    free(cvalues);
    return ret;
}
int main(int argc, char* argv[])
{
    int size, i, n, to, from, world_rank;
    MPI_T_pvar_session session;
    MPI_Status status;
    char s1[20], s2[20];
    strncpy(s1, "hello world!", 13);
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    pvar_all_init(&session, size);
    /* First phase: exchange data `size` times with everyone in
       MPI_COMM_WORLD using collective operations. This phase comes
       first in order to ease the prediction of how many messages of
       each kind are exchanged. */
    char *coll_buff = malloc(2 * size * 13 * sizeof(char));
    char *coll_recv_buff = coll_buff + size * 13;
    int sum_ranks;
    for( n = 0; n < size; ++n ) {
        /* Allgather */
        memset(coll_buff, 0, size * 13 * sizeof(char));
        MPI_Allgather(s1, 13, MPI_CHAR, coll_buff, 13, MPI_CHAR, MPI_COMM_WORLD);
        for( i = 0; i < size; ++i ) {
            if( strncmp(s1, &coll_buff[i * 13], 13) ) {
                fprintf(stderr, "Error in Allgather check: received \"%s\" instead of "
                        "\"hello world!\" from %d.\n", &coll_buff[i * 13], i);
                MPI_Abort(MPI_COMM_WORLD, -1);
            }
        }
        /* Scatter */
        MPI_Scatter(coll_buff, 13, MPI_CHAR, s2, 13, MPI_CHAR, n, MPI_COMM_WORLD);
        if( strncmp(s1, s2, 13) ) {
            fprintf(stderr, "Error in Scatter check: received \"%s\" instead of "
                    "\"hello world!\" from %d.\n", s2, n);
            MPI_Abort(MPI_COMM_WORLD, -1);
        }
        /* Allreduce */
        MPI_Allreduce(&world_rank, &sum_ranks, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        if( sum_ranks != ((size - 1) * size / 2) ) {
            fprintf(stderr, "Error in Allreduce check: sum_ranks=%d instead of %d.\n",
                    sum_ranks, (size - 1) * size / 2);
            MPI_Abort(MPI_COMM_WORLD, -1);
        }
        /* Alltoall */
        memset(coll_recv_buff, 0, size * 13 * sizeof(char));
        MPI_Alltoall(coll_buff, 13, MPI_CHAR, coll_recv_buff, 13, MPI_CHAR, MPI_COMM_WORLD);
        for( i = 0; i < size; ++i ) {
            if( strncmp(s1, &coll_recv_buff[i * 13], 13) ) {
                fprintf(stderr, "Error in Alltoall check: received \"%s\" instead of "
                        "\"hello world!\" from %d.\n", &coll_recv_buff[i * 13], i);
                MPI_Abort(MPI_COMM_WORLD, -1);
            }
        }
        /* Bcast */
        if( n == world_rank ) {
            MPI_Bcast(s1, 13, MPI_CHAR, n, MPI_COMM_WORLD);
        } else {
            MPI_Bcast(s2, 13, MPI_CHAR, n, MPI_COMM_WORLD);
            if( strncmp(s1, s2, 13) ) {
                fprintf(stderr, "Error in Bcast check: received \"%s\" instead of "
                        "\"hello world!\" from %d.\n", s2, n);
                MPI_Abort(MPI_COMM_WORLD, -1);
            }
        }
        /* Barrier */
        MPI_Barrier(MPI_COMM_WORLD);
        /* Gather */
        memset(coll_buff, 0, size * 13 * sizeof(char));
        MPI_Gather(s1, 13, MPI_CHAR, coll_buff, 13, MPI_CHAR, n, MPI_COMM_WORLD);
        if( n == world_rank ) {
            for( i = 0; i < size; ++i ) {
                if( strncmp(s1, &coll_buff[i * 13], 13) ) {
                    fprintf(stderr, "Error in Gather check: received \"%s\" instead of "
                            "\"hello world!\" from %d.\n", &coll_buff[i * 13], i);
                    MPI_Abort(MPI_COMM_WORLD, -1);
                }
            }
        }
        /* Reduce */
        MPI_Reduce(&world_rank, &sum_ranks, 1, MPI_INT, MPI_SUM, n, MPI_COMM_WORLD);
        if( n == world_rank ) {
            if( sum_ranks != ((size - 1) * size / 2) ) {
                fprintf(stderr, "Error in Reduce check: sum_ranks=%d instead of %d.\n",
                        sum_ranks, (size - 1) * size / 2);
                MPI_Abort(MPI_COMM_WORLD, -1);
            }
        }
    }
    free(coll_buff);
    if( -1 == pvar_coll_check(session, size, world_rank) ) MPI_Abort(MPI_COMM_WORLD, -1);
    /* Second phase: exchange data `size` times with everyone except self
       in MPI_COMM_WORLD with Send/Recv */
    for( n = 0; n < size; ++n ) {
        for( i = 0; i < size - 1; ++i ) {
            to = (world_rank + 1 + i) % size;
            from = (world_rank + size - 1 - i) % size;
            if( world_rank < to ) {
                MPI_Send(s1, 13, MPI_CHAR, to, world_rank, MPI_COMM_WORLD);
                MPI_Recv(s2, 13, MPI_CHAR, from, from, MPI_COMM_WORLD, &status);
            } else {
                MPI_Recv(s2, 13, MPI_CHAR, from, from, MPI_COMM_WORLD, &status);
                MPI_Send(s1, 13, MPI_CHAR, to, world_rank, MPI_COMM_WORLD);
            }
            if( strncmp(s2, "hello world!", 13) ) {
                fprintf(stderr, "Error in PML check: s2=\"%s\" instead of \"hello world!\".\n",
                        s2);
                MPI_Abort(MPI_COMM_WORLD, -1);
            }
        }
    }
    if( -1 == pvar_pml_check(session, size, world_rank) ) MPI_Abort(MPI_COMM_WORLD, -1);
    /* Third phase: exchange data `size` times with everyone, including self,
       in MPI_COMM_WORLD with RMA operations */
    char win_buff[20];
    MPI_Win win;
    MPI_Win_create(win_buff, 20, sizeof(char), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    for( n = 0; n < size; ++n ) {
        for( i = 0; i < size; ++i ) {
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, i, 0, win);
            MPI_Put(s1, 13, MPI_CHAR, i, 0, 13, MPI_CHAR, win);
            MPI_Win_unlock(i, win);
        }
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, world_rank, 0, win);
        if( strncmp(win_buff, "hello world!", 13) ) {
            fprintf(stderr, "Error in OSC check: win_buff=\"%s\" instead of \"hello world!\".\n",
                    win_buff);
            MPI_Abort(MPI_COMM_WORLD, -1);
        }
        MPI_Win_unlock(world_rank, win);
        for( i = 0; i < size; ++i ) {
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, i, 0, win);
            MPI_Get(s2, 13, MPI_CHAR, i, 0, 13, MPI_CHAR, win);
            MPI_Win_unlock(i, win);
            if( strncmp(s2, "hello world!", 13) ) {
                fprintf(stderr, "Error in OSC check: s2=\"%s\" instead of \"hello world!\".\n",
                        s2);
                MPI_Abort(MPI_COMM_WORLD, -1);
            }
        }
    }
    MPI_Win_free(&win);
    if( -1 == pvar_osc_check(session, size, world_rank) ) MPI_Abort(MPI_COMM_WORLD, -1);
    pvar_all_finalize(&session);
    MPI_Finalize();
    return EXIT_SUCCESS;
}