* `--mca ompi_display_comm VALUE` where `VALUE` is one or more of:
- `mpi_init` : Display during `MPI_Init`
- `mpi_finalize` : Display during `MPI_Finalize`
* hook/comm_method: Use enum flags to select protocols
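A minimal sketch of the enum-flags idea, applied to the `VALUE` tokens listed above. The enum names, `parse_display_flags()`, `display_at()`, and the comma separator are all hypothetical and are not taken from hook/comm_method:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag values; the real hook/comm_method enum may differ. */
typedef enum {
    DISPLAY_NONE         = 0,
    DISPLAY_MPI_INIT     = 1 << 0,
    DISPLAY_MPI_FINALIZE = 1 << 1,
} display_flags_t;

/* Parse a VALUE such as "mpi_init,mpi_finalize" into a bitmask.
 * Assumption: tokens are comma-separated; unknown tokens are ignored. */
static int parse_display_flags(const char *value)
{
    int flags = DISPLAY_NONE;
    char buf[256];

    strncpy(buf, value, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (strcmp(tok, "mpi_init") == 0) {
            flags |= DISPLAY_MPI_INIT;
        } else if (strcmp(tok, "mpi_finalize") == 0) {
            flags |= DISPLAY_MPI_FINALIZE;
        }
    }
    return flags;
}

/* True if the given display point was selected. */
static bool display_at(int flags, display_flags_t point)
{
    return (flags & point) != 0;
}
```

Because each point is a distinct bit, selecting both is just a bitwise OR, and checking a point is a single AND.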
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Remove some left-over infrastructure for handling callbacks into the
MPI C++ bindings (which were removed long ago -- this code is now
stale).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
1. Allow fallback to a lesser level of AVX support during make
Because some distros restrict the compile architecture during make
(while not setting any restrictions during configure), we need to
detect the target architecture during make as well, in order to
restrict the code we generate.
2. Add comments and better protect the arch-specific code.
Identify all the vector functions used and classify them according to
the necessary hardware capabilities.
Use these requirements to protect the code for loads and stores (the rest
of the code is automatically generated, which makes it more difficult to
protect).
3. Correctly check for AVX* support.
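Detecting the architecture "during make" can be done with the compiler's predefined feature macros, which reflect the flags actually in effect when each file is compiled. A minimal sketch, assuming GCC/Clang macro names (`__AVX512F__`, `__AVX2__`, and so on); the enum and function names are hypothetical and this is not the op/avx code:

```c
/* SIMD levels, from least to most capable (hypothetical names). */
typedef enum {
    SIMD_NONE = 0,
    SIMD_SSE3,
    SIMD_SSE41,
    SIMD_AVX,
    SIMD_AVX2,
    SIMD_AVX512,
} simd_level_t;

/* Widest level enabled for this translation unit. If the build flags were
 * restricted at make time, the lesser branches are selected automatically. */
static simd_level_t compiled_simd_level(void)
{
#if defined(__AVX512F__) && defined(__AVX512BW__)
    return SIMD_AVX512;
#elif defined(__AVX2__)
    return SIMD_AVX2;
#elif defined(__AVX__)
    return SIMD_AVX;
#elif defined(__SSE4_1__)
    return SIMD_SSE41;
#elif defined(__SSE3__)
    return SIMD_SSE3;
#else
    return SIMD_NONE;
#endif
}

static const char *simd_level_name(simd_level_t level)
{
    switch (level) {
    case SIMD_AVX512: return "avx512";
    case SIMD_AVX2:   return "avx2";
    case SIMD_AVX:    return "avx";
    case SIMD_SSE41:  return "sse4.1";
    case SIMD_SSE3:   return "sse3";
    default:          return "none";
    }
}
```

The result depends on the `-m`/`-march` flags passed to the compiler, so the same source yields different levels under different distro build settings.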
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
1. Consistent march flag order between configure and make.
2. op/avx: give the option to skip some tests
It is possible to skip some intrinsic tests by setting the following environment variables to "no" before invoking configure:
- ompi_cv_op_avx_check_avx512
- ompi_cv_op_avx_check_avx2
- ompi_cv_op_avx_check_avx
- ompi_cv_op_avx_check_sse41
- ompi_cv_op_avx_check_sse3
3. op/avx: update AVX512 flags
Try `-mavx512f -mavx512bw -mavx512vl -mavx512dq` instead of
`-march=skylake-avx512`, since the former is less likely to conflict with
user-provided CFLAGS (e.g. `-march=...`).
Thanks to Bart Oldeman for pointing this out.
4. op/avx: have the op/avx library depend on libmpi.so
Refs. open-mpi/ompi#8323
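The individual `-mavx512*` flags matter because each intrinsic belongs to a specific AVX-512 subset. A minimal sketch of an `MPI_SUM`-style element-wise add on `int8_t` that needs both AVX512F and AVX512BW (`_mm512_add_epi8` is a BW instruction) and falls back to a scalar loop otherwise. The function name `sum_int8` is hypothetical; this is not the generated op/avx code:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512BW__)
#include <immintrin.h>
#endif

/* inout[i] += in[i] (wrapping), in the spirit of an MPI_SUM reduction. */
static void sum_int8(const int8_t *in, int8_t *inout, size_t count)
{
    size_t i = 0;
#if defined(__AVX512F__) && defined(__AVX512BW__)
    /* 64 int8 lanes per 512-bit vector; unaligned loads/stores. */
    for (; i + 64 <= count; i += 64) {
        __m512i a = _mm512_loadu_si512((const void *)(in + i));
        __m512i b = _mm512_loadu_si512((const void *)(inout + i));
        _mm512_storeu_si512((void *)(inout + i), _mm512_add_epi8(a, b));
    }
#endif
    /* Scalar tail, or the whole array when AVX-512 BW is not enabled. */
    for (; i < count; ++i) {
        inout[i] = (int8_t)(inout[i] + in[i]);
    }
}
```

Compiled with `-mavx512f -mavx512bw`, the vector loop is used; with default flags, only the scalar loop is compiled.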
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* PGI was throwing the following error.
```
NVC++-S-0103-Illegal operand types for comparison operator (osc_rdma_frag.h: 75)
NVC++/power Linux 20.11-0: compilation completed with severe errors
```
* It must not have liked the inline declaration of the NULL pointer.
- So replace with a variable, as we do in other places in the code base.
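The original line in `osc_rdma_frag.h` is not shown here; presumably the NULL was an inline literal passed as the "expected" argument of a compare-and-swap. A minimal sketch of the named-variable form, using C11 `<stdatomic.h>` instead of Open MPI's `opal_atomic_*` wrappers; `current_frag` and `try_install_frag()` are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot holding the current fragment pointer. */
static _Atomic(void *) current_frag = NULL;

/* Install new_frag only if the slot is still empty. The expected value is
 * a named local variable rather than an inline literal. On failure,
 * atomic_compare_exchange_strong writes the observed value back into
 * `expected`, so the argument must point at a modifiable object anyway. */
static bool try_install_frag(void *new_frag)
{
    void *expected = NULL;
    return atomic_compare_exchange_strong(&current_frag, &expected, new_frag);
}
```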
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Remove the now-unused MCA parameter, get rid of an unnecessary if-else block,
and move setting the flag outside of the while loop.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
It is, however, restricted to collective I/O operations, at this point
only from vulcan and dynamic_gen2. This required adding some more
infrastructure to recognize individual I/O and multi-threaded environments.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
The lack of data sieving has been identified as a main reason for the poor performance in some instances on the Lustre file system. This commit introduces the fundamental ability to perform data sieving for read operations (which should not be controversial). The code itself is correct; what is still lacking is a) the logic for when and how to activate data sieving, and b) the logic to limit the size of the temporary buffer used for data sieving.
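A minimal sketch of data sieving for reads: one contiguous read covering all requested pieces (gaps included) into a temporary buffer, followed by copying each piece out. The "file" is an in-memory array so the example is self-contained, and `io_piece_t` and `sieve_read()` are hypothetical, not the OMPIO implementation. Pieces are assumed sorted by offset and non-overlapping:

```c
#include <stddef.h>
#include <string.h>

/* One requested piece of the file view (hypothetical type). */
typedef struct {
    size_t offset; /* file offset of the piece */
    size_t len;    /* number of bytes */
} io_piece_t;

/* Returns the number of bytes read from the "file", gaps included. */
static size_t sieve_read(const unsigned char *file, const io_piece_t *pieces,
                         size_t npieces, unsigned char *tmp, unsigned char *out)
{
    if (npieces == 0) {
        return 0;
    }
    size_t start = pieces[0].offset;
    size_t end = pieces[npieces - 1].offset + pieces[npieces - 1].len;

    /* One large read of [start, end); stands in for a single pread(). */
    memcpy(tmp, file + start, end - start);

    /* Copy the wanted bytes into the user buffer, skipping the gaps. */
    size_t pos = 0;
    for (size_t i = 0; i < npieces; ++i) {
        memcpy(out + pos, tmp + (pieces[i].offset - start), pieces[i].len);
        pos += pieces[i].len;
    }
    return end - start;
}
```

Here `tmp` must hold the whole extent `end - start`, which is exactly the quantity that point b) above would need to bound.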
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
The dynamic_gen_file_write_all component distinguishes between the amount of data communicated
to aggregators and the amount of data written in a cycle by the aggregator (in contrast to, e.g., the vulcan component).
There was a bug in calculating which chunks have to be written in a cycle by an aggregator: we added elements to the
io_array until one stripe was filled. Unfortunately, the metric used was the amount of data, instead of ensuring that all offsets
fall within a single stripe. This commit fixes this issue. Note that the bug did not create a correctness problem, just a performance
problem in case there were gaps in the file view.
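A minimal sketch contrasting the two metrics described above. Both functions and `io_chunk_t` are hypothetical simplifications, not the fcoll code; chunks are assumed sorted by offset, and a chunk that crosses a stripe boundary would need splitting, which is not handled here:

```c
#include <stddef.h>

typedef struct {
    size_t offset;
    size_t len;
} io_chunk_t; /* hypothetical */

/* Data-amount metric: take chunks until stripe_size bytes are collected.
 * With gaps in the file view, this can pull in chunks of the next stripe. */
static size_t chunks_by_bytes(const io_chunk_t *c, size_t n, size_t stripe_size)
{
    size_t count = 0, bytes = 0;
    while (count < n && bytes + c[count].len <= stripe_size) {
        bytes += c[count].len;
        ++count;
    }
    return count;
}

/* Offset metric: take chunks whose offsets lie in the first chunk's stripe. */
static size_t chunks_in_first_stripe(const io_chunk_t *c, size_t n, size_t stripe_size)
{
    if (n == 0) {
        return 0;
    }
    size_t stripe = c[0].offset / stripe_size;
    size_t count = 0;
    while (count < n && c[count].offset / stripe_size == stripe) {
        ++count;
    }
    return count;
}
```

With a stripe size of 100 and chunks at offsets 0, 150, and 180 (40, 40, and 10 bytes), the data-amount metric selects all three even though they span two stripes, while the offset metric selects only the first.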
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
This has been shown to be more effective in achieving overlap
of inter- and intra-node communication, and it reduces the initial
delay before hitting the network.
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Also make coll/tuned the default for shared memory communication
as coll/sm has shown performance issues that need investigation.
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
The selectable list is sorted from lowest to highest priority, so the
user-defined preferences should be appended to the list.
The preference treatment should also maintain the order provided by the user
(first item has highest priority), so switch the loop order.
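A minimal sketch of why the loop order matters: since the last entry of the list has the highest priority, walking the user's preferences from back to front leaves the user's first choice at the end. `selectable_list_t` and `apply_user_prefs()` are hypothetical, and duplicates are not handled:

```c
#include <stddef.h>
#include <string.h>

#define MAX_COMPONENTS 16

/* Ordered from lowest to highest priority: the last entry wins. */
typedef struct {
    const char *names[MAX_COMPONENTS];
    size_t count;
} selectable_list_t;

/* Append prefs so that prefs[0] ends up last (highest priority),
 * prefs[1] second to last, and so on. */
static void apply_user_prefs(selectable_list_t *list,
                             const char *const *prefs, size_t nprefs)
{
    for (size_t k = nprefs; k-- > 0; ) {
        if (list->count < MAX_COMPONENTS) {
            list->names[list->count++] = prefs[k];
        }
    }
}
```

Iterating forward instead would leave the user's last preference with the highest priority, inverting the requested order.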
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
- Add some missing AC_CHECK_SIZEOF's in configure.ac
- Remove some unused variables
- Initialize some variables
- Fix some parameter types
- Cast where appropriate/safe to fix warnings
- Move ompi/mca/common/monitoring Fortran bindings to a separate .c
file so that they can use different #define's than the C bindings,
and therefore compile properly / without warnings.
- Fix signedness discrepancies
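- Who knew? A `defined` operator produced by macro expansion inside an
  `#if` is undefined behavior, so such tests are now separated into
  multiple `#if`s instead of: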
```
// Undefined behavior once HAVE_FOO or YOW is evaluated in an #if
#define HAVE_FOO defined(FOO)
#define YOW (HAVE_FOO && defined(BAR))
```
- Fix some typos in OMPI_BUILD_HOST logic
- Don't "2>/dev/null" in OMPI_BUILD_HOST logic; it just hides errors
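A well-defined way to express the same tests is to apply `defined` directly in `#if` and define plain 0/1 macros, splitting compound tests across several `#if`s. A sketch reusing the placeholder names `FOO`, `BAR`, `HAVE_FOO`, and `YOW` from the example above:

```c
/* Evaluate `defined` directly in #if; HAVE_FOO becomes a plain 0/1. */
#if defined(FOO)
#  define HAVE_FOO 1
#else
#  define HAVE_FOO 0
#endif

/* HAVE_FOO now expands to an integer, so this #if is well defined. */
#if HAVE_FOO && defined(BAR)
#  define YOW 1
#else
#  define YOW 0
#endif
```

Later code then tests `#if YOW` rather than relying on a macro that expands to `defined(...)`.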
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The total size depends on the number of ranks, so the usual ranges don't work.
Thus, use the average across all ranks to make a decision.
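A minimal sketch of the averaging decision, assuming for illustration a collective whose total size is the sum of per-rank sizes and a single size threshold. `choose_alg()`, the `alg_t` values, and the threshold are hypothetical, not the coll/tuned decision rules:

```c
#include <stddef.h>

typedef enum { ALG_SMALL_MSG, ALG_LARGE_MSG } alg_t; /* hypothetical */

/* Decide on the average per-rank size rather than the total, so adding
 * ranks alone does not push a small-message workload past the threshold. */
static alg_t choose_alg(const size_t *bytes_per_rank, int nranks, size_t threshold)
{
    size_t total = 0;
    for (int r = 0; r < nranks; ++r) {
        total += bytes_per_rank[r];
    }
    size_t avg = nranks > 0 ? total / (size_t)nranks : 0;
    return avg < threshold ? ALG_SMALL_MSG : ALG_LARGE_MSG;
}
```

For example, 16 ranks with 100 bytes each total 1600 bytes, which would exceed a 1000-byte threshold, but the 100-byte average keeps the small-message choice.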
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
MPI_Ialltoallw() and friends take a const MPI_Datatype types[] argument.
In order to be able to call OBJ_RELEASE(types[0]), we used to simply
drop the const modifier. This change does it properly by introducing the
OBJ_RELEASE_NO_NULLIFY(object) macro, which does not set object = NULL
when the object is freed.
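A minimal sketch of the difference between the two release flavors, using a toy reference-counted object rather than the OPAL object system. `obj_t`, `obj_release_impl()`, and both `*_SKETCH` macros are hypothetical stand-ins:

```c
#include <stdlib.h>

/* Toy reference-counted object (not opal_object_t). */
typedef struct {
    int refcount;
} obj_t;

static int obj_destructed = 0; /* counts frees, for illustration */

/* Drop one reference; free on zero. Returns the remaining count. */
static int obj_release_impl(obj_t *obj)
{
    int remaining = --obj->refcount;
    if (remaining == 0) {
        free(obj);
        ++obj_destructed;
    }
    return remaining;
}

/* Nullifying flavor: assigns to `object`, so it needs a modifiable lvalue. */
#define OBJ_RELEASE_SKETCH(object)                          \
    do {                                                    \
        if (0 == obj_release_impl((obj_t *)(object))) {     \
            (object) = NULL;                                \
        }                                                   \
    } while (0)

/* Non-nullifying flavor: never assigns, so it also accepts read-only
 * arguments such as an element of a const array of handles. */
#define OBJ_RELEASE_NO_NULLIFY_SKETCH(object)               \
    do {                                                    \
        (void)obj_release_impl((obj_t *)(object));          \
    } while (0)
```

Applying `OBJ_RELEASE_SKETCH` to an element of `obj_t *const arr[]` would fail to compile because of the assignment, which mirrors the `const MPI_Datatype types[]` situation.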
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
- only make MCA parameters available if SPC is enabled
- do not compile SPC code if SPC is disabled
- move includes into ompi_spc.c
- allow counters to be enabled through MPI_T without setting an MCA parameter
- inline counter update calls that are likely in the critical path
- fix test to succeed even if encountering invalid pvars
- move timer_[start|stop] to header and move attachment info into ompi_spc_t
  There is no need to store the name in the ompi_spc_t struct too; we can use
  that space for the attachment info instead, to avoid accessing another cache line.
- make timer/watermark flags a property of the spc description
  This is meant to make adding counters easier in the future by
  centralizing the necessary information. Storing a copy of these flags
  in the ompi_spc_t structure (without adding to its size) reduces
  cache pollution for timer/watermark events.
- allocate ompi_spc_t objects with cache-alignment
  This prevents objects from spanning multiple cache lines and thus
  ensures that only one cache line is loaded per update.
- fix handling of timer and timer conversion
- only call opal_timer_base_get_cycles if necessary to reduce overhead
- Remove use of OPAL_UNLIKELY to improve code generated by GCC
  It appears that GCC makes less effort in optimizing the unlikely path
  and generates bloated code.
- Allocate ompi_spc_events statically to reduce loads in critical path
- duplicate comm_world only when dumping is requested
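A minimal sketch of cache-aligned counter allocation, assuming a 64-byte cache line and a counter small enough to fit in one line. The `spc_counter_t` layout and `alloc_counters()` are hypothetical, not the actual `ompi_spc_t`:

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64 /* assumption: typical x86-64 line size */

/* Aligning the first member to the line size makes the whole struct
 * 64-byte aligned and pads sizeof() up to a multiple of 64. */
typedef struct {
    _Alignas(CACHE_LINE_SIZE) int64_t value;
    int64_t start_cycles; /* timer start, used only by timer events */
    uint32_t flags;       /* e.g. timer/watermark bits from the description */
} spc_counter_t;

/* Allocate n counters, each starting on its own cache line. C11
 * aligned_alloc requires the size to be a multiple of the alignment,
 * which holds because sizeof(spc_counter_t) is padded to 64. */
static spc_counter_t *alloc_counters(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(spc_counter_t)) {
        return NULL;
    }
    return aligned_alloc(CACHE_LINE_SIZE, n * sizeof(spc_counter_t));
}
```

Since each counter occupies exactly one aligned line, updating `value` and reading `flags` touches a single cache line.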
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
These selections seem harmful in my measurements and don't seem to be
motivated by previous measurement data.
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Ensure we correctly collect and save the cpuset of the process
separately from its locality string. Ensure we use the correct one when
computing things like relative locality between processes.
Signed-off-by: Ralph Castain <rhc@pmix.org>