This is a fix based on a bugreport on github/mailing list from CGNS.
The core of the problem was that different processes entered different branches of
our aggregator selection logic, due to the fact that in some cases processes had
a matching file_view size and contiguous chunk size (thus assuming 1-D distribution),
and some processes did not (thus assuming 2-D distribution). The fix is to calculate
the avg. file view size across all processes and use this value, thus ensuring that
all processes enter the same branch.
Fixes issue #7809
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
As discussed, a feature is being added to libpsm2 to correctly handle
the case where the library is opened by multiple OMPI transports in the same
process. (For example, the OFI BTL and the PSM2 MTL).
* Improved error message to indicate required libpsm2 version.
* Adds a test at autogen/configure time for the existence of
PSM2_LIB_REFCOUNT_CAP.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
The gather and scatter operations did not use the correct message size
(Only did datatype size * com size). This did not correctly reflect the
total message size and prevents fine tuning within a com size. This
patch multiplies the value by the number of elements sent.
Signed-off-by: William Zhang <wilzhang@amazon.com>
Change ompi_mtl_ofi_get_endpoint() to call the active PML's add_procs()
rather than the OFI MTL add_procs() directly when discovering a new
process during operation.
Functionally, this has no impact in correct operation. However, the
current behavior means that the heterogenous and active PML checks
are not being executed in the dynamic discovery case.
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
also add common verbose variable.
Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered and
and initialized. The BTL's are registered/initialized prior to the MTL components even getting registered.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
- added detection of new API into configuration
- added tag_send call implemented using new API
- added MPI_Send/MPI_Isend/MPI_Recv/MPI_Irecv implementations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs)
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
Consolidate the ompi_process_info and opal_process_info structs to
remove duplicate storage and conversion issues. Unwind some interweaving
of include files using opal.h. Silence a couple of warnings.
For now, set the arch to local if PMIX_ARCH is not found.
Signed-off-by: Ralph Castain <rhc@pmix.org>
For direct modex, all procs publish the selected pml module
and then at add_procs pml module for each proc is checked
against every other proc in the add_proc call.
For full modex, there is no change in functionality. Only Rank0
publishes its selected pml, all other procs in the add_proc call
check their selected pml against Rank0.
If pml's do not match, throw error and exit.
Signed-off-by: Dipti Kothari <dkothar@amazon.com>
Pass the correct ompi_proc_t and array length to
mca_pml_base_pml_check_selected() during dynamic modex.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>