For the non thread-grouping paths, only the first (0th) OFI context
should be used for communication. Otherwise this would access a non existant
array item and cause segfault.
While at it, clarifiy some content regarding SEPs in README (Credit to Matias Cabral
for README edits).
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Update the OPAL_CHECK_OFI configury macro:
- Make it safe to call the macro multiple times:
- The checks only execute the first time it is invoked
- Subsequent invocations, it just emits a friendly "checking..."
message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
output variables from OPAL_CHECK_OFI to be
opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
causes the macro to be invoked at a fairly random time, which makes
configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
resulted in an indenting change, making the change to
opal_check_ofi.m4 look larger than it really is).
Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
According to the MPI standard the obj_handle is a pointer to an MPI
object, and therefore cannot be MPI_COMM_WORLD. The MPI standard example
14.6 highlight this usage.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
... and add `MPI_COMPLEX4`.
This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OMPI internal code.
On the other hand, `ompi_datatype_t::id` values of existing datatypes
are not changed and 73 is newly assigned to for `MPI_COMPLEX4` to
retain ABI compatibility.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`.
These are Open MPI internal variables intended to be defined as
`MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and
`MPI_CXX_SHORT_FLOAT_COMPLEX` in the future.
`OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to
support `MPI_COMPLEX4` in the next commit.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
The type `short float` is proposed for the C language in ISO/IEC JTC
1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a.
half-precision floating point or FP16.
By this commit, `short float` and `short float _Complex` are detected
in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT`
and its complex number version are not added yet.
This commit changes values of existing `OPAL_DATATYPE_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OPAL and OMPI internal code.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.
In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
When we exceed the threshold number of contexts created, print appropriate help
text
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
We missed an assert to check if ALLOW_OVERTAKE is set or not before
validating the sequence number and this will cause deadlock.
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
Provide the av_attr.count hint (number of addresses that will be
inserted into the address vector through the life of the process)
at initialization of the address vector. It's ok to be a bit
wrong, but some endpoints (RxR) can benefit by not going through
the slow growth realloc churn.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
This commit fixes a problem reported on the mailing list with
individual writes larger than 512 MB.
The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.
See issue #6285 and
https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118
Thanks for Axel Huebl and René Widera for reporting the issue.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
With MTLs, there's no "other transport" when the remote side
does not have an active NIC, so we should print a useful error
message when the modex failed (indicating lack of a NIC on
the remote side).
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.
- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)
Signed-off-by: René Widera <r.widera@hzdr.de>
Moving to a model where we have users actively _enable_ SEP feature for use
rather than opening SEP by default if provider supports it. This allows us to
not regress (either functionally or for performance reasons) any apps that were
working correctly on regular endpoints.
Also, providing MCA to specify number of OFI contexts to create and default
this value to 1 (Given btl/ofi also creates one by default, this reduces the
incidence of a scenario where we allocate all available contexts by default and
if btl/ofi asks for one more, then provider breaks as it doesn't support it).
While at it, spruce up README on SEP content.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
This commit fixes a bug when launching with prun where the process
info structures used by the btls are not populated.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
-> Added new targets in Makefile.am to call a new build script
generate-opt-funcs.pl to generate specialized functions for
each *.pm file.
-> Added new perl module *.pm files for send,isend,irecv,iprobe,improbe
which are loaded by generate-opt-funcs.pl to create new source files
that correspond to the name of the .pm file to be used as part of
MTL OFI.
-> Added mtl_ofi_opt.pm.template and updated README with details on the
specialization features and how to add additional specialization
support.
-> Added new opt_common/mtl_ofi_opt_common.pm containing common
functions for generating the specialized functions used by
all other *.pm modules.
-> Added new mtl_ofi.h which includes the definitions for the
function symbol table for storing the specialized functions along
with the definitions for the initialization functions for the
corresponding function pointers.
-> Based off the OFI provider capabilities the specialized function
pointers are assigned at mtl_ofi_component_init to the corresponding
MTL OFI function.
-> mca_mtl_ofi_module_t has been updated with the symbol table
struct which is assigned at component init.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
For cases when the number of local processes is greater than the number of
available contexts, the SEP initialization phase would calculate the number of
contexts to provision for each rank to be 0 and would eventually crash.
Fix the issue here by using regular endpoints in the event the number of local
processes is more than available contexts. This fixes issue #6182.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Commit 109d0569ff introduced a crash when an error occurred
before ofi_ctxt was allocated, including when no providers
passed the selection logic. Properly check that the pointer
is not NULL in the error cleanup code before dereferencing
the pointer.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The intention of lowering the priority when all processes are local
was to favor Vader BTL. However, in builds including the OFI MTL it
gets selected instead.
Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com>
Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com>
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
OFI MTL supports OFI Scalable Endpoints feature as means to improve
multi-threaded application throughput and message rate. Currently the feature
is designed to utilize multiple TX/RX contexts exposed by the OFI provider in
conjunction with a multi-communicator MPI application model. For more
information, refer to README under mtl/ofi.
Reviewed-by: Matias Cabral <matias.a.cabral@intel.com>
Reviewed-by: Neil Spruit <neil.r.spruit@intel.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
without this fix, an error handler invoked on pml_ucx request would
segfault while trying to dereference requests[i]->req_mpi_object.comm
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Correctly bubble up errors in NBC collective operations
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
The error field of requests needs to be rearmed at start, not at create
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
So far Vader is faster than OFI MTL for doing shared memory.
Therefore, let it run by default when all procs are local.
Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com>
Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com>
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
Now Open MPI requires a C99 compiler. Checking availability of
the following types is no more needed.
- `long long` (`signed` and `unsigned`)
- `long double`
- `float _Complex`
- `double _Complex`
- `long double _Complex`
Furthermore, the `#if HAVE_[TYPE]` style checking is not correct.
Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`.
`AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h`
if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]`
(instead of defining as `0`) if it is not available. So even if we
need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`.
I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac`
since someone may use `HAVE_[TYPE]` macros somewhere.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.
Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
1. Remove debug output in iallgather (I have forgotten to remove it).
2. Remove an incorrect comment in description of ibcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.
MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and
Allgatherv operations. So, extending the check to these algorithms as well.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Some compilers complain when comparing signed and unsigned. romio321
was doing just this. The check is meant to check whether a size (which
is an ADIO_Offset-- a signed number) will work with memcpy which takes
a size_t. To silence the warning I added a new type (ADIO_Size) which
is an unsigned type and cast the ADIO_Offset to this new type.
Fixes#5951
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The monitoring PML hides it's existence from the OMPI infrastructure by
removing itself from the list of PML loaded components, remaining hidden
until MPI_Finalize.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
With this patch the best PML is selected earlier, before finalizing
the others PML. This provides a simpler mechanism to intercept and
highjack the PML (as done in the monitoring PML)
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
An implementation of R. Rabenseifner's algorithm for MPI_Iallreduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by an allgather.
Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
check for providing a data representation that is actually supported
by ompio.
Add also one check for a non-NULL pointer in mpi/c/file_set_view
for the data representation.
Also fixes parts of issue #5643
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Several fixes to string handling:
1. strncpy() -> opal_string_copy() (because opal_string_copy()
guarantees to NULL-terminate, and strncpy() does not)
2. Simplify a few places, such as:
* Since opal_string_copy() guarantees to NULL terminate, eliminate
some memsets(), etc.
* Use opal_asprintf() to eliminate multi-step string creation
There's more work that could be done; e.g., this commit doesn't
attempt to clean up any strcpy() usage.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
return MPI_ERR_ACCESS if the user tries to read from a file
that was opened using MPI_MODE_WRONLY
return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Implements recursive doubling algorithm for MPI_Iallgather.
The algorithm can be used only for power-of-two number of processes.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>