Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false".
Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE. See
https://github.com/open-mpi/ompi/issues/5084.
This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:
- not initialized
- init started
- init completed
- finalize started
- finalize past COMM_SELF destruction
- finalize completed
The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.
Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.
Thanks to @AndrewGaspar for reporting the issue.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
- added opal_progress call to wait function to avoid
possible [dead]lock issues
- wait call is declared as inline
- minor fixes
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
+ Add quiet method to SPML, so it can have different implementation with
fence.
+ Use ucp_worker_fence for spml_fence method of UCX SPML
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
This patch disables the oshmem layer if there are no SPMLs that
will build. With the limited set of SPMLs available to support
oshmem, many builds end up installing an oshmem library that we
know will not work. There has been a bit of customer confusion
over oshmem, hopefully this will lead customers in the right
direction.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
- Use opal hash table instead of list for group lookup.
- Code cleanup/refactoring. Group cache is now a part
of the proc_group.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit eliminates the old opal_atomic_bool_cmpset functions. They
have been replaced by the opal_atomic_compare_exchange_strong
functions.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preperation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
+ The sshmem verbs component will disqualify itself if this verb isn't
present on the build host.
+ In case where support was requested but not found, don't stop the
build - continue without this component.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
Convert to F77 notation and split into two (shorter) lines.
Also, make usage of the SHMEM_MAX_NAME_LEN definition, by moving
that first.
Signed-off-by: Rainer Keller <rainer.keller@hft-stuttgart.de>
before this fix, mca_spml_ucx_component_open was using
oshmem_num_procs() to set the value of params.estimated_num_eps for UCX.
The oshmem_num_procs() function uses oshmem_group_all which will be
initialized after the call to mca_spml_ucx_component_open and therefore,
cannot be used there.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>