This commit fixes two bugs in MPI_Wait_any:
- If all requests are inactive then the sync wait would hang forever
because no requests are attached to the sync.
- The request pointer was pointing to the request before the completed
request which caused the wrong request to be freed or marked inactive.
MPI_Wait_some had a similar issue if all the requests were pending.
These issues were identified by MTT.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Regarding BFO it should be mentionned that this component is currently
unmaintained, and that despite my efforts I could not make it compile
(it would not compile before this patch either).
This fixes a hang caused by the request refactor work. The cm pml was
not updated and was hanging is most cases.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The request.h header is unfortunately included files in the C++
bindings. C++ does not allow assigning from void * to another
pointer without a cast. This commit adds the cast. We can clean this
up when the C++ bindings are deleted.
Fixesopen-mpi/ompi#1707
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).
* Fix the condition to release the request.
This commit adds support for the MPI-3.1 accumulate_ordering info
key. The default value is rar,war,raw,waw and is supported using an
MCA variable flag enumerator.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Add checks to bail out if our precomputed value is less
than needed (we are already at fault).
bot:milestone:v1.10.3
bot:milestone:v2.0
bot🏷️bug
bot:assign: @ggouaillardet
This commit changes the behavior of bml/r2 from conditionally
registering btl progress functions to always registering progress
functions. Any progress function beloning to a btl that is not yet in
use is registered as low-priority. As soon as a proc is added that
will make use of the btl is is re-registered normally.
This works around an issue with some btls. In order to progress a
first message from an unknown peer both ugni and openib need to have
their progress functions called. If either btl is not in use after the
first call to add_procs the callback was never happening. This commit
ensures the btl progress function is called at some point but the
number of progress callbacks is reduced from normal to ensure lower
overhead when a btl is not used. The current ratio is 1 low priority
progress callback for every 8 calls to opal_progress().
Fixesopen-mpi/ompi#1676
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
As more providers get added to libfabric, the default exclude list would need
to be updated.
Instead, we choose to include only the providers known to work by default.
New default:
- include: psm,psm2,gni
- exclude: none
lines up to the next ".fi" -- which for functions that do not implement the corresponding interface
as code will have all eliminated.
Change to delete the man page's content up to the next section header ".SH"
Also in case of make V=1, we'd like to see the command line, too.
Amend OMPI_Affinity_str according to the other man-pages definitions.
There were some old/stale function names in some debugging/verbose
opal_output calls. Use __func__ instead, so that they won't become
stale in the future.
Thanks to Durga Choudhury for pointing out the issue.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Update external as well
Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro
Intel TrueScale and Intel OmniPath, and detect a link in ACTIVE state.
This fix addresses the scenario reported in the below OMPI users email,
including formerly named Qlogic IB, now Intel True scale. Given the
nature of the PSM/PSM2 mtls this fix applies to OmniPath:
https://www.open-mpi.org/community/lists/users/2016/04/29018.php
MPIR-1.0 specifies that the following symbols are only relevant in the
starter process:
- MPIR_Breakpoint
- MPIR_being_debugged
- MPIR_debug_state
- MPIR_debug_abort_string
I.e., the code filling in values in these various symbols was useless
/ never used.
MPIR-1.1 will define that MPIR_being_debugged *is* relevant in MPI
processes. That symbol is currently defined in libopen-rte (which is
currently causing a duplicate symbol error for static builds -- this
commit fixes that error), and is therefore still available for MPI
processes.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
MPI-3.1 says that even if no info keys are set on the file, we need to
return a new, empty info.
Thanks to Lisandro Dalcin for identifying the issue.
Fixesopen-mpi/ompi#1630
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>