This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the
case where the outer project has its own check for
<valgrind/valgrind.h>, but also supplements CPPFLAGS (to find
Valgrind's header files) before doing that check.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Ideally, we would tell OMPI to disable autoconf's caching of our
valgrind check result so that its check gets the right result after
adding CPPFLAGS. Not sure if we can do that.
For now, just disable our Valgrind code in embedded mode.
This will keep the x86 backend enabled under Valgrind but
it will auto-disable itself when finding identical APIC ids anyway
(because CPUID returns same outputs for all PUs).
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Fixesopen-mpi/ompi#1732
(cherry picked from commit open-mpi/hwloc@8b44fb1c81)
Note that this cannot be used for MPI performance testing. It is really only useful for ORTE scaling tests. It also only works with the rsh/ssh launcher.
If requested, obtain stacktraces for each application process and report it to stderr upon timeout
stack traces: minor improvements
- Also include the hostname and PID of the each process for which
we're sending the stack traces (vs. just including the ORTE process
name)
- Send a specific error message if we couldn't find "gstack" in the
$PATH (e.g., on OS X)
- Send a sepcific error message if gstack fails to run
- Print a message that obtaining the stack traces may take a few
seconds so that users don't wonder what's happening
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
help-orterun.txt: minor tweaks
Trivial update: show "--timeout" (instead of "-timeout") in the help
message, just to encourage the use of double-dash options.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
trivial: stacktrace -> stack trace
Trivial word smything.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit fixes two bugs in MPI_Wait_any:
- If all requests are inactive then the sync wait would hang forever
because no requests are attached to the sync.
- The request pointer was pointing to the request before the completed
request which caused the wrong request to be freed or marked inactive.
MPI_Wait_some had a similar issue if all the requests were pending.
These issues were identified by MTT.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Before dynamic add_procs support was committed to master we called
add_procs with every proc in the job. The XRC code in the openib btl
was taking advantage of this and setting the number of work queue
entries (WQE) based on all the procs on a remote node. Since that is
no longer the case we can not simply increment the sd_wqe field on the
queue pair. To fix the issue a new field has been added to the xrc
queue pair structure to keep track of how many wqes there are total on
the queue pair. If a new endpoint is added that increases the number
of wqes and the xrc queue pair is already connected the code will
attempt to modify the number of wqes on the queue pair. A failure is
ignored because all that will happen is the number of active send work
requests on an XRC queue pair will be more limited.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The original VADER_MAX_ADDRESS was tunned for x86_64 platforms only.
For non x86_64 platforms we can use XPMEM_MAXADDR_SIZE.
Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
Regarding BFO it should be mentionned that this component is currently
unmaintained, and that despite my efforts I could not make it compile
(it would not compile before this patch either).
This fixes a hang caused by the request refactor work. The cm pml was
not updated and was hanging is most cases.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit reduces the overhead of calling the ugni progress
function. It does the following:
- Check for new connections once every eight calls.
- Do not call remote smsg progress unless we are connected to at
least one remote peer.
- Do not call rdma progress unless at least one rdma fragment is
outstanding.
- Check endpoint wait list size before obtaining a lock.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit reduces the default exclusivity so that btl/scif is not
used for send/recv over other shared memory transports.
Fixesopen-mpi/ompi#1712
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The request.h header is unfortunately included files in the C++
bindings. C++ does not allow assigning from void * to another
pointer without a cast. This commit adds the cast. We can clean this
up when the C++ bindings are deleted.
Fixesopen-mpi/ompi#1707
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>