This commit fixes a segmentation fault that occurs if a device can be
initialized but not used. In this case the devices_count is not equal
to the number of usable devices in the devices pointer array.
Thanks to @artpol84 for tracking this down.
Fixesopen-mpi/ompi#1823
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The way the gni btl is currently coded,
it will run completely out of gas on KNL at
123 processes/node. Since there are bound to be
those who try to run a MPI process/hyperthread
on KNL nodes, the fma sharing mode needs to be requested.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component
Add missing file
Touchup a typo in the Makefile.am
Update the pmix ext114 component
Minor cleanups and resync to master
Update to latest PMIx 2.x
Update to the PMIx event notification branch latest changes
Before dynamic add_procs the openib_btl_size_queues was called exactly
once for non-dynamic jobs. Now the function is called on each new
connection so the calculation was wrong. Re-wrote the function to
correctly calculate the CQ size and only attempt to adjust the CQ if
the requested size has changed. This fixes a bug when using the openib
btl on psm2 hardware that is caused by the time needed to resize a
CQ. The overhead was causing udcm to timeout and fail.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the
case where the outer project has its own check for
<valgrind/valgrind.h>, but also supplements CPPFLAGS (to find
Valgrind's header files) before doing that check.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Ideally, we would tell OMPI to disable autoconf's caching of our
valgrind check result so that its check gets the right result after
adding CPPFLAGS. Not sure if we can do that.
For now, just disable our Valgrind code in embedded mode.
This will keep the x86 backend enabled under Valgrind but
it will auto-disable itself when finding identical APIC ids anyway
(because CPUID returns same outputs for all PUs).
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Fixesopen-mpi/ompi#1732
(cherry picked from commit open-mpi/hwloc@8b44fb1c81)
This commit fixes a programming error when using an aries nic. The
documentation of ugni shows that only the local alignment restriction
for get was lifted on aries. There is still a remote address alignment
restriction.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a programming error when using an aries nic. The
documentation of ugni shows that only the local alignment restriction
for get was lifted on aries. There is still a remote address alignment
restriction.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds support for Cray Aries atomic operations. This
includes 32-bit and floating point support.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit add support for more atomic operations and type. The
operations added are logical and, logical or, logical xor, swap, min,
and max. New types are 32-bit int by using the
MCA_BTL_ATOMIC_FLAG_32BIT flag, 64-bit float by using the
MCA_BTL_ATOMIC_FLAG_FLOAT flag, and 32-bit float by using both
flags. Floating point numbers are supported by packing the number in
as an int64_t or int32_t. We will update the btl interface in the
future to make this less confusing.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Before dynamic add_procs support was committed to master we called
add_procs with every proc in the job. The XRC code in the openib btl
was taking advantage of this and setting the number of work queue
entries (WQE) based on all the procs on a remote node. Since that is
no longer the case we can not simply increment the sd_wqe field on the
queue pair. To fix the issue a new field has been added to the xrc
queue pair structure to keep track of how many wqes there are total on
the queue pair. If a new endpoint is added that increases the number
of wqes and the xrc queue pair is already connected the code will
attempt to modify the number of wqes on the queue pair. A failure is
ignored because all that will happen is the number of active send work
requests on an XRC queue pair will be more limited.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>