This commit updates the intercomm allgather to do a local comm bcast
as the final step. This should resolve a hang seen in intercomm
tests.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
The xlc compiler seems to behave differently from gcc when it comes
to inline asm. There were two problems with the code with xlc:
- The TOC read in mca_patcher_base_patch_hook used the syntax
register unsigned long toc asm("r2") to read $r2 (the TOC
pointer). With gcc this seems to behave as expected but with xlc
the result in toc is not the same as $r2. I updated the code to use
asm volatile ("std 2, %0" : "=m" (toc)) to read the TOC pointer.
- The OPAL_PATCHER_BEGIN macro is meant to be the first thing in a
hook. On PPC64 it loads the correct TOC pointer (thanks to
mca_patcher_base_patch_hook) and saves the old one. The
OPAL_PATCHER_END macro restores the TOC pointer. Because we *need*
the TOC to be correct before it is accessed in the hook the
OPAL_PATCHER_BEGIN macro MUST come first. We did this and all was
well with gcc. With xlc on the other hand there was a TOC access
before the assembly inserted by OPAL_PATCHER_BEGIN. To fix this
quickly I broke each hook into a pair of functions with the
OPAL_PATCHER_* macros on the top level functions. This works around
the issue but is not a clean way to fix this. In the future we
should either 1) update overwrite to not need this, or 2) figure
out why xlc is not inserting the asm before the first TOC read.
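
For illustration, a minimal sketch of the TOC read described in the
first item. The function name read_toc_pointer and the non-PPC64
fallback that returns 0 are assumptions made for this example and are
not taken from the patcher code; only the asm statement comes from the
description above.

```c
#include <stdint.h>

/* Hypothetical helper: returns the TOC pointer (r2) on 64-bit PowerPC.
 * On any other architecture it returns 0 so the sketch still builds. */
uintptr_t read_toc_pointer(void)
{
#if defined(__powerpc64__) || defined(__PPC64__)
    unsigned long toc;
    /* Store r2 to memory explicitly instead of relying on a register
     * variable bound to r2, which xlc did not honor as gcc did. */
    __asm__ volatile ("std 2, %0" : "=m" (toc));
    return (uintptr_t) toc;
#else
    /* Assumption for this sketch only: no TOC on this architecture. */
    return 0;
#endif
}
```

The explicit store does not depend on how the compiler treats
register-bound local variables, which is where the two compilers
differed.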
This fixes open-mpi/ompi#1854
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
protect the remaining functions where necessary by a mutex lock
to avoid problems in multi-threaded executions. Some functions
do not require that in my opinion, and I provided an explanation
in those cases.
This commit fixes an ordering bug in the code that keeps track of all
attached memory windows. The code is intended to keep the memory
regions sorted but was often inserting at the wrong index. Thanks to
Christoph Niethammer for reporting the issue. The reproducer will be
added to nightly MTT testing.
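
As a rough sketch of what a sorted insert of this kind can look like
(not the actual component code): region_t, region_insert_index and
region_insert are hypothetical names, and the assumption that regions
are ordered by base address is made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical region record; not the real Open MPI structure. */
typedef struct {
    uintptr_t base;
    size_t    len;
} region_t;

/* Binary search for the first index whose base is greater than `base`.
 * Inserting at that index keeps the array sorted by base address. */
size_t region_insert_index(const region_t *regions, size_t count,
                           uintptr_t base)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (regions[mid].base <= base) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Insert `r` at its sorted position and return the index used.
 * The caller must provide room for at least one more element. */
size_t region_insert(region_t *regions, size_t *count, size_t capacity,
                     region_t r)
{
    assert(*count < capacity);
    size_t idx = region_insert_index(regions, *count, r.base);
    /* Shift the tail up by one slot to open a gap at idx. */
    memmove(&regions[idx + 1], &regions[idx],
            (*count - idx) * sizeof(region_t));
    regions[idx] = r;
    ++*count;
    return idx;
}
```

Computing the index from a search over the current contents, and
shifting only the elements at or after that index, is what keeps the
list ordered regardless of the order in which windows are attached.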
Fixes open-mpi/ompi#2012
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
With libfabric v1.4, the usnic provider changed the values of its
fabric and domain name strings (compared to libfabric <v1.4). Update
the Open MPI usNIC BTL to handle both pre-v1.4 and v1.4 fabric/domain
names.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Oversubscribe stipulates that we allow more procs on the node than
assigned slots; it has nothing to do with the number of available
PEs. Let the overload directives handle the PE situation.
This commit fixes some bugs uncovered during thread testing of
2.0.1rc1. With these fixes the component is running cleanly with
threads.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
and fail with a user-friendly message if no method is available:
"sec: native cannot validate_cred on this system"
(back-ported from upstream pmix/master@c474a1fc60)