So we need all the routing code for handling cross-job communication, lifelines, etc. The HNP is directly connected to all daemons, since they must call back at startup, so we need to track those children correctly to know when it is okay to terminate.
We still have to support direct launch, though, as this is the only component we can use in that scenario. So if the app doesn't have daemon URI info, it must fall back to connecting directly to everything.
Now that "make check" siphons off stdout/stderr to logfiles, it's ok
to have output by default from tests. This test fails often enough
that it's useful to see the diagnostic output.
Some versions of gcc require this flag to be set before the __sync
builtin atomic compare and swap will support 128-bit values. If the
flag is required, this check adds it to CFLAGS.
There currently is no standard support for 128-bit integer types. Any use
of the __int128 and int128_t types can lead to warnings from the compiler
when using -Wpedantic. Additionally, some compilers may support __int128
and others may support int128_t. This commit addresses both issues by
defining opal_int128_t if there is a supported 128-bit type. In the
case of GCC a pragma has been added to suppress warnings about __int128
not being a standard C type.
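A minimal sketch of the typedef-plus-pragma approach described above. The names my_int128_t, my_uint128_t and HAVE_MY_INT128 are placeholders chosen for illustration, not the names this commit introduces; __SIZEOF_INT128__ is the macro GCC and Clang predefine when __int128 is available.

```c
#include <stdint.h>

/* Hypothetical names: my_int128_t, my_uint128_t and HAVE_MY_INT128 are
 * illustration-only placeholders. */
#if defined(__SIZEOF_INT128__)
/* Suppress the "ISO C does not support __int128" diagnostic, but only
 * around these typedefs, so -Wpedantic stays active everywhere else. */
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wpedantic"
typedef __int128 my_int128_t;
typedef unsigned __int128 my_uint128_t;
#  pragma GCC diagnostic pop
#  define HAVE_MY_INT128 1
#else
#  define HAVE_MY_INT128 0
#endif

#if HAVE_MY_INT128
/* Build a 128-bit value from two 64-bit halves. */
static my_uint128_t make_u128(uint64_t hi, uint64_t lo)
{
    return ((my_uint128_t) hi << 64) | lo;
}

/* Extract the upper and lower 64-bit halves again. */
static uint64_t u128_hi(my_uint128_t v) { return (uint64_t) (v >> 64); }
static uint64_t u128_lo(my_uint128_t v) { return (uint64_t) v; }
#endif
```

Code that needs a 128-bit type can then test HAVE_MY_INT128 and use the typedef, rather than naming __int128 or int128_t directly.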
A 128-bit compare-and-swap will enable a better atomic lifo implementation
that uses the pointer + counter method to avoid ABA issues. This commit
adds configury to check for the instruction (cmpxchg16b) and adds an
implementation that uses the non-standard __int128 type where the
compiler provides it.
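The pointer + counter method mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the OPAL lifo code: item_t, counted_ptr_t, counted_ptr_cas, lifo_push and lifo_pop are hypothetical names. The 128-bit path is taken only when the compiler predefines __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 (on x86-64 GCC that typically means the extra flag was passed); otherwise the sketch falls back to a plain, non-atomic compare that is only valid single-threaded. Safe memory reclamation for popped items is outside the scope of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef struct item {
    struct item *next;
    int value;
} item_t;

/* Stack head: a pointer plus a counter that changes on every update.
 * On LP64 the pair is 16 bytes, so one 128-bit CAS covers both fields. */
typedef union {
    struct {
        item_t *ptr;
        uintptr_t counter;
    } data;
#if defined(__SIZEOF_INT128__)
    __extension__ unsigned __int128 value;
#endif
} counted_ptr_t;

static bool counted_ptr_cas(counted_ptr_t *addr, counted_ptr_t expected,
                            counted_ptr_t desired)
{
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    /* Native 128-bit compare-and-swap (cmpxchg16b on x86-64). */
    return __sync_bool_compare_and_swap(&addr->value, expected.value,
                                        desired.value);
#else
    /* ASSUMPTION: NOT atomic. Single-threaded fallback so the sketch
     * builds when no 128-bit CAS is available. */
    if (addr->data.ptr == expected.data.ptr &&
        addr->data.counter == expected.data.counter) {
        *addr = desired;
        return true;
    }
    return false;
#endif
}

static void lifo_push(counted_ptr_t *head, item_t *item)
{
    counted_ptr_t old_head, new_head = {{NULL, 0}};
    do {
        old_head = *head; /* snapshot; a stale snapshot makes the CAS fail */
        item->next = old_head.data.ptr;
        new_head.data.ptr = item;
        new_head.data.counter = old_head.data.counter + 1;
    } while (!counted_ptr_cas(head, old_head, new_head));
}

static item_t *lifo_pop(counted_ptr_t *head)
{
    counted_ptr_t old_head, new_head = {{NULL, 0}};
    do {
        old_head = *head;
        if (NULL == old_head.data.ptr) {
            return NULL; /* stack is empty */
        }
        new_head.data.ptr = old_head.data.ptr->next;
        /* Bumping the counter is what defeats ABA: even if the same
         * pointer value reappears at the head, the counter differs. */
        new_head.data.counter = old_head.data.counter + 1;
    } while (!counted_ptr_cas(head, old_head, new_head));
    return old_head.data.ptr;
}
```

Because the counter changes on every successful update, a thread holding a stale snapshot cannot succeed just because the head pointer happens to hold the same address again.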
If OPAL_MODEX_RECV() returns OPAL_ERR_NOT_FOUND, the peer didn't
send any Portals4 BTL info. This is not a fatal error. Instead of
disqualifying the Portals4 BTL, just ignore that peer.
@jsquyres reported this in #194.
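The control flow described above can be sketched as below. Everything here is a hypothetical stand-in: sketch_modex_recv, sketch_peer_t, sketch_add_procs and the SKETCH_* codes only mimic the shape of the logic and are not the OPAL macro, its signature, or the Portals4 BTL code.

```c
#include <stddef.h>

/* Hypothetical return codes standing in for the OPAL ones. */
enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERR_NOT_FOUND = -1,
    SKETCH_ERROR = -2
};

/* Hypothetical peer record: did it publish BTL info, and what was it? */
typedef struct {
    int has_btl_info; /* nonzero if the peer sent modex data */
    int corrupt;      /* nonzero to simulate a genuine failure */
    int nid;          /* the published value */
} sketch_peer_t;

/* Stand-in for a modex receive: NOT_FOUND when nothing was published. */
static int sketch_modex_recv(const sketch_peer_t *peer, int *nid_out)
{
    if (peer->corrupt) {
        return SKETCH_ERROR;
    }
    if (!peer->has_btl_info) {
        return SKETCH_ERR_NOT_FOUND;
    }
    *nid_out = peer->nid;
    return SKETCH_SUCCESS;
}

/* Returns the number of reachable peers, or a negative error code.
 * nids[i] is set to -1 for every peer that was skipped. */
static int sketch_add_procs(const sketch_peer_t *peers, size_t n, int *nids)
{
    int reachable = 0;
    for (size_t i = 0; i < n; ++i) {
        int nid = -1;
        int rc = sketch_modex_recv(&peers[i], &nid);
        nids[i] = -1;
        if (SKETCH_ERR_NOT_FOUND == rc) {
            continue; /* peer sent no info: ignore it, keep the BTL */
        }
        if (SKETCH_SUCCESS != rc) {
            return rc; /* any other error is still fatal */
        }
        nids[i] = nid;
        ++reachable;
    }
    return reachable;
}
```

The point of the design is that only the "not found" case is downgraded to a skip; any other error still propagates to the caller.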
Clean up orte_check_alps.m4. There was a bit of
unnecessary code for handling CLE 5, since it wasn't actually
doing the right thing, which would be to use pkg-config to
find dependencies for both dynamic and static linking.
Decouple the searching for alps libs, etc. from cray pmi.
Switch the alps ess and alps odls components' config files
to use the ALPS m4 macro.
alps configury fixes
Improve a check for detecting CLE release.
Improve an error message.
Add a call to orte_odls_alps_get_rdma_creds in the
local proc launch step to obtain the Cray RDMA
credentials from the apshepherd, and to set
the PMI environment variables expected by the uGNI BTL, etc.
Add an alps common lib to orte. Add a function
to determine whether or not a process is in a
PAGG container.
Note: we need a better naming convention for
common libs, since right now they use a "flat"
naming convention.
Note that this alps ess component has nothing to do
with the old CNOS alps component used on
Cray Seastar/Portals3 (Cray XT) systems.
To work properly, changes need to be made to the
open method of the ess/pmi component to keep it
from selecting, and thus initializing, the opal/pmix/cray
component.