Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.
Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
So we need all the routing code for dealing with cross-job communications, lifelines, etc. The HNP will be directly connected to all daemons as they must callback at startup, and so we need to track those children correctly so we know when it is okay to terminate.
We still have to support direct launch, though, as this is the only component we can use in that scenario. So if the app doesn't have daemon URI info, then it must fall back to directly connecting to everything.
Clean up the orte_check_alps.m4. There was a little of
unnecesary stuff for handling cle 5, since it wasn't actually
doing the right thing, which would be to use pkg-config to
find dependencies both for dynamic and static linking.
Decouple the searching for alps libs, etc. from cray pmi.
Switch the alps ess and alps odls components' config files
to use the ALPS m4 macro.
alps configury fixes
Improve a check for detecting CLE release.
Improve an error message.
Add call to orte_odls_alps_get_rdma_creds in the
local proc launch step to obtain the Cray Rdma
credentials from the apshepherd, and to set
the PMI env. variables expected by uGNI BTL, etc.
Add an alps common lib to orte. Add a function
to determine whether or not a process is in a
PAGG container.
Note: we need a better naming convention for
common libs, since right now they use a "flat"
naming convention.
Note this alps ess component has nothing to do
with the old CNOS alps component used on
Cray Seastar/Portals3 (Cray XT) systems.
To work properly, changes need to be made to the
open method of the ess/pmi component to keep it
from selecting, and thus initializing, the opal/pmix/cray
component.
Be more selective about closing fd's for the alps odls
component. Don't close fd's of pipes set up by the
apshepherd for providing RDMA credentials, etc.
Add an entry to the help file in case
alps_app_lli_pipes returns an error.
There was an obvious bug in the alps/ras component compare_nodes method
which resulted in the function always evaluating the nodes
as being equivalent.
It turns out that the support for Open MPI apps on
Cray was hanging on a thin thread of support when
using the mpirun job launcher. It just happened that
with a certain set of configuration options things would
work. This is bound to backfire at some point.
To fix this weakness, as well as to allow for mpirun launched
jobs to benefit from many of the advanced placement features
provided by the Cray Linux Environment (as opposed to the hwloc
only default env of orte), a new odls alps component is introduced.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.