Application Level Placement Scheduler (ALPS).
This commit was tested under two Cray machines at ORNL: Jaguar (Catamount)
and Rizzo (CNL Test cage). Both machines performed as they should across
the commit.
It is likely that mor changes will follow this the work and environment
stabilizes.
Most of the infrastructure works the same for Catamount and CNL
except for a few bits. Below are the highlights:
Default IFACE Change:
On Catamount we can use PTL_IFACE_DEFAULT, but on the CNL system we have access
to will fail on this interface, and should be set to:
IFACE_FROM_BRIDGE_AND_NALID(PTL_BRIDGE_UK,PTL_IFACE_SS).
So if we detect that we are running with YOD then use the former interface
and if we detect that we are running with ALPS then use the latter.
We will want to pursue a more elegant solution if this interface continues to
change across machines.
PtlGetId and cnos_register_ptlid:
The header suggests that these should never be called when launching with YOD.
But in the ALPS environment the cnos_barrier() will hang forever if these
functions are not called after PtlNIInit(). Since these functions only need to
be called once, and the orte rmgr/cnos component is loaded before the ompi
common/portals componet then just call these functions once in the rmgr/cnos
component.
cnos_barrier_init():
This is a noop for YOD, but critical for ALPS. So be sure to call it before
calling the first barrier in the rmgr/cnos component.
cnos_barrier vs cnos_pm_barrier:
It is suggested the cnos_pm_barrier only be used during finalization
as it will indicate to the launcher (yod or aprun) that the app is about
to complete. It was suggested that we use the regular cnos_barrier() instead.
I want to look into this a bit more to make sure there are not adverse
side effects. A note has been placed in the code to indicate this reasoning.
This commit was SVN r15756.