2. Remove some unnecessary code that was causing a SEGV.
There may be some more work to be done, but at least orte-clean is functional again.
This commit was SVN r16111.
It is masking a bug that I'm tracking down in the SNAPC FULL - FILEM interactions.
Also make sure to clean out the filem structure before asking for another
checkpoint file when not storing the files in place.
This commit was SVN r16109.
A subset of this patch needs to be applied to v1.2
Refs trac:928
This commit was SVN r15918.
The following Trac tickets were found above:
Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928
Tie stdin to /dev/null so that stdin is never closed; a closed stdin stops stdin from working in SLURM allocations.
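A minimal sketch of tying stdin to /dev/null, assuming a POSIX system. The function name tie_stdin_to_devnull is hypothetical and is not taken from the commit; the actual change may be structured differently.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point file descriptor 0 at /dev/null so that it stays open and
 * reads simply report end-of-file.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int tie_stdin_to_devnull(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    if (fd != STDIN_FILENO) {
        /* dup2 atomically replaces whatever fd 0 referred to before. */
        if (dup2(fd, STDIN_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```

After the call, fd 0 is a valid open descriptor, so code that inherits it does not see a closed stdin.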
This commit was SVN r15892.
The following Trac tickets were found above:
Ticket 1047 --> https://svn.open-mpi.org/trac/ompi/ticket/1047
needs to override the default umask. By default, this is not used
since most environments do what the user would expect without any
help.
* Have TM use the newly added umask hook, so that processes inherit
the user's umask from mpirun rather than the pbs_mom's umask, which
the user has no control over.
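The umask hand-off described above can be sketched roughly as follows, assuming POSIX umask(). Both function names are hypothetical and are not the hook added by this commit; how the captured value travels from mpirun to the launched process is left out.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Read the current process umask without leaving it changed.
 * POSIX has no pure getter, so set-and-restore is the usual idiom. */
mode_t read_current_umask(void)
{
    mode_t current = umask(0);
    umask(current);
    return current;
}

/* Hypothetical hook: run in the launched process before the application
 * starts, with the value captured on the mpirun side.
 * Returns the mask that was in effect before (e.g. the daemon's). */
mode_t apply_launcher_umask(mode_t launcher_umask)
{
    return umask(launcher_umask);
}
```

The launcher side would call read_current_umask() once, and each launched process would call apply_launcher_umask() with that value instead of keeping the mask inherited from the batch daemon.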
This commit was SVN r15858.
Application Level Placement Scheduler (ALPS).
This commit was tested under two Cray machines at ORNL: Jaguar (Catamount)
and Rizzo (CNL Test cage). Both machines performed as they should across
the commit.
It is likely that more changes will follow as the work and environment
stabilize.
Most of the infrastructure works the same for Catamount and CNL
except for a few bits. Below are the highlights:
Default IFACE Change:
On Catamount we can use PTL_IFACE_DEFAULT, but the CNL system we have access
to fails on this interface, so there it should be set to:
IFACE_FROM_BRIDGE_AND_NALID(PTL_BRIDGE_UK,PTL_IFACE_SS).
So if we detect that we are running with YOD then use the former interface
and if we detect that we are running with ALPS then use the latter.
We will want to pursue a more elegant solution if this interface continues to
change across machines.
PtlGetId and cnos_register_ptlid:
The header suggests that these should never be called when launching with YOD.
But in the ALPS environment the cnos_barrier() will hang forever if these
functions are not called after PtlNIInit(). Since these functions only need to
be called once, and the orte rmgr/cnos component is loaded before the ompi
common/portals component, just call them once in the rmgr/cnos
component.
cnos_barrier_init():
This is a no-op for YOD, but critical for ALPS. So be sure to call it before
calling the first barrier in the rmgr/cnos component.
cnos_barrier vs cnos_pm_barrier:
It is suggested that cnos_pm_barrier() only be used during finalization
as it will indicate to the launcher (yod or aprun) that the app is about
to complete. It was suggested that we use the regular cnos_barrier() instead.
I want to look into this a bit more to make sure there are not adverse
side effects. A note has been placed in the code to indicate this reasoning.
This commit was SVN r15756.
from orte_ns.compare_fields(), not 0 (yes, they're the same [today],
but it is much better to check for symbolic names...).
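A small sketch of the point about symbolic names. Every identifier here is hypothetical; the enum and the comparison function only stand in for the real ORTE ones, whose names are not shown in this message.

```c
/* Hypothetical result codes for a three-way comparison. */
enum example_cmp {
    EXAMPLE_VALUE2_HIGHER = -1,
    EXAMPLE_EQUAL = 0,
    EXAMPLE_VALUE1_HIGHER = 1
};

/* Hypothetical comparison function standing in for the real one. */
enum example_cmp example_compare(int value1, int value2)
{
    if (value1 > value2) {
        return EXAMPLE_VALUE1_HIGHER;
    }
    if (value1 < value2) {
        return EXAMPLE_VALUE2_HIGHER;
    }
    return EXAMPLE_EQUAL;
}

/* Test against the symbolic name rather than a literal 0, so the check
 * stays correct even if the numeric value of EXAMPLE_EQUAL changes. */
int values_match(int value1, int value2)
{
    return EXAMPLE_EQUAL == example_compare(value1, value2);
}
```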
This commit was SVN r15731.
to light: we weren't ack'ing properly for streams that originated (or
originated via proxy) and terminated within the HNP. This commit
fixes that.
It also fixes a few style issues and adds some more opal_outputs for
debugging. Also fixes a bug where the fact that we forwarded (and
therefore might need to update the ack) was not correctly reported if
there were multiple forwards (which there are not as the system is
currently using IOF, but there could be).
Refs trac:1098 -- want to get another pair of eyes to look at this before
I close the ticket.
This commit was SVN r15730.
The following Trac tickets were found above:
Ticket 1098 --> https://svn.open-mpi.org/trac/ompi/ticket/1098
int to void. This function calls exit at the end, so there is no way to
return from it. Apply the same change to the errmsg_abort function and
update all components.
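A rough illustration of why a function that ends in exit() is better declared void. The name example_abort and its signature are hypothetical, and the _Noreturn marker (C11) is an addition for the sketch, not something the commit states. The second function exists only to demonstrate the behaviour in a child process.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical abort routine. It returns void because exit() never
 * returns, so an int result could never reach the caller. */
_Noreturn void example_abort(int status, const char *msg)
{
    fprintf(stderr, "abort: %s\n", msg);
    exit(status);
}

/* Demonstration helper: run example_abort() in a child process and
 * report the exit status the parent observes, or -1 on failure. */
int exit_status_of_abort(int status)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        example_abort(status, "demonstration");
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus)) {
        return -1;
    }
    return WEXITSTATUS(wstatus);
}
```

With a void return type, callers can no longer write code that pretends to handle a return value from the abort path.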
This commit was SVN r15704.
- If one wants to use this solution, remember to unload the project 'orte-restart', which currently does not work on Windows.
This commit was SVN r15680.
grpcomm cnos component
- Remove the .ompi_ignore
- add a configure.m4 that should keep it from building on any system
other than Cray XT* (copied from rml/cnos)
- Fix some mis-named symbols resulting from cut/paste errors.
This patch brings the Cray build back into 'working' order.
This commit was SVN r15651.
or two.
* checking lsb_init() is not sufficient to know whether you're in an
LSF job or not; you also need to check for environment variable
markers
* remove lots of debugging output
* no need for the sds lsf to call lsb_init()
* remove some slurm-like dead code and a copy-n-paste error in the
sds lsf
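The environment-marker check from the first bullet might look something like this sketch. The message does not say which variables are checked; LSB_JOBID is used here as an assumption (LSF sets it for batch jobs), and both function names are hypothetical. The lsb_init() call is left out so the sketch builds without the LSF libraries.

```c
#include <stdlib.h>

/* Returns 1 if the named environment variable is set to a non-empty value. */
static int env_is_set(const char *name)
{
    const char *value = getenv(name);
    return value != NULL && value[0] != '\0';
}

/* Assumption: LSB_JOBID serves as the "inside an LSF job" marker.
 * The real component may check different or additional variables. */
int looks_like_lsf_job(void)
{
    return env_is_set("LSB_JOBID");
}
```

A component would combine a check like this with a successful lsb_init(), since the library can initialize on a machine where LSF is installed but no job is running.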
This commit was SVN r15644.
is no need for the IP address in most cases (filem being one dubious
exception), so just publish and hand around the supposedly opaque contact
info strings.
This commit was SVN r15638.