working with Automake 1.10. This is a new hack, which should be much
more flexible. The ras doesn't contain any Objective C, so remove the
hack entirely from that Makefile.am.
This commit was SVN r16269.
(Someday I'll learn to do this before committing)
This commit was SVN r16260.
The following SVN revision numbers were found above:
r16252 --> open-mpi/ompi@e10f476c87
and implementation. This has shown a drastic performance benefit when
transferring many files at roughly the same time.
I tested this for many different filem operations and everything was working
fine. Let me know if you have any problems with this functionality.
Some Notes:
- opal-checkpoint now has a 'quiet' flag to keep it from being too verbose.
- FileM RSH component is fully non-blocking.
- FileM RSH component has incoming connection throttling, since by default
ssh only allows 10 concurrent scp connections to any single host. This
default can be adjusted via an MCA parameter.
{{{-mca filem_rsh_max_incomming 10}}}
- There is an MCA parameter for max outgoing connections, but it is currently
not implemented. If someone needs it then it should not be hard to implement.
{{{-mca filem_rsh_max_outgoing 10}}}
- Changed the FileM request structure so that it is a bit more explicit and
flexible.
- Moved the 'preload-binary' and 'preload-files' functionality into odls/base
allowing for code reuse in the 'process' and 'default' ODLS components.
- Fixed a bug in the process name resolution which broke the 'preload-*'
functionality due to GPR table structure changes.
- The FileM RSH component might be able to see even more speedup from using a
thread pool to operate on the work_pool structures, but that is for future
work.
- Added an 'opal-show-help' file to ODLS Base.
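The incoming connection throttling described in these notes can be sketched as a counter of running transfers plus a FIFO of transfers waiting for a slot. This is a minimal illustration under assumptions, not the FileM RSH code: the type and function names (`throttle_t`, `throttle_submit`, `throttle_complete`) and the fixed queue capacity are hypothetical, and the limit would come from something like `filem_rsh_max_incomming`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical throttle for incoming transfers; illustrative only. */
#define THROTTLE_QUEUE_CAP 64

typedef struct {
    int max_active;                 /* limit, e.g. from filem_rsh_max_incomming */
    int active;                     /* transfers currently running */
    int queue[THROTTLE_QUEUE_CAP];  /* ids waiting for a slot, as a FIFO ring */
    size_t head;                    /* index of the oldest waiting id */
    size_t count;                   /* number of waiting ids */
} throttle_t;

void throttle_init(throttle_t *t, int max_active) {
    assert(max_active > 0);
    t->max_active = max_active;
    t->active = 0;
    t->head = 0;
    t->count = 0;
}

/* Returns 1 if transfer `id` may start now, 0 if it was queued,
   or -1 if the wait queue is full. */
int throttle_submit(throttle_t *t, int id) {
    if (t->active < t->max_active) {
        t->active++;
        return 1;
    }
    if (t->count == THROTTLE_QUEUE_CAP) {
        return -1;
    }
    t->queue[(t->head + t->count) % THROTTLE_QUEUE_CAP] = id;
    t->count++;
    return 0;
}

/* Call when a running transfer finishes. If a transfer is waiting, the
   freed slot is handed to it directly and its id is returned (the caller
   should start it now); otherwise the slot is released and -1 is returned. */
int throttle_complete(throttle_t *t) {
    assert(t->active > 0); /* completing with nothing running is a bug */
    if (t->count > 0) {
        int next = t->queue[t->head];
        t->head = (t->head + 1) % THROTTLE_QUEUE_CAP;
        t->count--;
        return next; /* active count unchanged: the slot is reused */
    }
    t->active--;
    return -1;
}
```

Handing a freed slot straight to the oldest waiting transfer keeps the number of concurrent scp connections at the limit while preventing a newly submitted request from jumping ahead of ones already queued.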
This commit was SVN r16252.
performance characterization, and should not be used by anyone doing anything
else since it will not produce a globally consistent checkpoint in this mode.
This commit was SVN r16192.
2. Remove some unnecessary code that was causing a SEGV.
There may be some more work to be done, but at least orte-clean is functional again.
This commit was SVN r16111.
It is masking a bug that I'm tracking down in the SNAPC FULL - FILEM interactions.
Also make sure to clean out the filem structure before asking for another
checkpoint file when not storing the files in place.
This commit was SVN r16109.
A subset of this patch needs to be applied to v1.2
Refs trac:928
This commit was SVN r15918.
The following Trac tickets were found above:
Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928
Tie stdin to /dev/null to prevent stdin from being closed, which made stdin unusable in SLURM allocations.
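The standard POSIX way to tie stdin to /dev/null rather than closing it is to open /dev/null and `dup2()` it onto descriptor 0. A minimal sketch, assuming it runs in the launched process before exec; the helper name `tie_stdin_to_devnull` is hypothetical and this is not claimed to be the exact Open MPI code.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point file descriptor 0 at /dev/null instead of leaving it closed.
   Returns 0 on success, -1 on failure. */
int tie_stdin_to_devnull(void) {
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    if (fd != STDIN_FILENO) {
        /* dup2 atomically replaces whatever fd 0 referred to. */
        if (dup2(fd, STDIN_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    /* If fd == STDIN_FILENO, stdin was already closed and open() reused
       the lowest free descriptor, so nothing more is needed. */
    return 0;
}
```

Reads on descriptor 0 then return end-of-file immediately, and descriptor 0 can no longer be handed out by some unrelated later open(), which is what happens once it has been closed.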
This commit was SVN r15892.
The following Trac tickets were found above:
Ticket 1047 --> https://svn.open-mpi.org/trac/ompi/ticket/1047
needs to override the default umask. By default, this is not used
since most environments do what the user would expect without any
help.
* Have TM use the newly added umask hook, so that processes inherit
the user's umask from mpirun rather than the pbs_mom's umask, which
the user has no control over.
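POSIX has no call that only reads the umask: `umask()` always installs a new mask and returns the previous one. A sketch of the hook under stated assumptions: mpirun reads its mask with a set-and-restore pair, the value is forwarded to the daemon by some unspecified means (not shown), and the launched process applies it before exec. Both helper names are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* umask() always installs a new mask and returns the previous one, so
   reading the mask portably means setting it and immediately restoring it. */
mode_t current_umask(void) {
    mode_t mask = umask(0);
    umask(mask);
    return mask;
}

/* Hypothetical hook run in the launched process before exec: adopt the
   mask forwarded from mpirun instead of the one inherited from the local
   daemon (e.g. pbs_mom under TM). */
void apply_forwarded_umask(mode_t forwarded) {
    (void) umask(forwarded);
}
```

The read-and-restore pair briefly sets the mask to 0, so it should run before any threads that might create files are started.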
This commit was SVN r15858.
Application Level Placement Scheduler (ALPS).
This commit was tested on two Cray machines at ORNL: Jaguar (Catamount)
and Rizzo (CNL Test cage). Both machines performed as they should across
the commit.
It is likely that more changes will follow as the work and environment
stabilize.
Most of the infrastructure works the same for Catamount and CNL
except for a few bits. Below are the highlights:
Default IFACE Change:
On Catamount we can use PTL_IFACE_DEFAULT, but the CNL system we have access
to fails on this interface; there it should be set to:
IFACE_FROM_BRIDGE_AND_NALID(PTL_BRIDGE_UK,PTL_IFACE_SS).
So if we detect that we are running under YOD we use the former interface,
and if we detect that we are running under ALPS we use the latter.
We will want to pursue a more elegant solution if this interface continues to
change across machines.
PtlGetId and cnos_register_ptlid:
The header suggests that these should never be called when launching with YOD.
But in the ALPS environment cnos_barrier() will hang forever if these
functions are not called after PtlNIInit(). Since these functions only need to
be called once, and the orte rmgr/cnos component is loaded before the ompi
common/portals component, we just call them once in the rmgr/cnos
component.
cnos_barrier_init():
This is a noop for YOD, but critical for ALPS. So be sure to call it before
calling the first barrier in the rmgr/cnos component.
cnos_barrier vs cnos_pm_barrier:
It is suggested that cnos_pm_barrier() only be used during finalization,
as it will indicate to the launcher (yod or aprun) that the app is about
to complete. It was suggested that we use the regular cnos_barrier() instead.
I want to look into this a bit more to make sure there are not adverse
side effects. A note has been placed in the code to indicate this reasoning.
This commit was SVN r15756.
from orte_ns.compare_fields(), not 0 (yes, they're the same [today],
but it is much better to check for symbolic names...).
This commit was SVN r15731.
to light: we weren't ack'ing properly for streams that originated (or
originated via proxy) and terminated within the HNP. This commit
fixes that.
It also fixes a few style issues and adds some more opal_outputs for
debugging. Also, fixed a bug where the fact that we forwarded (and
therefore might need to update the ack) was not correctly reported if
there were multiple forwards (there are none the way the system
currently uses IOF, but there could be).
Refs trac:1098 -- want to get another pair of eyes to look at this before
I close the ticket.
This commit was SVN r15730.
The following Trac tickets were found above:
Ticket 1098 --> https://svn.open-mpi.org/trac/ompi/ticket/1098
int to void. This function calls exit at the end, so there is no way to
return from there. Apply the same change to the errmsg_abort function and
update all components.
This commit was SVN r15704.
- If one wants to use this solution, remember to unload the project 'orte-restart' which is currently not working for Windows.
This commit was SVN r15680.
grpcomm cnos component
- Remove the .ompi_ignore
- add a configure.m4 that should keep it from building on any system
other than Cray XT* (copied from rml/cnos)
- Fix some mis-named symbols resulting from cut/paste errors.
This patch brings the Cray build back into 'working' order.
This commit was SVN r15651.