Also create mca parameter to force daemonization (previous
behavior) which might be needed on larger clusters or
to make use of the -notify flag with qsub.
This fixes trac:1783.
This commit was SVN r20582.
The following Trac tickets were found above:
Ticket 1783 --> https://svn.open-mpi.org/trac/ompi/ticket/1783
Often, orte/util/show_help.h is included, although no functionality
is required -- instead, most often opal_output.h, or
orte/mca/rml/rml_types.h
Please see orte_show_help_replacement.sh commited next.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
actually showed two *missing* #include "orte/util/show_help.h"
in orte/mca/odls/base/odls_base_default_fns.c and
in orte/tools/orte-top/orte-top.c
Manually added these.
Let's have MTT the last word.
This commit was SVN r20557.
y combination of comma-separated values and ranges. Daemons will use the first port in the range, MPI procs will use the other ports in the range assuming that they know their node rank in time and enough ports were specified.
NOTE: this capability only works under specific conditions. I will outline more about this in a note to devel as the remainder of the implementation progresses. For now, the only environment where this works is slurm. The linear routed module has also been adjusted to work with static ports so that all messaging flows strictly through the topology, including the initial daemon callback - thus limiting the number of sockets opened by mpirun.
This commit was SVN r20390.
changes to the already existing ccp components
event/win32.c: merge old FD handling into new
opal_installdirs_windows.c:fix the registry handling
This commit was SVN r20109.
Basically, the remaining problem turned out to be:
1. closing stdout/stderr during orte_finalize of mpirun
2. inadvertently setting up a write event on fd = -1
3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once
This passed prelim MTT testing by Jeff and Tim, but should soak for awhile before migrating to 1.3.
This commit was SVN r20106.
The following SVN revision numbers were found above:
r20064 --> open-mpi/ompi@a07660aea8
r20068 --> open-mpi/ompi@ec930d14a9
r20074 --> open-mpi/ompi@2940309613
1. coordination of job completion notification to include a requirement for both waitpid detection AND notification that all iof pipes have been closed by the app
2. change of all IOF read and write events to be non-persistent so they can properly be shutdown and restarted only when required
3. addition of a delay (currently set to 10ms) before restarting the stdin read event. This was required to ensure that the stdout, stderr, and stddiag read events had an opportunity to be serviced in scenarios where large files are attached to stdin.
This commit was SVN r20064.
The slurm plm fork/exec's a call to srun to launch its daemons. When mpirun terminates, it then sends out a "terminate" command to those daemons. The daemons respond back to mpirun, and then exit.
If slurm itself is running on a slow network, and mpirun is running the OOB across a fast network, then it is possible for mpirun to receive notification of daemon termination and exit -before- the srun can complete its bookkeeping and declare the job as complete. When this happens, slurm becomes confused and loses state.
Mucho bad. :-/
This commit changes the termination logic so that mpirun will wait for srun to report complete before exiting. It also enables fully routed communications since it no longer requires daemons to report back that they are terminating, thus allowing the daemons to terminate asynchronously (thereby breaking routing paths).
This commit was SVN r20018.
1. remove direct routed module (hooray!)
2. add radix tree routed module (binomial remains default)
3. remove duplicate data storage - orteds were storing nidmap and pidmap data in odls, everyone else in ess
4. add ess APIs to update nidmap, add new pidmap - used only by orteds for MPI-2 support
5. modify code to eliminate multiple calls to orte_routed.update_route that recreated info already in ess pidmap. Add ess API to lookup that info instead. Modify routed modules to utilize that capability
6. setup new ability to shutdown orteds without sending back an "ack" message to mpirun - not utilized yet, will require some changes to plm terminate_orteds functions in managed environments (coming soon)
Initial tests indicating that fully routing comm via defined routing trees may not actually have a significant cost for operations like IB QP setup. More tests required to confirm.
This will require an autogen...
This commit was SVN r19866.
Modify read_write test to allow specification of rank to read stdin.
IOF now validated to work for arbitrary rank as stdin target. Not validate to work for multiple simultaneous ranks reading stdin (untested).
This commit was SVN r19804.