Ralph identified the problem, I tracked down ''where'' the fd was
being closed, and Brian figured out ''why'' (and the fix).
What was happening is that a remote process was closing its
stdout/stderr and therefore sending a 0-byte IOF message to mpirun.
mpirun, in turn, closed the iof endpoint associated with that stream
(i.e., stdout/stderr). IOF does this to handle the case where
mpirun's stdin is closed -- this therefore causes the stdin on all the
ORTE-started processes to have their stdin's closed as well.
So the workaround here is to check that if we get a 0-byte IOF message
on a sink (indicating a remote closure), and if that sink is the
special stdout or stderr stream, don't actually close anything in the
local process.
This commit was SVN r12691.
The following Trac tickets were found above:
Ticket 635 --> https://svn.open-mpi.org/trac/ompi/ticket/635
so this isn't an issue there either. Refs trac:488
This commit was SVN r12675.
The following Trac tickets were found above:
Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
* Fix a counter roll-over issue that could result from a large (but
not excessive) number of outstanding put/get/accumulate calls
during a single synchronization issues (Refs trac:506)
* Fix epoch issue with rdma component that would effect PWSC
synchronization (Refs trac:507)
This commit was SVN r12673.
The following Trac tickets were found above:
Ticket 506 --> https://svn.open-mpi.org/trac/ompi/ticket/506
Ticket 507 --> https://svn.open-mpi.org/trac/ompi/ticket/507
* use one-sided datatype check instead of send/receive and check both
the origin and target datatypes
* allow error handler to be set on MPI_WIN_NULL, per standard
* Allow recursive calls into the pt2pt osc component's progress
function
* Fix an uninitialized variable problem in the unlock header
This commit was SVN r12667.
because they are in ORTE, not OMPI. Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which
meant that bad things happened if you changed what an orte_process_name_t
looked like.
This commit was SVN r12646.
the same time, remove some of the MPI-related options from OPAL:
- provide mechanism to change at runtime whether sched_yield() should
be called when the progress engine is idle
- provide mechanism for changing the rate at which the event engine
is called when there are "no" users of the event engine (ie, when
using MPI but not TCP)
- fix some function names in the progress engine to better match
their intended use (and remove MPI naming scheme)
- remove progress_mpi_enable / progress_mpi_disable because
we can now use the functions to set the sched_yield and
tick rate interfaces
- rename opal_progress_events() to opal_progress_set_event_flag()
because the first really isn't descriptive of what the function
does and I always got confused by it
This commit was SVN r12645.
Fix comm_spawn by singletons. orte_init does some voodoo to let the system know about localhost when we are a singleton. This includes allocating it so that any comm_spawn'd children can use their parent "allocation". Unfortunately, the fix that bproc needs (due to that smr filling up the node segment!) causes the singleton startup to fail. The fix is to just have the singleton startup force an allocation of its localhost.
Only issue here is: what happens if we are in a persistent universe? The singleton will now overwrite any prior info on slots used on localhost by other jobs (won't affect anything else). The answer, of course, is to do something more intelligent - lookup localhost on the registry and just update its info instead of overwriting it.
Something for another day (or month....or year)
This commit was SVN r12644.
We were burned again by the fact that the bproc state monitor creates entries on the node segment for *all* the nodes in the cluster when it is opened during orte_init. As a result, the bjs allocator was never being called, and the system merrily assumed that *all* nodes in the cluster had been allocated to it.
To fix this, I removed a test that had been inserted into the allocation procedure that checked for a non-zero node segment. This was an old artifact - the RAS components already know that they are not to overwrite any existing node segment entries (at least, bproc does - I will check the others. For now, I just want to save the bproc fix on this machine).
This commit was SVN r12640.
Ensure that the new predefined MPI-2 attribute callback functions take
the proper types (INTEGER, kind=MPI_ADDRESS_KIND instead of just
INTEGER).
This commit was SVN r12639.
The following Trac tickets were found above:
Ticket 624 --> https://svn.open-mpi.org/trac/ompi/ticket/624
Modify the RMAPS framework so we eliminate communicating a map to a backend node when certain attributes are set. The proxy functions are now implemented in the base, and a check made for HNP/non-HNP operation made in the map_jobs function prior to execution.
This commit was SVN r12619.
Add placeholders for the new orte tools. These don't actually do anything yet - in fact, I have set the .ompi_ignore so that you won't compile them (I have set a .ompi_unignore for me). Please let me know if you encounter any trouble with this - the ompi_ignore's should protect everyone.
This commit was SVN r12616.
Note that Bproc won't support this operation, so we just ignore the --reuse-daemons directive.
I'm afraid I don't understand the POE and XGrid environments well enough to attempt the necessary modifications.
Also, please note that XGrid support has been broken on the trunk. I don't understand the code syntax well enough to make the required changes to that PLS component, so it won't compile at the moment. I'm hoping Brian has a few minutes to fix it after SC.
This commit was SVN r12614.
1. new functionality in the pls base to check for reusable daemons and launch upon them
2. an extension of the odls API to allow each odls component to build a notify message with the "correct" data in it for adding processes to the local daemon. This means that the odls now opens components on the HNP as well as on daemons - but that's the price of allowing so much flexibility. Only the default odls has this functionality enabled - the others just return NOT_IMPLEMENTED
3. addition of a new command line option "--reuse-daemons" to orterun. The default, for now, is to NOT reuse daemons. Once we have more time to test this capability, we may choose to reverse the default. For one thing, we probably want to investigate the tradeoffs in start time for comm_spawn'd processes that reuse daemons versus launch their own. On some systems, though, having another daemon show up can cause problems - so they may want to set the default as "reuse".
This is ONLY enabled for rsh launch, at the moment. The code needing to be added to each launcher is about three lines long, so I'll be doing that as I get access to machines I can test it on.
This commit was SVN r12608.
1. use non-blocking sends to transmit commands (this was actually done in a prior commit)
2. have an "ack" message sent back from the orted when it completes the command
The latter item is the new one here. With my prior commit, it was possible for the HNP to move on to other things before the orted had completed its command. This caused the HNP to occassionally exit before the orted, thus generating "lost connection" errors. With this change, we retain the parallel nature of the command communications, but still hold the HNP at that point until the orteds are done.
Best of both worlds.
This commit was SVN r12605.
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.