The temporary solution is to switch into EV_NONBLOCK mode earlier (right after the mx_connect loop) so that there is no large slowdown when some processes enter the stage gate 2 barrier before the others. Processes now never block in the event library, which appears to give roughly a 50% speedup when running on more than 64 processes.
Refs trac:645
This commit was SVN r12713.
The following Trac tickets were found above:
Ticket 645 --> https://svn.open-mpi.org/trac/ompi/ticket/645
OMPI_ARRAY_INT_2_LOGICAL had an array bounds error; fixed it, along with the
analogous error in OMPI_ARRAY_LOGICAL_2_INT.
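The message does not say what the bounds error was. Purely as an illustration, here is a minimal C sketch of a bounds-safe, element-by-element conversion between a Fortran LOGICAL array and a C int array when the two types differ in size. The type and function names are hypothetical (they are not the real OMPI_ARRAY_* macros), the 8-byte LOGICAL is an assumption, and treating any nonzero value as .TRUE. is a simplification.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Fortran LOGICAL wider than a C int
   (assumed to be 8 bytes for this sketch). */
typedef int64_t fortran_logical_t;

/* Convert n Fortran LOGICALs into a newly allocated C int array.
   Both arrays are indexed by element, so every access stays in [0, n).
   Assumption: any nonzero value means .TRUE. */
static int *logical_to_int(const fortran_logical_t *in, size_t n)
{
    /* Allocate at least one element so malloc(0) cannot look like failure. */
    int *out = malloc((n > 0 ? n : 1) * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) {
        out[i] = (in[i] != 0) ? 1 : 0;
    }
    return out;
}

/* Copy n C ints back into the caller's Fortran LOGICAL array. */
static void int_to_logical(const int *in, fortran_logical_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (in[i] != 0) ? 1 : 0;
    }
}
```

The key point is that the temporary array is sized by element count and both loops use the same bound, so neither side is read or written past its n elements.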
This commit was SVN r12712.
The following Trac tickets were found above:
Ticket 482 --> https://svn.open-mpi.org/trac/ompi/ticket/482
case where sizeof(INTEGER) > sizeof(int).
This commit was SVN r12707.
The following SVN revision numbers were found above:
r12684 --> open-mpi/ompi@e2c605f32a
* Check that the C++, Fortran 77, and Objective C compilers emit code
  that can link against object files emitted by the C compiler.
  This moves some build/run time errors to configure time, which is
  nice and should help with debugging.
* Remove unneeded -F option when building the XGrid components,
  which started causing problems with LT 2.0.
* Try to use the XGridFoundation library, rather than just seeing
  if we can give -framework XGridFoundation. Should make the
  test slightly more accurate.
* Don't assume XGrid is unavailable on 64-bit platforms, as that
  won't be true on Leopard.
* Require AM 1.10 or newer if using AC 2.60 or newer, so that
  we don't end up with AC supporting Objective C and AM
  not doing so.
This commit was SVN r12701.
Ralph identified the problem, I tracked down ''where'' the fd was
being closed, and Brian figured out ''why'' (and the fix).
What was happening is that a remote process was closing its
stdout/stderr and therefore sending a 0-byte IOF message to mpirun.
mpirun, in turn, closed the iof endpoint associated with that stream
(i.e., stdout/stderr). IOF does this to handle the case where
mpirun's stdin is closed: closing mpirun's stdin should close the
stdin of all the ORTE-started processes as well.
So the workaround here is: if we get a 0-byte IOF message on a sink
(indicating a remote closure) and that sink is the special stdout or
stderr stream, don't actually close anything in the local process.
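The decision the workaround adds can be sketched as a small predicate. This is a minimal C sketch, not the actual IOF code: the stream tags and the function name are hypothetical, and it assumes the caller already knows the message length and which stream the sink is attached to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags for the stream a sink is attached to. */
typedef enum { STREAM_STDIN, STREAM_STDOUT, STREAM_STDERR, STREAM_OTHER } stream_tag_t;

/* Should a received IOF message close the local endpoint?
   A 0-byte message signals that the remote side closed its end. */
static bool should_close_local_endpoint(size_t msg_len, stream_tag_t tag)
{
    if (msg_len > 0) {
        return false;   /* ordinary data: forward it, close nothing */
    }
    if (tag == STREAM_STDOUT || tag == STREAM_STDERR) {
        return false;   /* remote closed stdout/stderr: keep the local fd open */
    }
    return true;        /* other closures (e.g. stdin) still propagate */
}
```

Keeping the check in one predicate makes the special case explicit: only a remote closure on stdout/stderr is ignored, while the stdin-closing behavior IOF relies on is unchanged.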
This commit was SVN r12691.
The following Trac tickets were found above:
Ticket 635 --> https://svn.open-mpi.org/trac/ompi/ticket/635
so this isn't an issue there either. Refs trac:488
This commit was SVN r12675.
The following Trac tickets were found above:
Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
* Fix a counter roll-over issue that could result from a large (but
  not excessive) number of outstanding put/get/accumulate calls
  during a single synchronization epoch (Refs trac:506)
* Fix epoch issue with rdma component that would affect PWSC
  synchronization (Refs trac:507)
This commit was SVN r12673.
The following Trac tickets were found above:
Ticket 506 --> https://svn.open-mpi.org/trac/ompi/ticket/506
Ticket 507 --> https://svn.open-mpi.org/trac/ompi/ticket/507
* Use the one-sided datatype check instead of the send/receive one, and
  check both the origin and target datatypes
* Allow an error handler to be set on MPI_WIN_NULL, per the standard
* Allow recursive calls into the pt2pt osc component's progress
  function
* Fix an uninitialized variable problem in the unlock header
This commit was SVN r12667.
because they are in ORTE, not OMPI. Also, remove the ORTE_PROCESS_NAME macros
in iof base, as they duplicated the ones in ns_types, which meant that
things broke if you changed the layout of orte_process_name_t.
This commit was SVN r12646.
the same time, remove some of the MPI-related options from OPAL:
- provide mechanism to change at runtime whether sched_yield() should
be called when the progress engine is idle
- provide mechanism for changing the rate at which the event engine
  is called when there are "no" users of the event engine (i.e., when
  using MPI but not TCP)
- fix some function names in the progress engine to better match
their intended use (and remove MPI naming scheme)
- remove progress_mpi_enable / progress_mpi_disable because
we can now use the functions to set the sched_yield and
tick rate interfaces
- rename opal_progress_events() to opal_progress_set_event_flag()
because the first really isn't descriptive of what the function
does and I always got confused by it
This commit was SVN r12645.
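The two runtime knobs described in r12645 (whether to call sched_yield() when the progress engine is idle, and how often to call into the event engine) can be sketched as follows. This is a minimal C sketch under stated assumptions: every name is hypothetical rather than the OPAL API, the event-library call is stubbed out with a counter, and "idle" is modeled as a pass that completed nothing.

```c
#include <sched.h>
#include <stdbool.h>

/* Hypothetical progress-engine state; names are illustrative only. */
static bool yield_when_idle    = false; /* runtime toggle for sched_yield() */
static int  event_tick_rate    = 1;     /* poll the event engine every N passes */
static int  event_tick_counter = 0;
static int  event_calls        = 0;     /* counts stub calls, for demonstration */

static void set_yield_when_idle(bool on) { yield_when_idle = on; }

static void set_event_tick_rate(int rate)
{
    event_tick_rate = (rate > 0) ? rate : 1;
    event_tick_counter = 0;
}

/* Stand-in for one nonblocking pass through the event library. */
static void poll_event_library(void) { ++event_calls; }

/* One pass of the progress loop; `completed` is how many operations
   the other progress callbacks finished during this pass. */
static void progress_once(int completed)
{
    if (++event_tick_counter >= event_tick_rate) {
        event_tick_counter = 0;
        poll_event_library();
    }
    if (completed == 0 && yield_when_idle) {
        sched_yield();   /* give up the CPU only when nothing happened */
    }
}
```

With a tick rate of 4, eight passes call the event library twice; raising the rate reduces event-engine overhead when nothing (e.g., no TCP traffic) depends on it.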
Fix comm_spawn by singletons. orte_init does some voodoo to let the system know about localhost when we are a singleton. This includes allocating it so that any comm_spawn'd children can use their parent's "allocation". Unfortunately, the fix that bproc needs (because the smr fills up the node segment!) causes the singleton startup to fail. The fix is to have the singleton startup force an allocation of its localhost.
The only issue here is: what happens if we are in a persistent universe? The singleton will now overwrite any prior info on slots used on localhost by other jobs (it won't affect anything else). The answer, of course, is to do something more intelligent: look up localhost on the registry and just update its info instead of overwriting it.
Something for another day (or month....or year)
This commit was SVN r12644.
We were burned again by the fact that the bproc state monitor creates entries on the node segment for *all* the nodes in the cluster when it is opened during orte_init. As a result, the bjs allocator was never being called, and the system merrily assumed that *all* nodes in the cluster had been allocated to it.
To fix this, I removed a check, previously inserted into the allocation procedure, for a non-empty node segment. This was an old artifact: the RAS components already know that they are not to overwrite any existing node segment entries (at least, bproc does; I will check the others. For now, I just want to save the bproc fix on this machine).
This commit was SVN r12640.
Ensure that the new predefined MPI-2 attribute callback functions take
the proper types (INTEGER, kind=MPI_ADDRESS_KIND instead of just
INTEGER).
This commit was SVN r12639.
The following Trac tickets were found above:
Ticket 624 --> https://svn.open-mpi.org/trac/ompi/ticket/624
Modify the RMAPS framework so that we no longer communicate a map to a backend node when certain attributes are set. The proxy functions are now implemented in the base, and a check for HNP/non-HNP operation is made in the map_jobs function prior to execution.
This commit was SVN r12619.
Add placeholders for the new orte tools. These don't actually do anything yet; in fact, I have set the .ompi_ignore so that you won't compile them (I have set a .ompi_unignore for me). Please let me know if you encounter any trouble with this; the .ompi_ignore files should protect everyone.
This commit was SVN r12616.
Note that Bproc won't support this operation, so we just ignore the --reuse-daemons directive.
I'm afraid I don't understand the POE and XGrid environments well enough to attempt the necessary modifications.
Also, please note that XGrid support has been broken on the trunk. I don't understand the code syntax well enough to make the required changes to that PLS component, so it won't compile at the moment. I'm hoping Brian has a few minutes to fix it after SC.
This commit was SVN r12614.