fact a free_list_item so instead of having a struct, use typedef
to make them equivalent. Modify the parallel debugger support
so that debuggers can access the internal types even in an
optimized build.
This commit was SVN r16567.
If we cannot resolve the route to the peer that we're trying to send
to, don't queue up the message in the TCP OOB -- instead, return it to
the upper layer (e.g., the RML) and let it decide what to do.
In the case of the routed RML, the tree component will queue it up for
later transmission. Hence, we don't want the message queued up both
here in the TCP OOB and in the tree routed component. See #1171 for
further discussion and explanation.
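A minimal sketch of the behavior described above, not the actual OOB code: when the lower layer cannot resolve a route, it returns an error instead of queueing, so only the upper layer holds the message. Every name here (sketch_route_t, sketch_oob_t, sketch_oob_send, the status codes) is hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; the real ORTE constants are not shown in this log. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_UNREACHABLE = -1 };

/* Hypothetical routing table entry: peer name -> contact address. */
typedef struct {
    const char *peer;
    const char *addr;
} sketch_route_t;

/* Hypothetical stand-in for the lower transport layer; it only counts
 * how many messages it has accepted for transmission. */
typedef struct {
    int queued;
} sketch_oob_t;

/* Look up a peer's address; returns NULL when no route is known. */
static const char *sketch_resolve(const sketch_route_t *routes, size_t nroutes,
                                  const char *peer)
{
    for (size_t i = 0; i < nroutes; ++i) {
        if (0 == strcmp(routes[i].peer, peer)) {
            return routes[i].addr;
        }
    }
    return NULL;
}

/* Accept the message only if the peer is resolvable. Otherwise hand the
 * failure back to the caller, which decides whether to queue and retry. */
int sketch_oob_send(sketch_oob_t *oob, const sketch_route_t *routes,
                    size_t nroutes, const char *peer)
{
    if (NULL == sketch_resolve(routes, nroutes, peer)) {
        return SKETCH_ERR_UNREACHABLE;  /* nothing is queued at this layer */
    }
    oob->queued++;
    return SKETCH_SUCCESS;
}
```

With this shape, an upper layer that already queues for later transmission sees SKETCH_ERR_UNREACHABLE and keeps the only copy of the message.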
This commit was SVN r16540.
The following SVN revision numbers were found above:
r16513 --> open-mpi/ompi@7ae9589d70
The following Trac tickets were found above:
Ticket 1170 --> https://svn.open-mpi.org/trac/ompi/ticket/1170
Change the order of initialization to match the declaration.
Add the missing initialization of req_persistent and req_mpi_object
to ompi_request_empty and ompi_request_null.
This commit was SVN r16536.
such a way that the converter will not be able to pack some of it. This commit adds
handling of such cases. If the converter can't pack any data for a BTL, the data is
sent over another BTL that has data to send.
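The fallback can be sketched roughly as follows. This is an illustration under stated assumptions, not the PML scheduling code: sketch_btl_t, sketch_pack and sketch_schedule are hypothetical names, and the "converter" here is reduced to a function that packs only whole elements, so it packs nothing when the space offered by a BTL is smaller than one element.

```c
#include <stddef.h>

/* Hypothetical stand-in for a BTL: a fragment size limit and a byte counter. */
typedef struct {
    size_t max_frag;    /* largest fragment this BTL accepts */
    size_t bytes_sent;  /* bytes scheduled on this BTL so far */
} sketch_btl_t;

/* Hypothetical converter: packs only whole elements of elem_size bytes
 * (elem_size must be nonzero), so the result may be less than requested,
 * or zero. */
static size_t sketch_pack(size_t elem_size, size_t remaining, size_t max_bytes)
{
    size_t want = remaining < max_bytes ? remaining : max_bytes;
    return (want / elem_size) * elem_size;
}

/* Round-robin over the BTLs. A BTL for which nothing can be packed is
 * skipped and its share goes to the next BTL that can take data.
 * Returns the number of bytes left unsent (0 when everything was sent). */
size_t sketch_schedule(sketch_btl_t *btls, size_t nbtls,
                       size_t elem_size, size_t total)
{
    size_t remaining = total;
    while (remaining > 0) {
        size_t progressed = 0;
        for (size_t i = 0; i < nbtls && remaining > 0; ++i) {
            size_t packed = sketch_pack(elem_size, remaining, btls[i].max_frag);
            if (0 == packed) {
                continue;  /* converter packed nothing: try another BTL */
            }
            btls[i].bytes_sent += packed;
            remaining -= packed;
            progressed += packed;
        }
        if (0 == progressed) {
            break;  /* no BTL can make progress; avoid spinning forever */
        }
    }
    return remaining;
}
```

For example, with 8-byte elements, a BTL limited to 4-byte fragments never receives data, and a second BTL with 16-byte fragments carries the whole message.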
This commit was SVN r16493.
Note that this means ALL procs in the parent job are updated, even though they may not be participating in the comm_spawn. This doesn't really hurt anything; it is just unnecessary.
Comm_spawn still has a problem when a child process shares a node with a parent, so this doesn't fix everything. It only ensures that all procs know how to talk to each other.
This commit was SVN r16460.
This commit introduces the necessary logic to avoid that conflict. If a PLS component can identify that a daemon has failed, then we will set a flag indicating that fact. The xcast system will subsequently check that flag and, if it is set, will send all messages direct to the recipient. In the case of "kill local procs" and "terminate", the messages will go directly to each orted, thus bypassing any orted that has failed.
In addition, the xcast system will -not- wait for the messages to complete, but will return immediately (i.e., operate in non-blocking mode). Orterun will wait (via an event timer) for a period of time based on the number of daemons in the system to allow the messages to attempt to be delivered - at the end of that time, orterun will simply exit, alerting the user to the problem and -strongly- recommending they run orte-clean.
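As a rough illustration of the flag-driven logic described above (a sketch, not the ORTE implementation): a launcher-side callback sets a flag, and the xcast path consults it to choose direct, non-blocking delivery. All identifiers below are hypothetical. The assumptions that the normal path is routed and blocking, and that the exit timer scales linearly with the daemon count, are illustrative only; the commit message does not give the actual formula.

```c
#include <stdbool.h>

/* Hypothetical delivery modes for the xcast system. */
typedef enum {
    SKETCH_XCAST_ROUTED,  /* assumed normal path: relayed through daemons */
    SKETCH_XCAST_DIRECT   /* sent straight to each recipient */
} sketch_xcast_mode_t;

/* Hypothetical shared state holding the "a daemon has failed" flag. */
typedef struct {
    bool daemon_failed;
} sketch_job_state_t;

/* Called by the launcher component when it detects a dead daemon. */
void sketch_report_daemon_failure(sketch_job_state_t *state)
{
    state->daemon_failed = true;
}

/* Once a daemon has failed, bypass relaying and send direct. */
sketch_xcast_mode_t sketch_select_mode(const sketch_job_state_t *state)
{
    return state->daemon_failed ? SKETCH_XCAST_DIRECT : SKETCH_XCAST_ROUTED;
}

/* After a failure the sender must not wait for completion, since a dead
 * daemon would never acknowledge. Blocking otherwise is an assumption. */
bool sketch_xcast_blocks(const sketch_job_state_t *state)
{
    return !state->daemon_failed;
}

/* Time to wait before exiting. Linear scaling with the number of daemons
 * is purely an assumption made for this sketch. */
unsigned long sketch_exit_timeout_usec(unsigned long num_daemons,
                                       unsigned long usec_per_daemon)
{
    return num_daemons * usec_per_daemon;
}
```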
I could only test this on slurm for the case where all daemons unexpectedly died - srun apparently only executes its waitpid callback when all launched functions terminate. I have asked that Jeff integrate this capability into the OOB as he is working on it so that we execute it whenever a socket to an orted is unexpectedly closed. In the meantime, the functionality will rarely get called, but at least the logic is available for anyone whose environment can support it.
This commit was SVN r16451.
the use of the --mca btl_base_verbose flag. The
btl framework now matches all the other frameworks.
Slightly modify error messages for clarity.
This commit was SVN r16443.
config/ompi_check_visibility.m4 (OMPI_CHECK_VISIBILITY):
Rename ompi_vc_cc_fvisibility to ompi_cv_cc_fvisibility, so
that it will be cached.
This commit was SVN r16435.
variable is not defined. Make sure to set it to something reasonable
so that file preloading still works (instead of seg faulting :)
Thanks to Hiep Bui Hoang for reporting this bug.
This commit was SVN r16433.
For example, if I have an application that, internal to the application, takes
the argument '-mca foo bar' we do not want orterun to pick up this argument and
pass it through the system.
So, given the following command line:
{{{
shell$ mpirun -np 2 -mca btl tcp,self ./myapp -mca foo bar
}}}
orterun should pick up {{{-mca btl tcp,self}}} but not {{{-mca foo bar}}}, which
it did prior to this commit.
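The intended behavior can be sketched as a parser that stops at the first token that is not a launcher option and treats everything after it as belonging to the application. This is a simplified illustration, not orterun's actual parser: sketch_mca_param_t and sketch_parse_launcher_args are hypothetical names, and the sketch assumes that only `-np` (one argument) and `-mca` (two arguments) take values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one "-mca <key> <value>" pair. */
typedef struct {
    const char *key;
    const char *value;
} sketch_mca_param_t;

/* Scan launcher options starting at argv[1] and stop at the first token
 * that is not an option: that token is taken to be the application
 * executable, and nothing after it is interpreted by the launcher.
 * Returns the index of the executable in argv, or argc if there is none. */
int sketch_parse_launcher_args(int argc, char **argv,
                               sketch_mca_param_t *params, size_t max_params,
                               size_t *nparams)
{
    int i = 1;
    *nparams = 0;
    while (i < argc) {
        if (0 == strcmp(argv[i], "-mca") && i + 2 < argc) {
            if (*nparams < max_params) {
                params[*nparams].key = argv[i + 1];
                params[*nparams].value = argv[i + 2];
                (*nparams)++;
            }
            i += 3;  /* skip the flag, the key and the value */
        } else if (0 == strcmp(argv[i], "-np") && i + 1 < argc) {
            i += 2;  /* skip the flag and the process count */
        } else if ('-' == argv[i][0]) {
            i += 1;  /* assumption: any other flag takes no argument */
        } else {
            break;   /* first non-option token: the application */
        }
    }
    return i;
}
```

Applied to the command line above, the scan stops at `./myapp`, so only the `btl` parameter is collected and `-mca foo bar` is left for the application.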
I tested command line runs and runs with app files to confirm this patch works.
This commit was SVN r16431.