Commit from a long-standing Mercurial tree that ended up incorporating a lot of things:
* A few fixes for CPC interface changes in all the CPCs
* Attempts (but not yet finished) to fix shutdown problems in the IB CM CPC
* #1319: add CTS support (i.e., initiator guarantees to send first message; automatically activated for iWARP over the RDMA CM CPC)
* Some variable and function renamings to make this be generic (e.g., alloc_credit_frag became alloc_control_frag)
* CPCs no longer post receive buffers; they only post a single receive buffer for the CTS if they use CTS. Instead, the main BTL now posts the main sets of receive buffers.
* CPCs allocate a CTS buffer only if they're about to make a connection
* RDMA CM improvements:
* Use threaded mode openib fd monitoring to wait for for RDMA CM events
* Synchronize endpoint finalization and disconnection between main thread and service thread to avoid/fix some race conditions
* Converted several structs to be OBJs so that we can use reference counting to know when to invoke destructors
* Make some new OBJ's have opal_list_item_t's as their base, thereby eliminating the need for the local list_item_t type
* Renamed many variables to be internally consistent
* Centralize the decision in an inline function as to whether this process or the remote process is supposed to be the initiator
* Add oodles of OPAL_OUTPUT statements for debugging (hard-wired to output stream -1; to be activated by developers if they want/need them)
* Use rdma_create_qp() instead of ibv_create_qp()
* openib fd monitoring improvements:
* Renamed a bunch of functions and variables to be a little more obvious as to their true function
* Use pipes to communicate between main thread and service thread
* Add ability for main thread to invoke a function back on the service thread
* Ensure to set initiator_depth and responder_resources properly, but putting max_qp_rd_ataom and ma_qp_init_rd_atom in the modex (see rdma_connect(3))
* Ensure to set the source IP address in rdma_resolve() to ensure that we select the correct OpenFabrics source port
* Make new MCA param: openib_btl_connect_rdmacm_resolve_timeout
* Other improvements:
* btl_openib_device_type MCA param: can be "iw" or "ib" or "all" (or "infiniband" or "iwarp")
* Somewhat improved error handling
* Bunches of spelling fixes in comments, VERBOSE, and OUTPUT statements
* Oodles of little coding style fixes
* Changed shutdown ordering of btl; the device is now an OBJ with ref counting for destruction
* Added some more show_help error messages
* Change configury to only build IBCM / RDMACM if we have threads (because we need a progress thread)
This commit was SVN r19686.
The following Trac tickets were found above:
Ticket 1210 --> https://svn.open-mpi.org/trac/ompi/ticket/1210
Refers to ticket #1548. Although this appears to fix the problem, the ticket will be held open pending further test prior to transition to the 1.3 branch.
This commit was SVN r19674.
got a whole lot smaller, decreasing the memory footprint of the
running application. How much it's a good question. Here is a
breakdown:
- in mca_bml_base_endpoint_t: 3 *size_t + 1 * uint32_t
- in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float
+ 6 * size_t + 9 * (void*)
The decrease in mca_bml_base_endpoint_t is for each peer and the
decrease in mca_bml_base_btl_t is for each BTL for each peer.
So, if we consider the most convenient case where there is only
one network between all peers, this decrease the memory foot print
per peer by
9*size_t + 9*(void*) + 2 * int32_t + 1 * double - 1 * float.
On a 64 bits machine this will be 156 bytes per peer.
Now we access all these fields directly from the underlying BTL
structure, and as this structure is common to multiple BML endpoint,
we are a lot more cache friendly. Even if this do not improve the
latency, it makes the SM performance graph a lot smoother.
This commit was SVN r19659.
There was an argument that was barely used, and on return at the PML
level it contained nothing usable. It has been removed, so now we're
using less memory ...
This commit was SVN r19657.
(related to the presence of posix threads and ptmalloc2) is now a
little outdated: since we don't build ptmalloc2 as part of libopal
anymore, the openib BTL's requirements are not directly tied to
ptmalloc2's anymore. Specifically, I altered the test to:
1. At compile time, if no threads are found, the ptmalloc2 component
is going to be built, '''and the ptmalloc2 component is going to be
inside libopal,''' then refuse to build the openib BTL.
1. At run time, if no threads were available at compile time and the
ptmalloc2 component is part of the process, then refuse to use the
openib BTL.
Fixes trac:1537.
This commit was SVN r19652.
The following Trac tickets were found above:
Ticket 1537 --> https://svn.open-mpi.org/trac/ompi/ticket/1537
always in a heterogeneous way in order to be able to support extern32. It
doesn't really matter as it is outside the critical path.
This commit was SVN r19651.
thereby runs apstat twice; and in the process thereof reads the ALPS
appinfo file TWICE; and in addition, experiences a failure sometimes
which causes mpirun to hang. Change this to a looped read attempt
that breaks on success, thereby avoiding failure (except in the most
This commit was SVN r19642.
Rationale:
1. This value has already changed since v1.2 (v1.2 MPI_MAX_PORT_NAME
== 36). Hence, this commit simply increases the value from a
previous change.
1. The changes does increase OMPI's memory footprint slightly, but
only when using MPI-2 dynamics. So it is expected that the change
will have minimal impact on the overall footprint.
1. The change is helpful for nodes that have 4 or more IP networks
(e.g., regular ethernet and multiple IP-over-<pick your favorite
high-speed network> networks). Without this change, invoking
MPI_COMM_SPAWN on hosts with 4 or more IP networks will fail
because we'll exceed 256 bytes for the port name. Some OMPI
developer test clusters already have this kind of configuration
(e.g., Cisco); it is expected that this is not too common in the
real world yet, but with "manycore" coming, having multiple
IP-based networks in a single server will likely become more
common.
This commit was SVN r19638.
we already have them in orte_process_info. Refs trac:1523.
This commit was SVN r19615.
The following Trac tickets were found above:
Ticket 1523 --> https://svn.open-mpi.org/trac/ompi/ticket/1523
OMPI trunk. Need all organizations to ensure I got spellings and
affiliations correct.
Also commit a helper script to help keep AUTHORS up to date on the
trunk; it should be run before we create release branches.
This commit was SVN r19612.
help messages so that users only see the message once instead of N
times when their MPI app crashes.
Note that there is a tradeoff here -- we now call malloc in this
particular "show the error" code path. This shouldn't usually be a
problem, because the errors typically displayed through this mechanism
are MPI API argument problems (e.g., sending a negative count to
MPI_SEND), and not memory errors. But such API argument errors could
be a consequence of of a prior memory error, so there's a nonzero
chance that the error failure will fail to print because malloc
failed. In this case, the user can disable help message aggregation
(via the orte_base_want_aggregate MCA parameter) and we'll fall back
to the no-malloc code path (but without aggregation).
Note that we won't aggregate before MPI_INIT or after MPI_FINALIZE.
So if you call an MPI function before MPI_INIT / after MPI_FINALIZE,
you'll still see the error message N times. Nothing we can do about
that; we need ORTE to do the aggregation properly (which is obviously
unavailable before MPI_INIT / after MPI_FINALIZE).
This commit was SVN r19611.