The fix is simply to check whether the return value is negative: valid indices are >= 0, while SOS-encoded errors are *always* negative.
The real fix (as Ralph points out) is to change these functions (opal_pointer_array_add and mca_base_param*) to return the index through a pointer argument, so the return value carries only a status code.
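A minimal sketch of both approaches (opal_pointer_array_add_ex is a
hypothetical name, just to illustrate the pointer-argument idea):
{{{
#include "opal/class/opal_pointer_array.h"
#include "opal/constants.h"

/* Interim fix: valid indices are >= 0 and SOS-encoded errors are
 * always negative, so a simple sign check suffices. */
static int add_item(opal_pointer_array_t *array, void *item)
{
    int idx = opal_pointer_array_add(array, item);
    if (idx < 0) {
        return idx;   /* propagate the SOS-encoded error */
    }
    /* ... use idx as a valid index ... */
    return OPAL_SUCCESS;
}

/* Hypothetical shape of the "real fix": return only a status code
 * and hand the index back through a pointer argument. */
int opal_pointer_array_add_ex(opal_pointer_array_t *array, void *item,
                              int *index_out);
}}}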
This commit was SVN r23173.
(OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes a SOS-encoded error and
returns the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret); we thus avoid having to
decode 'ret' to get the native error code (see the sketch below).
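For example (a sketch of the rewritten checks; the particular error
code is just illustrative):
{{{
#include "opal/constants.h"
#include "opal/util/opal_sos.h"

static void check_return(int ret)
{
    /* Before: (OPAL_ERR_OUT_OF_RESOURCE == ret) would miss the
     * SOS-encoded form; decode first, then compare: */
    if (OPAL_ERR_OUT_OF_RESOURCE == OPAL_SOS_GET_ERR_CODE(ret)) {
        /* handle resource exhaustion */
    }

    /* OPAL_SUCCESS is preserved by SOS, so plain success/failure
     * tests need no decoding: */
    if (OPAL_SUCCESS != ret) {
        /* handle any error */
    }
}
}}}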
This commit was SVN r23162.
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh, after applying patches that
re-add headers which would otherwise be "lost" because they were
only included indirectly through one of the now-deleted headers...
In total 817 files are touched.
In ompi/mpi/c/, header files are moved up into the actual C file
where necessary (these are the only additional #includes);
otherwise the changes are only deletions of #include lines (apart
from the above additions required due to the notifier...)
- To build the different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
  - Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
  - Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
  - Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
1. register "mpi_preconnect_all" as a deprecated synonym for "mpi_preconnect_mpi"
2. remove "mpi_preconnect_oob" and "mpi_preconnect_oob_simultaneous" as these are no longer valid.
3. remove the routed framework's "warmup_routes" API. With the removal of the direct routed component, this function at best only wasted communications. The daemon routes are completely "warmed up" during launch, so having MPI procs order the sending of additional messages is simply wasteful.
4. remove the call to orte_routed.warmup_routes from MPI_Init. This was the only place it was used anyway.
The FAQs will be updated to reflect this changed situation, and a CMR filed to move this to the 1.3 branch.
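In terms of usage, the rename looks like this (illustrative command
lines):
{{{
# Preferred name going forward:
mpirun -np 16 -mca mpi_preconnect_mpi 1 ./a.out

# Old name, still accepted as a deprecated synonym:
mpirun -np 16 -mca mpi_preconnect_all 1 ./a.out
}}}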
This commit was SVN r19933.
Some minor changes to facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.
This commit was SVN r18664.
that it is >= 1, so making it a size_t makes it easier to interact
with all the other size_t variables and removes a compiler warning.
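A generic illustration of the kind of warning this removes (the names
here are made up, not the actual variable):
{{{
#include <stddef.h>

static void zero_fill(char *buf, size_t len)
{
    size_t i;   /* an int here would draw a signed/unsigned comparison
                 * warning against the size_t loop bound */
    for (i = 0; i < len; ++i) {
        buf[i] = 0;
    }
}
}}}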
This commit was SVN r15935.
mpi_preconnect_oob_simultaneous > np. Need to scale back
simultaneous to equal np in those cases. Reviewed by Brian.
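The clamp itself is straightforward (a sketch with hypothetical
variable names):
{{{
/* Cap the number of simultaneous preconnect exchanges at the number
 * of processes in the job. */
static int clamp_simultaneous(int simultaneous, int np)
{
    return (simultaneous > np) ? np : simultaneous;
}
}}}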
This commit fixes trac:1064.
This commit was SVN r15916.
The following Trac tickets were found above:
Ticket 1064 --> https://svn.open-mpi.org/trac/ompi/ticket/1064
* bml.h had a change that introduced a variable named "_order" to
avoid a conflict with a local variable. The namespace starting
with _ belongs to the OS/compiler/kernel, not us, so we can't start
symbols with _. I therefore replaced it with arg_order, and also
updated the threaded equivalent of the macro that was modified (see
the schematic below).
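Schematically (this is not the actual bml.h macro, just the naming
pattern):
{{{
/* Identifiers beginning with an underscore are reserved for the
 * OS/compiler/libc, so the macro-local temporary is spelled
 * arg_order rather than _order: */
#define BML_SEND_SKETCH(bml_btl, des, order)         \
    do {                                             \
        int arg_order = (order);  /* was: _order */  \
        (void) arg_order;                            \
    } while (0)
}}}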
* in btl_openib_proc.c, one opal_output accidentally had its string
reverted from "ompi_modex_recv..." to
"mca_pml_base_modex_recv....". This was fixed.
* The change to ompi/runtime/ompi_preconnect.c was entirely
reverted; it was an artifact of debugging.
This commit was SVN r15475.
The following SVN revision numbers were found above:
r15474 --> open-mpi/ompi@8ace07efed
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
2. Pasha's new implementation of asynchronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be set up to use PP or SRQ receive buffers, and gives
fine-grained control over the receive buffer size, the number of
receive buffers to post, and when to replenish the receive queue (low
water mark); for SRQ QPs, the number of outstanding sends can also be
specified. The following is an example of the syntax used to describe
QPs to the OpenIB BTL using the new MCA parameter
btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
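Putting the fields together, the example above decodes as:
{{{
P,128,16,4          per-peer QP: 128-byte buffers, 16 posted,
                    repost at the low water mark of 4
S,1024,256,128,32   SRQ QP: 1024-byte buffers, 256 posted, repost
                    at 128, at most 32 outstanding sends
S,4096,256,128,32   SRQ QP: same, with 4096-byte buffers
S,65536,256,128,32  SRQ QP: same, with 65536-byte buffers
}}}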
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed prior to 1.3 release.
This commit was SVN r15474.
wireup. For small clusters or clusters with decent ARP lookup and
connect times, this will have marginal impact. For systems with either
bad ARP lookup times or long connect times, increasing this number
to something much closer to SOMAXCONN (128 on most modern machines) will
result in a faster OOB wireup. Don't set it higher than SOMAXCONN, or
you can end up with lots of connect() retries and the wireup will
actually be slower.
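For reference, the backlog being tuned here is the standard listen(2)
argument (a generic POSIX sketch, not the OMPI code itself):
{{{
#include <sys/socket.h>

/* SOMAXCONN is the usual upper bound (128 on most modern machines);
 * pending connections beyond the backlog force connect() retries. */
static int start_listening(int sockfd)
{
    return listen(sockfd, SOMAXCONN);
}
}}}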
This commit was SVN r14742.
* Don't need the 2 process case -- we'll send an extra message, but
at very little cost and less code is better.
* Use COMPLETE sends instead of STANDARD sends so that the connection
is fully established before we move on to the next connection. The
previous code was still causing minor connection flooding for huge
numbers of processes.
* mpi_preconnect_all now connects both OOB and MPI layers. There's
also mpi_preconnect_mpi and mpi_preconnect_oob should you want to
be more specific.
* Since we're only using the MCA parameters once at the beginning
of time, there's no need for global constants. Just do the quick
param lookup right before the parameter is needed (see the sketch
below). Save some of that global variable space for the next guy.
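A sketch of that lookup-at-point-of-use pattern (using the
mca_base_param API of this era; exact arguments may differ):
{{{
#include "opal/mca/base/mca_base_param.h"

/* Register/look up the parameter once, right where it is consumed,
 * instead of caching it in a global at startup. */
static int want_preconnect_all(void)
{
    int value;
    mca_base_param_reg_int_name("mpi", "preconnect_all",
                                "Connect all process pairs during "
                                "MPI_Init", false, false, 0, &value);
    return value;
}
}}}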
Fixes trac:963
This commit was SVN r14553.
The following Trac tickets were found above:
Ticket 963 --> https://svn.open-mpi.org/trac/ompi/ticket/963
support SEND_IN_PLACE causes badness because the BTL tries to use the
not-exactly-complete convertor. Don't need it in this situation anyway.
This commit was SVN r13700.
CM PML. The req_mtl structure is cast into an mtl_*_request structure
for each MTL, which is larger than req_mtl itself. The cast will cause
the *_request to overwrite parts of the heavy requests if req_mtl
isn't the *LAST* member of each structure (hence the comment). The
field was moved at some point as an optimization, which caused
buffered sends to fail...
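Schematically (hypothetical type names; the real structs live in the
CM PML and the MTLs):
{{{
#include "ompi/mca/mtl/mtl.h"

/* Hypothetical MTL-specific request: larger than mca_mtl_request_t,
 * with MTL fields extending past the base structure. */
typedef struct {
    mca_mtl_request_t super;
    int extra_state[8];
} mca_mtl_foo_request_t;

/* Heavy request sketch: req_mtl MUST be last, because the MTL uses
 * sizeof(mca_mtl_foo_request_t) bytes starting at its address, and
 * anything placed after req_mtl would be overwritten. */
struct hvy_request_sketch_t {
    int heavy_bookkeeping;        /* ... other heavy fields ... */
    mca_mtl_request_t req_mtl;    /* MUST be last */
};

static void bind_mtl_request(struct hvy_request_sketch_t *req)
{
    mca_mtl_foo_request_t *foo =
        (mca_mtl_foo_request_t *) &req->req_mtl;
    (void) foo;   /* the MTL now owns the trailing storage */
}
}}}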
Refs trac:669
This commit was SVN r12871.
The following Trac tickets were found above:
Ticket 669 --> https://svn.open-mpi.org/trac/ompi/ticket/669
r10877:
add warm-up connection option... of course this only warms up the
first eager BTL, but this should be adequate for now...
r10881:
Consulted with Galen and did a few things:
- Fix the algorithm to actually make the connections that we want
- Rename the MCA param to mpi_preconnect_all
- Clean up the code a bit:
- move the logic to a separate .c file
- check return codes properly
This commit was SVN r11114.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r10877
r10881