sender piggybacks a number of credit messages it received from a peer. The number
of outstanding credit messages is limited. This is needed so that we never fall
back to HW flow control.
This commit was SVN r15580.
eager RDMA receive path and checks internally where it was called from in order to
perform different tasks. Leave only the common code there and move the rest of
the code to the appropriate places.
This commit was SVN r15579.
* General TCP cleanup for OPAL / ORTE
* Simplifying the OOB by moving much of the logic into the RML
* Allowing the OOB RML component to do routing of messages
* Adding a component framework for handling routing tables
* Moving the xcast functionality from the OOB base to its own framework
Includes merge from tmp/bwb-oob-rml-merge revisions:
r15506, r15507, r15508, r15510, r15511, r15512, r15513
This commit was SVN r15528.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15506
r15507
r15508
r15510
r15511
r15512
r15513
Clean up ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and the type of data match. Replace those values with a new macro/function pair, ORTE_NAME_PRINT, that outputs a string (using the new thread-safe data capability) so that any future change to the printing of those structures can be made at a single point.
Note that I could not feasibly track down outputs that print the orte_process_name_t fields directly; I only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify the ones that fall within environments I can compile on, so other environments may yield some minor warnings.
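A minimal sketch of the "one string-returning function behind one macro" idea, assuming a three-field name and C11 `_Thread_local` storage for the per-thread buffer. The type `example_process_name_t`, the `[cellid,jobid,vpid]` format and all `example_*` names are hypothetical; this is not the ORTE implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for orte_process_name_t; the real field names
 * and types live in the ORTE headers and may differ. */
typedef struct {
    uint32_t cellid;
    uint32_t jobid;
    uint32_t vpid;
} example_process_name_t;

/* One formatting buffer per thread (C11 _Thread_local), so concurrent
 * callers never overwrite each other's string. */
#define EXAMPLE_NAME_BUF_LEN 64
static _Thread_local char example_name_buf[EXAMPLE_NAME_BUF_LEN];

/* Single point of control for how a process name is printed. */
const char *example_name_print(const example_process_name_t *name)
{
    if (NULL == name) {
        snprintf(example_name_buf, sizeof(example_name_buf), "[NULL]");
    } else {
        snprintf(example_name_buf, sizeof(example_name_buf), "[%lu,%lu,%lu]",
                 (unsigned long) name->cellid,
                 (unsigned long) name->jobid,
                 (unsigned long) name->vpid);
    }
    return example_name_buf;
}

/* Macro wrapper mirroring the macro/function pair described above. */
#define EXAMPLE_NAME_PRINT(n) example_name_print(n)
```

Usage: `printf("%s got a message\n", EXAMPLE_NAME_PRINT(&name));`. With a single buffer per thread, two calls in the same `printf` argument list would share one buffer; a real implementation might rotate through several buffers to allow that.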
This commit was SVN r15517.
It will prevent the error in openib finalize,
but it doesn't resolve the actual issue. I suspect that
the one-sided tests somehow allocate memory (mpool?) and don't
release it. Need to check it.
This commit was SVN r15488.
* bml.h had a change that introduced a variable named "_order" to
avoid a conflict with a local variable. The namespace of symbols
starting with _ belongs to the OS/compiler/kernel, not to us, so we
can't start symbols with _. I replaced it with arg_order, and also updated
the threaded equivalent of the macro that was modified.
* in btl_openib_proc.c, one opal_output accidentally had its string
reverted from "ompi_modex_recv..." to
"mca_pml_base_modex_recv....". This was fixed.
* The change to ompi/runtime/ompi_preconnect.c was entirely
reverted; it was an artifact of debugging.
This commit was SVN r15475.
The following SVN revision numbers were found above:
r15474 --> open-mpi/ompi@8ace07efed
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
2. Pasha's new implementation of asynchronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be set up to use PP or SRQ receive buffers, with fine-grained
control over the receive buffer size, the number of receive buffers
to post, and when to replenish the receive queue (low water mark);
for SRQ QPs, the number of outstanding sends can also be
specified. The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed before the 1.3 release.
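The syntax above can be checked with a small parser. This is a sketch under stated assumptions, not the openib BTL's parser: `qp_spec_t` and `parse_receive_queues` are hypothetical names, and it rejects equal buffer sizes (it reads "ascending" as strictly ascending).

```c
#include <stdio.h>

/* Hypothetical parsed form of one QP description. */
typedef struct {
    char type;              /* 'P' = per-peer, 'S' = shared receive queue */
    long size;              /* receive buffer size in bytes */
    long num_buffers;       /* number of receive buffers to post */
    long low_watermark;     /* repost receive buffers at this level */
    long max_pending_sends; /* 'S' only: outstanding sends allowed */
} qp_spec_t;

/* Parse "P,size,num,low" / "S,size,num,low,sends" entries separated by
 * ';'. Returns the number of QPs parsed, or -1 if the string is
 * malformed, has too many QPs, or sizes are not strictly ascending. */
int parse_receive_queues(const char *spec, qp_spec_t *out, int max_qps)
{
    const char *p = spec;
    int n = 0;

    while (*p != '\0') {
        qp_spec_t q = {0};
        int used = 0;

        if (n == max_qps) return -1;
        /* Common prefix: type letter followed by three numeric fields. */
        if (sscanf(p, "%c,%ld,%ld,%ld%n", &q.type, &q.size,
                   &q.num_buffers, &q.low_watermark, &used) != 4) {
            return -1;
        }
        p += used;
        if ('S' == q.type) {
            /* SRQ QPs carry a fifth field: max outstanding sends. */
            if (sscanf(p, ",%ld%n", &q.max_pending_sends, &used) != 1) {
                return -1;
            }
            p += used;
        } else if ('P' != q.type) {
            return -1;  /* unknown QP type */
        }
        /* QPs must be listed in ascending receive buffer size order. */
        if (n > 0 && q.size <= out[n - 1].size) return -1;
        out[n++] = q;

        if (';' == *p) {
            p++;            /* move on to the next QP description */
        } else if (*p != '\0') {
            return -1;      /* trailing garbage, e.g. a 5th field on 'P' */
        }
    }
    return n;
}
```

Applied to the example string above, this returns 4 and fills in the per-peer QP (128, 16, 4) followed by the three SRQ QPs.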
This commit was SVN r15474.
switching:
0 0
/ \ \ / \ \
1 \ \ --> 4 \ \
/ \ \ / \ \
3 2 \ 3 2 \
4 1
(duh). The first form is the bmtree suitable for bcast, but the latter is better for reduce.
Updating default decision function accordingly.
This commit was SVN r15422.
instead of just the procs for MCW (in MCW order). Should make resolving
ptl_process_id_t structures for arbitrary communicators easier for
applications that need it.
This commit was SVN r15393.
that exactly describes the buffer to be used as the target of the
operation
* Use the above flag to prevent components that set it from being
used for real RDMA operations by the one-sided component (the
BTLs will still be used for RDMA transfers by the PML and for
send/receive communication by the OSC component)
This commit was SVN r15375.
have to construct/destruct only once. Therefore, construction
happens before searching for a PML, and destruction happens just before
the component is finalized.
Add some OPAL_LIKELY/OPAL_UNLIKELY.
This commit was SVN r15347.
receive queues are shared among all PMLs, they are declared in the base PML,
and the selected PML is in charge of initializing and releasing them.
The CM PML is slightly different from OB1 or DR. Internally it uses
two different types of requests: light and heavy. With this patch, however,
both types of requests are stored in the same queue and cast appropriately
in the allocation macro. This means we might use less memory than we allocate,
but in exchange we get full support for most of the parallel debuggers.
This patch also makes the basic PML requests of every PML (CM included)
start with the same fields, declared in the same order
in the request structure. Moreover, the fields have been arranged so
that (hopefully) only one volatile/atomic field exists per cache line.
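A minimal sketch of the layout described above, assuming a 64-byte cache line. Every type and macro name here is hypothetical; the real PML request structures have different fields. Each request type starts with the same header, one queue slot is sized for the largest type, and the allocation macro simply casts the slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common header shared by every request type. As the first
 * member of each request it sits at offset 0, so any request pointer can
 * be cast to a pointer to this header. */
typedef struct {
    int32_t req_type;   /* e.g. light vs heavy */
    int32_t req_peer;
    int32_t req_tag;
    /* The concurrently written field gets its own cache line
     * (64 bytes assumed here). */
    _Alignas(64) volatile int32_t req_complete;
} base_request_t;

typedef struct {
    base_request_t base;        /* must be first */
    void *buf;
} light_request_t;

typedef struct {
    base_request_t base;        /* must be first */
    void *buf;
    size_t count;
    char protocol_state[128];   /* extra state only heavy requests need */
} heavy_request_t;

/* One queue serves both kinds: every slot is big enough (and aligned
 * enough) for the larger type, so light requests waste some space. */
typedef union {
    light_request_t light;
    heavy_request_t heavy;
} request_slot_t;

/* Allocation hands out a slot and casts it to the requested type. */
#define ALLOC_REQUEST(slot, type) ((type *) (void *) (slot))
```

The memory trade-off mentioned above is visible in `sizeof(request_slot_t)`: a light request occupies a slot sized for a heavy one.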
This commit was SVN r15346.
VxWorks. Still some issues remaining, I'm sure.
Refs trac:1010
This commit was SVN r15320.
The following Trac tickets were found above:
Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
than just the PML/BTLs these days. Also clean up the code so that it
handles the situation where not all nodes register information for a given
node (rather than just spinning until that node sends information, like
we do today).
Includes r15234 and r15265 from the /tmp/bwb-modex branch.
This commit was SVN r15310.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15234
r15265
The problem is that in threaded builds, a head lock and a tail lock
are allocated inside the shared memory segment for every FIFO, and
pointers to them are stored inside the FIFO. If the sm backend file is
mapped at the same address in all processes (mostly the case for
non-threaded builds), this is fine; but when the processes map the file
at different addresses, these pointers cause big trouble in every process
other than the one that allocated the locks.
Therefore the send lock addresses have to be recalculated to match
the local mapping of each process that uses them.
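The recalculation amounts to "same offset from the segment base, different base". A minimal sketch, assuming each process knows both the writer's base address and its own; `rebase_ptr` is a hypothetical helper, not the sm BTL's code.

```c
#include <stdint.h>

/* Translate an address that is valid in the writer's mapping of a shared
 * segment into the equivalent address in the reader's own mapping.
 * Integer arithmetic on uintptr_t avoids subtracting pointers that point
 * into different objects. */
static inline void *rebase_ptr(const void *remote_ptr,
                               const void *remote_base,
                               void *local_base)
{
    uintptr_t offset = (uintptr_t) remote_ptr - (uintptr_t) remote_base;
    return (void *) ((uintptr_t) local_base + offset);
}
```

An alternative design stores offsets instead of pointers in the shared FIFO in the first place, so that each process adds its own base when it dereferences them.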
This commit was SVN r15291.
relative bandwidths of each BTL. Precalculate what part of a message should
be sent via each BTL in advance instead of doing it during scheduling.
This commit was SVN r15248.
each BTL. Precalculate what part of a message should be sent via each BTL in
advance instead of doing it during scheduling.
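A sketch of the precalculation, assuming each BTL reports a bandwidth and the split is proportional to it. `btl_share_t`, `compute_weights` and `split_message` are hypothetical names, not the PML's scheduler. Weights are computed once; splitting a message is then a few multiplications, with the last BTL absorbing the rounding remainder.

```c
#include <stddef.h>

/* Hypothetical per-BTL record. */
typedef struct {
    double bandwidth;   /* as reported by the BTL, any consistent unit */
    double weight;      /* precomputed fraction of traffic for this BTL */
} btl_share_t;

/* Done once, when the set of BTLs is known. */
void compute_weights(btl_share_t *btls, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) total += btls[i].bandwidth;
    for (size_t i = 0; i < n; i++) {
        /* Fall back to an even split if no bandwidth is known. */
        btls[i].weight = (total > 0.0) ? btls[i].bandwidth / total
                                       : 1.0 / (double) n;
    }
}

/* Done per message: the pieces always sum to msg_size. */
void split_message(const btl_share_t *btls, size_t n, size_t msg_size,
                   size_t *pieces)
{
    size_t assigned = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        pieces[i] = (size_t) ((double) msg_size * btls[i].weight);
        assigned += pieces[i];
    }
    if (n > 0) {
        pieces[n - 1] = (assigned < msg_size) ? msg_size - assigned : 0;
    }
}
```

For two BTLs with bandwidths 1000 and 3000, a 1000-byte message splits into 250 and 750 bytes.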
This commit was SVN r15247.
argument to the query for the line speed. This function is still not
documented, and it really looks strange that we have to respecify the
nic_id (it's already attached to the endpoint).
This commit was SVN r15241.
* Fix potential race condition with starting a new lock epoch if we
were releasing a lock
* Increment the shared counter if we start a shared lock session during
the unlock code
This commit was SVN r15186.
a WIN_FREE
* Fix race condition in threaded builds with pending unlocks and
finishing an epoch
* Fix memory leak due to use of OBJ_DESTRUCT instead of OBJ_RELEASE
* Fix race condition between releasing multiple shared locks and
starting a new lock
* Need to increment the shared count if starting a new shared
lock once an exclusive lock finishes
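A single-threaded model of the bookkeeping these fixes touch, assuming a FIFO of waiting lock requests. All names are hypothetical and this is not the one-sided component's code; a real component would also protect this state with its own lock. It shows that when an exclusive lock is released, every queued shared request that gets started must increment the shared count.

```c
#include <stdbool.h>
#include <stddef.h>

enum lock_type { LOCK_SHARED, LOCK_EXCLUSIVE };

#define MAX_PENDING 16

/* Hypothetical window lock: one exclusive holder or many shared holders,
 * plus a FIFO ring of waiting requests. */
typedef struct {
    bool exclusive_held;
    int shared_count;                 /* number of active shared holders */
    enum lock_type pending[MAX_PENDING];
    size_t pending_head, pending_len;
} win_lock_t;

static bool can_grant(const win_lock_t *w, enum lock_type t)
{
    if (w->exclusive_held) return false;
    return LOCK_SHARED == t || 0 == w->shared_count;
}

static void grant(win_lock_t *w, enum lock_type t)
{
    if (LOCK_EXCLUSIVE == t) w->exclusive_held = true;
    else w->shared_count++;           /* every shared grant bumps the count */
}

/* Start waiting requests in FIFO order until one cannot be granted. */
static void grant_pending(win_lock_t *w)
{
    while (w->pending_len > 0) {
        enum lock_type t = w->pending[w->pending_head];
        if (!can_grant(w, t)) break;
        grant(w, t);
        w->pending_head = (w->pending_head + 1) % MAX_PENDING;
        w->pending_len--;
    }
}

/* Returns 1 if granted now, 0 if queued, -1 if the queue is full. */
int lock_request(win_lock_t *w, enum lock_type t)
{
    if (0 == w->pending_len && can_grant(w, t)) {
        grant(w, t);
        return 1;
    }
    if (MAX_PENDING == w->pending_len) return -1;
    w->pending[(w->pending_head + w->pending_len) % MAX_PENDING] = t;
    w->pending_len++;
    return 0;
}

void lock_release(win_lock_t *w, enum lock_type t)
{
    if (LOCK_EXCLUSIVE == t) w->exclusive_held = false;
    else w->shared_count--;
    grant_pending(w);
}
```

With an exclusive holder and two shared requests queued, releasing the exclusive lock leaves `shared_count == 2`; forgetting the increment in `grant` would let a later exclusive request start while shared holders are still active.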
This commit was SVN r15185.
OBJ_NEW
* Need to signal when the passive unlock has left an exposure epoch for
the win_free case
* Clean up some debugging output
* Fix missing variable initialization
This commit was SVN r15167.
flex (which, incidentally, emit ''more'' warnings than earlier
versions). Grumble.
This commit was SVN r15166.
The following SVN revision numbers were found above:
r15158 --> open-mpi/ompi@57d09c10f7
- Adding a linear algorithm with synchronization for gather.
This algorithm prevents congestion at the root process, but introduces
synchronization (it serializes the non-root processes, but allows messages
to arrive from two processes at the same time).
It performed better than the binomial and linear algorithms for large messages
and for intermediate and large communicator sizes.
- Updating the MPI_Gather decision function to reflect performance results
from MX. I will perform more measurements, though, so this decision
may change.
This commit was SVN r15165.
reason is that we don't have the nice configure stuff, so detecting
when to enable the CR PML is kind of hard. Keep it defined so that at
least it compiles smoothly.
This commit was SVN r15116.
branch:
* Support btl_openib_if_include and btl_openib_if_exclude MCA
parameters, similar to those supported by other BTLs. Each takes a
comma-delimited list of identifiers. Identifiers can be HCA
interface names (e.g., ipath0, mthca1, etc.) or an HCA interface
name and port number (e.g., ipath0:1, mthca1:2, etc.). It is an
error to specify both _include and _exclude. If you specify a
nonexistent (or non-ACTIVE) HCA and/or port, you'll get a warning
unless you disable the warning by setting the MCA parameter
btl_openib_warn_nonexistent_if to 0.
* Start updating to use BEGIN_C_DECLS and END_C_DECLS
* A few other minor fixes that were picked up along the way.
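A sketch of the include/exclude matching rules above, assuming an entry without `:port` matches every port of that HCA. `if_list_matches` and `port_selected` are hypothetical helpers, not the openib BTL's functions, and the warning for nonexistent HCAs is left out.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Does HCA 'hca', port 'port' match a list such as "mthca0,ipath0:1"? */
bool if_list_matches(const char *list, const char *hca, int port)
{
    const char *p = list;
    size_t hca_len = strlen(hca);

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t) (end - p) : strlen(p);
        const char *colon = memchr(p, ':', len);
        size_t name_len = colon ? (size_t) (colon - p) : len;

        /* Exact name match, so "mthca10" does not match "mthca1". */
        if (name_len == hca_len && 0 == strncmp(p, hca, hca_len)) {
            if (NULL == colon) return true;            /* all ports */
            if (atoi(colon + 1) == port) return true;  /* this port */
        }
        p = end ? end + 1 : p + len;
    }
    return false;
}

/* include/exclude are the parameter values, or NULL when unset.
 * Returns 1 to use the port, 0 to skip it, -1 if both are set. */
int port_selected(const char *include, const char *exclude,
                  const char *hca, int port)
{
    if (include && exclude) return -1;
    if (include) return if_list_matches(include, hca, port) ? 1 : 0;
    if (exclude) return if_list_matches(exclude, hca, port) ? 0 : 1;
    return 1;
}
```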
This commit was SVN r15063.
Set bandwidth for all ports of mthca0:
--mca btl_openib_bandwidth_mthca0 1000
Set bandwidth for port 1 of mthca1:
--mca btl_openib_bandwidth_mthca1:1 1000
Set latency for port 2 lid 123 on mthca0:
--mca btl_openib_latency_mthca0:2:123 20
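The three granularities above suggest a most-specific-first lookup. In this sketch, the lookup order, the table form and the names `param_t` and `lookup_port_param` are all assumptions, not how the openib BTL actually reads these parameters. Each key is the suffix after `btl_openib_bandwidth_` or `btl_openib_latency_`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical (suffix, value) pair, e.g. { "mthca1:1", 1000 }. */
typedef struct {
    const char *key;
    int value;
} param_t;

/* Try "hca:port:lid", then "hca:port", then "hca"; else the default. */
int lookup_port_param(const param_t *params, size_t n, const char *hca,
                      int port, int lid, int default_value)
{
    char keys[3][64];
    snprintf(keys[0], sizeof keys[0], "%s:%d:%d", hca, port, lid);
    snprintf(keys[1], sizeof keys[1], "%s:%d", hca, port);
    snprintf(keys[2], sizeof keys[2], "%s", hca);

    for (int k = 0; k < 3; k++) {
        for (size_t i = 0; i < n; i++) {
            if (0 == strcmp(params[i].key, keys[k])) {
                return params[i].value;
            }
        }
    }
    return default_value;
}
```

With the three example settings, every port of mthca0 and port 1 of mthca1 resolve to a bandwidth of 1000, and only port 2, LID 123 on mthca0 resolves to a latency of 20.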
This commit was SVN r15041.
even look at the status code and basically guarantee that the aio
function was never called, so there's really no point in AC_TRY_RUN
over AC_COMPILE_IFELSE...
This commit was SVN r15033.
single threaded builds. In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked). There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews megabytes of errors on a
simple test. So we have some work to do on our path towards
thread support.
Also removed the macro versions of the non-conditional thread locks,
since everywhere they were used, the author of the code intended
to use the conditional thread locks. So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks. Simple, right? :).
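A minimal sketch of the "double lock" check for single-threaded builds, assuming the mutex only needs a `locked` flag when no threads exist. All `debug_mutex_*` and `DEBUG_MUTEX_*` names are hypothetical. Because the macros expand to real function calls instead of nothing, their arguments are still compiled, which is what catches misspelled variable names in non-threaded development.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug mutex: no real locking, only bookkeeping. */
typedef struct {
    bool locked;
    const char *name;
} debug_mutex_t;

static int debug_mutex_errors = 0;

static inline void debug_mutex_lock(debug_mutex_t *m, const char *file,
                                    int line)
{
    if (m->locked) {
        fprintf(stderr, "%s:%d: mutex %s locked twice\n", file, line, m->name);
        debug_mutex_errors++;
    }
    m->locked = true;
}

static inline void debug_mutex_unlock(debug_mutex_t *m, const char *file,
                                      int line)
{
    if (!m->locked) {
        fprintf(stderr, "%s:%d: mutex %s unlocked while not held\n",
                file, line, m->name);
        debug_mutex_errors++;
    }
    m->locked = false;
}

/* Upper-case macros stand in for the conditional lock operations. */
#define DEBUG_MUTEX_LOCK(m)   debug_mutex_lock((m), __FILE__, __LINE__)
#define DEBUG_MUTEX_UNLOCK(m) debug_mutex_unlock((m), __FILE__, __LINE__)
```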
This commit was SVN r15011.
structures in the system. Instead of using memcmp, use the ns function.
This won't cause a problem as long as all three elements of the name are
ints, but if they have different sizes, alignment and padding rules
can cause memcmp() to compare padding space, which rarely holds a sane
value.
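A sketch of why a field-wise comparison is safer than `memcmp()`, assuming a hypothetical name type whose first field is narrower than the others (so the compiler typically pads after it). `mixed_name_t` and `name_compare` are illustrative names, not the ns function.

```c
#include <stdint.h>

/* Hypothetical name with mixed field sizes: padding usually follows
 * 'cellid' so that 'jobid' is 4-byte aligned. */
typedef struct {
    uint16_t cellid;
    uint32_t jobid;
    uint32_t vpid;
} mixed_name_t;

/* Compare field by field so padding bytes never take part.
 * Returns <0, 0 or >0, like memcmp(). */
int name_compare(const mixed_name_t *a, const mixed_name_t *b)
{
    if (a->cellid != b->cellid) return a->cellid < b->cellid ? -1 : 1;
    if (a->jobid != b->jobid) return a->jobid < b->jobid ? -1 : 1;
    if (a->vpid != b->vpid) return a->vpid < b->vpid ? -1 : 1;
    return 0;
}
```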
This commit was SVN r14998.
id based on the last half of the mapper MAC. This allows us to figure out how
to connect peers, so the MX BTL can be used in a cluster-of-clusters
configuration where each cluster has MX internally, as well as on a multi-rail
MX system.
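A sketch of deriving such an id, assuming "last half" means the low 3 bytes of a 6-byte MAC and that two endpoints share a fabric exactly when their ids match. Both the interpretation and the names `mapper_mac_to_id` and `mx_same_network` are assumptions, not the MX BTL's code.

```c
#include <stdint.h>

/* 24-bit id from the last three bytes of a 6-byte mapper MAC. */
static inline uint32_t mapper_mac_to_id(const uint8_t mac[6])
{
    return ((uint32_t) mac[3] << 16) | ((uint32_t) mac[4] << 8) |
           (uint32_t) mac[5];
}

/* Peers mapped by the same mapper can talk directly over MX. */
static inline int mx_same_network(const uint8_t a[6], const uint8_t b[6])
{
    return mapper_mac_to_id(a) == mapper_mac_to_id(b);
}
```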
This commit was SVN r14932.
symbols in them and environ is defined only in the final application
(probably in crt1.o). Apple provides a function for getting at the
environment, so use that instead if it's available.
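The Apple function is `_NSGetEnviron()` from `<crt_externs.h>`, which returns a pointer to the `environ` pointer. The sketch below picks it by testing `__APPLE__`, which is a simplification of an "if it's available" configure check; `example_environ` and `count_environment` are hypothetical names.

```c
#include <stddef.h>

#if defined(__APPLE__)
/* environ lives only in the final executable on Mac OS X, so shared
 * libraries ask the C runtime for it instead. */
#include <crt_externs.h>
#define example_environ (*_NSGetEnviron())
#else
extern char **environ;
#define example_environ environ
#endif

/* Count the entries in the current environment. */
size_t count_environment(void)
{
    size_t n = 0;
    for (char **e = example_environ; e != NULL && *e != NULL; e++) {
        n++;
    }
    return n;
}
```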
This commit was SVN r14857.
have the SRQ interface.
* Instead of setting AC_DEFINEs per MCA component, set them per test. The
answers can never differ, and this will speed up sed just a teeny
bit
This commit was SVN r14856.