return OMPI_ERR_UNREACH if the port returns an invalid speed or
width. OMPI_ERR_VALUE_OUT_OF_BOUNDS is reserved for when we exceed
the number of allowable BTLs.
This commit was SVN r17500.
We loop over all peer addresses and accept when one of them matches.
Note that this might break functionality: mca_btl_tcp_proc_insert now
always inserts the same endpoint. (is the lack of endpoints the problem?
should there be one for every remote address?)
Re #1206
This commit was SVN r17307.
- the registration array is now global instead of one by BTL.
- each framework have to declare the entries in the registration array reserved. Then
it have to define the internal way of sharing (or not) these entries between all
components. As an example, the PML will not share as there is only one active PML
at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3
are reserved for the framework while the remaining 5 are use internally by each
framework.
- The registration function is optional. If a BTL do not provide such function,
nothing happens. However, in the case where such function is provided in the BTL
structure, it will be called by the BML, when a tag is registered.
Now, it's time for the second step... Converting OB1 from a switch based PML to an
active message one.
This commit was SVN r17140.
for dynamic selection of cpc methods based on what is available. It
also allows for inclusion/exclusions of methods. It even futher allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.
This patch also contains XRC compile time disablement (per Jeff's
patch).
At a high level, the cpc selections works by walking through each cpc
and allowing it to test to see if it is permissable to run on this
mpirun. It returns a priority if it is permissable or a -1 if not. All
of the cpc names and priorities are rolled into a string. This string
is then encapsulated in a message and passed around all the ompi
processes. Once received and unpacked, the list received is compared
to a local copy of the list. The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally. Any non-negative number is a potentially valid
connection method. The method below of determining the optimal
connection method is to take the cross-section of the two lists. The
highest single value (and the other side being non-negative) is selected
as the cpc method.
svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ .
This commit was SVN r17138.
header as double-word aligned and prevents bus errors on SPARC
based servers. This is part of fix for #1148.
Refs trac:1148
This commit was SVN r17090.
The following Trac tickets were found above:
Ticket 1148 --> https://svn.open-mpi.org/trac/ompi/ticket/1148
high prio QPs and low prio QPs) and because not all of them are polled each time
progrgess() is called (to save on latency) starvation is possible. The commit
fixes this. Now each channel is polled, but higher priority channels are polled
more often. Three new parameters are introduced that control polling ratios
between different channels.
This commit was SVN r17024.
(sometimes after the merge with the ORTE branch), the opal_pointer_array
will became the only pointer_array implementation (the orte_pointer_array
will be removed).
This commit was SVN r17007.
so that it is higher than the new TCP BTL exclusivity as of r16942.
The portals BTL maintainer may want to do the same...
This commit was SVN r16995.
The following SVN revision numbers were found above:
r16942 --> open-mpi/ompi@80e9730100
about linkers, have all OPAL, ORTE, and OMPI components '''not'' link
against the OPAL, ORTE, or OMPI libraries.
See ttp://www.open-mpi.org/community/lists/users/2007/10/4220.php for
details (or https://svn.open-mpi.org/trac/ompi/wiki/Linkers for a
better-formatted version of the same info).
This commit was SVN r16968.
should be passed via commandline. However, there is a slight coding
bug in the openib connect code. When registering the name of the
option, mca_base_param_reg_string will prepend the relevant info
("btl_openib_" in this case). The existing code will require
"btl_openib_btl_openib_connect" instead of "btl_openib_connect".
This patch corrects this.
This commit was SVN r16937.
smaller then allocated size.
2. If reserve zero don't allocate coalesced frag since it will be RDMAed, not
send. The logic was other way around.
This commit was SVN r16928.
mca_btl_openib_mca_setup_qps(). It looks like someone just forgot to
clean-up the previous call when they added the check for the return
code.
I ran a quick IMB test over IB to verify everything is still working.
This commit was SVN r16870.
parameters don't make any sense. Credits are never piggybacked. Also make
default queue sizes to be calculated from eager_limit and max_send_size values.
This commit was SVN r16816.
needed instead of creating it and then canceling if it is not needed. Change
error handling during finalize so that it will not skip async thread
destruction. Otherwise async thread may segfault during openib module unloading.
This commit was SVN r16782.
to a pending queue of eager rdma QP instead of correct pending list. This patch
fixes this by getting reed of "eager rdma qp" notion. Packet is always send
over its order QP. The patch also adds two pending queues for high and low prio
packets. Only high prio packets are sent over eager RDMA channel.
This commit was SVN r16780.
main idea (except of cleanup) is to save on initialisation of unneeded fields
and to use C type checking system to catch obvious errors.
This commit was SVN r16779.
has his own range which is defined by a min value and a range. By default
there is no limitation on the port range, which is exactly the same
behavior as before.
This commit was SVN r16584.
the use of the --mca btl_base_verbose flag. The
btl framework now matches all the other frameworks.
Slightly modify error messages for clarity.
This commit was SVN r16443.
* Fix some missing includes in a few places.
* Add the cr_request() functionality to the BLCR CRS component.
We are now dependent upon the 0.6.* series of BLCR.
* Made the CR notification mechanism a registered function.
This way we can have an OPAL-only version and it can be replaced at
runtime with the ORTE version.
* Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
CR functionality when the user wants it. Default: Disabled.
* Fix the placement of a checkpoint request check in MPI_Init
* Pull the OPAL notification mechanism into the SnapC framework.
* We no longer fork/exec the 'opal-checkpoint' command for local
checkpointing, the Local coordinator in the orted does this directly.
* The Local and Application coordinator talk together bypassing the OPAL
notifiation mechanism.
* Optimized the Local <-> App Coordinator communication.
* Improved the structure used to track vpid_snapshots in the local coord.
* Fix a race condition in which an application under heavy communication load
may produce an inconsistent global checkpoint.
This commit was SVN r16389.
Check if an exclusion string (i.e. '-mca btl ^sm) was provided; if so OFUD just disables itself.
This commit was SVN r16307.
The following Trac tickets were found above:
Ticket 1154 --> https://svn.open-mpi.org/trac/ompi/ticket/1154
Each one of them has a field to store QP type, but this is redundant.
Store qp type only in one structure (the component one).
This commit was SVN r16272.
meaning "infinite") is no longer larger than the minimum required
size. So put in an appropriate test to ensure that "infinite" was not
requested.
This commit was SVN r16142.
you'll get a helpful error message and the openib BTL will deactivate
itself.
This commit was SVN r16133.
The following Trac tickets were found above:
Ticket 1133 --> https://svn.open-mpi.org/trac/ompi/ticket/1133
Basically revert this part of r16015.
This commit was SVN r16029.
The following SVN revision numbers were found above:
r16015 --> open-mpi/ompi@435e7d80e9
the ompi_convertor_need_buffers function to only return 0 if the convertor
is homogeneous (which it never does on the trunk, but does to on v1.2, but
that's a different issue). Only enable the heterogeneous rdma code for
a btl if it supports it (via a flag), as some btls need some work for this
to work properly. Currently only TCP and OpenIB extensively tested
This commit was SVN r15990.
to always first check for a NULL frag pointer before trying to send the
fragment. This avoids an issue in multi-threaded execution in which
multiple threads working on the same endpoint can result in a thread
finding itself here with nothing to send.
This commit was SVN r15963.
one HCA. Multiple ports, LMC, multiple BTLs per one LID. Having only one CQ for
all of them substantially reduce polling time.
This commit was SVN r15933.
semicolons but the new specitifcation string used colons. The text
parser now looks for colons.
* Changed all opal_output() error messages to
much-more-helpful/descriptive opal_show_help() messages.
* A few minor style/indenting fixes
This commit was SVN r15850.
The following SVN revision numbers were found above:
r15848 --> open-mpi/ompi@dd30597f39
/tmp/jms-modular-wireup branch):
* This commit moves all the openib BTL connection code out of
btl_openib_endpoint.c and into a connect "pseudo-component" area,
meaning that different schemes for doing OFA connection schemes can
be chosen via function pointer (i.e., MCA parameter) at run-time.
* The connect/connect.h file includes comments describing the
specific interface for the connect pseudo-component.
* Two pseudo-components are in this commit (more can certainly be
added).
* oob: use the same old oob/rml scheme for creating OFA connections
that we've had forever; this now just puts the logic into this
self-contained pseudo-component.
* rdma_cm: a currently-empty set of functions (that currently
return NOT_IMPLEMENTED) that will someday use the RDMA connection
manager to make OFA connections.
This commit was SVN r15786.
This mpool will have no btl module owner there was no btl created for
the HCA with no ports, but it will still be tracked in the mpool
framework (i.e., it's available).
If MPI_ALLOC_MEM is called by the app, one of two things will happen:
1. if there's an HCA on the host with some active ports, the openib
btl component will still be in the process space, and therefore
the "mpool with no btl" (MWNB) module will still be able to call
the reg/dereg functions, and all will be fine. However, if
MPI_FREE_MEM is never invoked to free the memory, bad things will
happen during MPI_FINALIZE. The pml is finalized, which finalizes
all the btls. The btls finalize all their mpools and all is fine.
But later we close down the mpool framework which then finalizes
any left over mpool modules, such as MWNB. However, the openib
BTL module functions that the MWNB was registered with are no
longer in the process space, and it segv's while trying deregister
the memory.
2. if there are *no* HCA's on the host with active ports, then the
openib btl will have been unloaded, and when the MWNM tries to
register the memory, the functions it tries to call (in the openib
btl) are no longer there, and we segv.
This commit was SVN r15735.
it to at least compile. If you actually get to the point of invoking
the openib btl progress thread, you'll get a big opal_output warning
that it is pretty much guaranteed not to work.
This commit was SVN r15628.
sender piggybacks a number of credit messages it received from a peer. A number
of outstanding credit messages is limited. This is needed to never ever fall
back to HW flow control.
This commit was SVN r15580.
eager RDMA receive path and checks internally from where it was called from to
perform different tasks. Leave only common code in there and move other code
to appropriate places.
This commit was SVN r15579.
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.
Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.
This commit was SVN r15517.
It will prevent the error failure in openib finalize
but it doesn't resolve the actual issue. I guess that
oneside tests some how allocates memory (mpool?) and doesn't
release it. Need to check it.
This commit was SVN r15488.
* bml.h had a change that introduced a variable named "_order" to
avoid a conflict with a local variable. The namespace starting
with _ belongs to the os/compiler/kernel/not us. So we can't start
symbols with _. So I replaced it with arg_order, and also updated
the threaded equivalent of the macro that was modified.
* in btl_openib_proc.c, one opal_output accidentally had its string
reverted from "ompi_modex_recv..." to
"mca_pml_base_modex_recv....". This was fixed.
* The change to ompi/runtime/ompi_preconnect.c was entirely
reverted; it was an artifact of debugging.
This commit was SVN r15475.
The following SVN revision numbers were found above:
r15474 --> open-mpi/ompi@8ace07efed
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
1. Pasha's new implementation of asychronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artifical changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be setup to use PP or SRQ receive buffers as well as giving
fine-grained control over receive buffer size, number of receive
buffers to post, when to replenish the receive queue (low water mark)
and for SRQ QPs, the number of outstanding sends can also be
specified. The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed prior to 1.3 release.
This commit was SVN r15474.
that exactly describes the buffer to be used as the target of the
operation
* Use the above flag to disable components setting the flag from being
used for real RDMA operations for the one-sided component (the
BTLs will still be used for RDMA transfers for the PML and for
send/receive communication for the OSC component)
This commit was SVN r15375.
VxWorks. Still some issues remaining, I'm sure.
Refs trac:1010
This commit was SVN r15320.
The following Trac tickets were found above:
Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
than just the PML/BTLs these days. Also clean up the code so that it
handles the situation where not all nodes register information for a given
node (rather than just spinning until that node sends information, like
we do today).
Includes r15234 and r15265 from the /tmp/bwb-modex branch.
This commit was SVN r15310.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15234
r15265
The problem is that in the case of threaded builds for every fifo
a head and tail lock will be allocated inside the shared memory
segment and the ptr is stored inside the fifo. In the case that the sm backend
file will be mapped in all processes at the same address (mostly the
case for non-thread builds) this is fine, but in the cases when the
processes map the file at different addresses this addresses cause big
trouble in other processes than the one that allocted the locks.
Therefore the send lock addresses have to be recalculated to match
the local mapping of the processes that use them.
This commit was SVN r15291.
argument to the query for the line speed. This function is still not
documented, and it really look strange that we have to respecify the
nic_id (it's already attached to the endpoint).
This commit was SVN r15241.
flex (which, incidentally, emit ''more'' warnings than earlier
versions). Grumble.
This commit was SVN r15166.
The following SVN revision numbers were found above:
r15158 --> open-mpi/ompi@57d09c10f7
branch:
* Support btl_openib_if_include and btl_openib_if_exclude MCA
parameters, similar to those supported by other BTLs. Each take a
comma-delimited lists of identifiers. Identifiers can be HCA
interface names (e.g., ipath0, mthca1, etc.) or an HCA interface
name and port numbers (e.g., ipath0:1, mthca1:2, etc.). It is an
error to specify both _include and _exclude. If you specify a
non-existant (or non-ACTIVE) HCA and/or port, you'll get a warning
unless you disable the warning by setting the MCA parameter
btl_openib_warn_nonexistent_if to 0.
* Start updating to use BEGIN_C_DECLS and END_C_DECLS
* A few other minor fixes that were picked up along the way.
This commit was SVN r15063.
Set bandwidth for all ports of mthca0:
--mca btl_openib_bandwidth_mthca0 1000
Set bandwidth for port 1 of mthca1:
--mca btl_openib_bandwidth_mthca1:1 1000
Set latency for port 2 lid 123 on mthca0:
--mca btl_openib_latency_mthca0:2:123 20
This commit was SVN r15041.