1
1

72 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
8ace07efed This commit brings in two major things:
1. Galen's fine-grain control of queue pair resources in the openib
   BTL.
1. Pasha's new implementation of asychronous HCA event handling.

Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.  

Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artifical changes).  :-(

== Fine-grain control of queue pair resources ==

Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).

Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments.  When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers.  One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.

The new design allows multiple QPs to be specified at runtime.  Each
QP can be setup to use PP or SRQ receive buffers as well as giving
fine-grained control over receive buffer size, number of receive
buffers to post, when to replenish the receive queue (low water mark)
and for SRQ QPs, the number of outstanding sends can also be
specified.  The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:

{{{
-mca btl_openib_receive_queues \
     "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}

Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma).  The above
example therefore describes 4 QPs.

The first QP is:

    P,128,16,4

Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes).  The third field indicates the number of receive buffers
to allocate to the QP (16).  The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).

The second QP is:

    S,1024,256,128,32

Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP.  The second, third and fourth fields are the same as in the
per-peer based QP.  The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32).  This provides a
"good enough" mechanism of flow control for some regular communication
patterns.

QPs MUST be specified in ascending receive buffer size order.  This
requirement may be removed prior to 1.3 release.

This commit was SVN r15474.
2007-07-18 01:15:59 +00:00
Jeff Squyres
1e18265c16 Bring over the functionality from the /tmp/jnysal-openib-wireup
branch:

 * Support btl_openib_if_include and btl_openib_if_exclude MCA
   parameters, similar to those supported by other BTLs.  Each take a
   comma-delimited lists of identifiers.  Identifiers can be HCA
   interface names (e.g., ipath0, mthca1, etc.)  or an HCA interface
   name and port numbers (e.g., ipath0:1, mthca1:2, etc.).  It is an
   error to specify both _include and _exclude.  If you specify a
   non-existant (or non-ACTIVE) HCA and/or port, you'll get a warning
   unless you disable the warning by setting the MCA parameter
   btl_openib_warn_nonexistent_if to 0.
 * Start updating to use BEGIN_C_DECLS and END_C_DECLS
 * A few other minor fixes that were picked up along the way.

This commit was SVN r15063.
2007-06-14 01:59:25 +00:00
Galen Shipman
5340f5e320 Try to cleanup the flow control logic a bit
Renamed a few variables 
Inialize the reserve receive buffers to 1, prior to this they were initialized
to zero. 

This commit was SVN r14919.
2007-06-06 18:51:09 +00:00
Galen Shipman
3401bd2b07 Add optional ordering to the BTL interface.
This is required to tighten up the BTL semantics. Ordering is not guaranteed,
but, if the BTL returns a order tag in a descriptor (other than
MCA_BTL_NO_ORDER) then we may request another descriptor that will obey
ordering w.r.t. to the other descriptor.


This will allow sane behavior for RDMA networks, where local completion of an
RDMA operation on the active side does not imply remote completion on the
passive side. If we send a FIN message after local completion and the FIN is
not ordered w.r.t. the RDMA operation then badness may occur as the passive
side may now try to deregister the memory and the RDMA operation may still be
pending on the passive side. 

Note that this has no impact on networks that don't suffer from this
limitation as the ORDER tag can simply always be specified as
MCA_BTL_NO_ORDER.

This commit was SVN r14768.
2007-05-24 19:51:26 +00:00
Pavel Shamis
e2d0e27111 Adding:
* openib_finalize flow for openib btl
* async event handler for openib btl

This commit was SVN r14623.
2007-05-08 21:47:21 +00:00
Rainer Keller
1aceece03f - Add a few comments for elements for structs, a few spelling fixes.
No functional change.

This commit was SVN r14534.
2007-04-26 21:03:38 +00:00
Rainer Keller
ce32b918da - Fixes for for unlocking the mutex in case of error in functions
mca_btl_openib_post_srr and
     btl_openib_endpoint_post_rr

This commit was SVN r14530.
2007-04-26 13:33:02 +00:00
Sharon Melamed
cf3f41288b Add pkey value MCA parameter. if this param is used,
only ports with the actual pkey value will be initiate.

This commit was SVN r14463.
2007-04-22 10:22:12 +00:00
Jeff Squyres
0ba47105ed Merge the /tmp/jms-installdirs-trunk branch into the trunk. This
finally brings in functionality that is already on the 1.2 branch, and
was developed and tested in the v1.2ofed branch (and other places).

Short version of new features:

 * Support for ibv_fork_init() 
 * Automatically fill in the openib BTL bandwidth value by 
   querying the HCA port 
 * Installdirs functionality 
 * Fixes to always use -I in the Fortran wrapper compilers (#924) 
 * Gleb's mpool updates 
 * Remove some kruft in btl/openib/configure.m4, therefore 
   fixing the harmless warnings noted in #665 
 * Bunches of updates to the Linux RPM spec file 

I.e., effectively the same thing that r14411 brought to the v1.2
branch.

Also effectively brought in r14432 and r14433 (some fixes on top of
the original r14411 commit to v1.2).  Still need to bring in the moral
equivalent of r14445 after this commit (fixes to installdirs).

This commit was SVN r14449.

The following SVN revision numbers were found above:
  r14411 --> open-mpi/ompi@83b31314ae
  r14432 --> open-mpi/ompi@a48f160595
  r14433 --> open-mpi/ompi@68f346d2bc
  r14445 --> open-mpi/ompi@13d366b827
2007-04-21 00:15:05 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Gleb Natapov
1f3ac2d7ae Hold pointers to free_max/free_eager lists in array indexed by priority.
This eliminates couple of ifs from fast path.

This commit was SVN r14031.
2007-03-14 14:36:03 +00:00
Gleb Natapov
2b6cbd6299 Separate frag lists for RDMA descriptors to two, one for src descriptors
and another for dst descriptors. This provide partial solution to OB1 protocol
deadlock problem. We can limit number of RDMA descriptors (by setting
btl_openib_free_list_max to something different from -1) and if we will be
lucky to hit this limit before we fail to register more memory the protocol
will not deadlock. When we had only one list for src/dst descriptors we
deadlocked when we reached max limit for the list.

This commit was SVN r13844.
2007-02-28 13:43:38 +00:00
Pavel Shamis
edeab0e912 Adding Mellanox Technologies copyright to files touched by Mellanox.
This commit was SVN r13669.
2007-02-15 18:03:20 +00:00
Jeff Squyres
c9fe68c406 Better patch from Gleb to do the per-port (endpoint) specification of
whether to use eager RDMA or not

This commit was SVN r13262.
2007-01-23 22:40:59 +00:00
Jeff Squyres
a24f3c0886 Move the "use eager RDMA" flag to the individual openib BTL modules,
not the component.  This potentially allows for a mix of HCAs that
support eager RDMA and those who do not on a port-by-port basis.

This commit was SVN r13242.
2007-01-22 18:49:32 +00:00
Jeff Squyres
d5404f21a3 Make the trunk openib btl compile again.
This commit was SVN r13110.
2007-01-13 14:22:42 +00:00
Galen Shipman
2097d174f6 heterogeneous fixes to the OpenIB BTL. This includes work by nysal, brian and
I. 

This commit was SVN r13106.
2007-01-12 23:14:45 +00:00
Galen Shipman
df099a4731 call it what it is...
we are looking at subnet_id's and we are counting active ports per subnet. 
move subnet count out of procs loop,, no need to do it there... 

This commit was SVN r13105.
2007-01-12 22:42:20 +00:00
Gleb Natapov
190e7a27cd Merge with gleb-mpool branch. All RDMA components use same mpool now (rdma).
udapl/openib/vapi/gm mpools a deprecated. rdma mpool has parameter that allows
to limit its size mpool_rdma_rcache_size_limit (default is 0 - unlimited).

This commit was SVN r12878.
2006-12-17 12:26:41 +00:00
Jeff Squyres
0ca8cb35b7 Fixes trac:366
Add ability for ini files to recognize "use_eager_rdma" flag.  Set the
default to "no" (because we should assume that HCAs cannot support the
property necessary for using RDMA for eager messages -- that the last
byte of the message is guaranteed to be written to memory last --
unless proven otherwise.  For example, iWARP cards apparently do not
provide this guarantee), and then set all Mellanox and IBM HCAs to
override the default to enable this behavior on these cards.

This commit was SVN r12851.

The following Trac tickets were found above:
  Ticket 366 --> https://svn.open-mpi.org/trac/ompi/ticket/366
2006-12-14 15:52:13 +00:00
Brian Barrett
441432950f Merge in changes from the bwb-heterogeneous temp branch (r12491 -
r12714) for supporting compilers / architectures with different
padding rules.

This commit was SVN r12749.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r12491
  r12714
2006-12-04 20:11:42 +00:00
Pavel Shamis
566667ac61 Adding progress thread support to OpenIB BTL.
Reviewed by Gleb.

This commit was SVN r12411.
2006-11-02 16:15:21 +00:00
Gleb Natapov
4c784b6403 As Andrew Friedley pointed, my previous patch may cause deadlock if
mca_btl_openib_endpoint_connect_eager_rdma() is called recursively. He also
noticed that orte_pointer_array_add() can't fail because we allocate max number
of elements at init time. So just remove error handling and locking. No locking
 - no deadlocks.

This commit was SVN r12388.
2006-11-01 15:53:33 +00:00
Gleb Natapov
aac695a51f eager_rdma_buffers update is not atomic. A buffer is added to the array and if
something is going wrong down in the code it is removed from the array. So add
mutex to prevent concurrent access to the array from different threads.

This commit was SVN r12385.
2006-11-01 07:27:32 +00:00
Andrew Friedley
48c5117476 Fix some signedness warnings on threaded builds introduced by r12369
This commit was SVN r12376.

The following SVN revision numbers were found above:
  r12369 --> open-mpi/ompi@d7375ec102
2006-10-31 17:29:25 +00:00
Gleb Natapov
d7375ec102 Fix deadlock reported by Andrew Friedley:
What's happening is that we're holding openib_btl->eager_rdma_lock when
we call mca_btl_openib_endpoint_send_eager_rdma() on
btl_openib_endpoint.c:1227.  This in turn calls
mca_btl_openib_endpoint_send() on line 1179.  Then, if the endpoint
state isn't MCA_BTL_IB_CONNECTED or MCA_BTL_IB_FAILED, we call
opal_progress(), where we eventually try to lock
openib_btl->eager_rdma_lock at btl_openib_component.c:997.

The fix removes this lock altogether. Instead we atomically set local RDMA
pointer to prevent other threads to create rdma buffer for the same endpoint.
And we increment eager_rdma_buffers_count atomically thus polling thread doesn't
need lock around it.

This commit was SVN r12369.
2006-10-31 09:54:52 +00:00
Gleb Natapov
7b1b4f95e3 Local GID table contains not what I thought it contains. It contains local HCA
GIDs (there can be more than one) and not GIDs of the HCA on the network. Entry
zero always have to be initialized so we use it, and warn user if there is more
then one port active and default subnet is configured on at least one of them.

This commit was SVN r11815.
2006-09-26 12:12:33 +00:00
Gleb Natapov
601a6ca17a Use real subnet prefix instead of sm_lid.
This commit was SVN r11749.
2006-09-22 10:27:12 +00:00
George Bosilca
a3ad4a7fc8 The visibility flags (and/or Windows friendly export) is now on for all BTLs.
This commit was SVN r11662.
2006-09-14 22:19:39 +00:00
Gleb Natapov
7999c08107 consolidate credit management and CQ polling code.
This commit was SVN r11622.
2006-09-12 09:17:59 +00:00
Gleb Natapov
9b93f48e22 fix compile warnings in previous commit.
This commit was SVN r11554.
2006-09-07 13:31:50 +00:00
Gleb Natapov
d0caffa0aa Consolidate receive buffers prepost code for HP/LP QPs.
This commit was SVN r11552.
2006-09-07 13:05:41 +00:00
Gleb Natapov
298c825592 Remove #if OMPI_MCA_BTL_OPENIB_HAVE_SRQ. Always compile SRQ.
This commit was SVN r11537.
2006-09-06 05:45:37 +00:00
Gleb Natapov
424e412391 Make eager rdma work with SRQ enabled.
This commit was SVN r11530.
2006-09-05 16:04:04 +00:00
Gleb Natapov
fe932ca7bf consolidate part of HP/LP fields.
This commit was SVN r11528.
2006-09-05 16:00:18 +00:00
Gleb Natapov
ffe7051488 fix compilation warnings.
This commit was SVN r11524.
2006-09-05 09:16:22 +00:00
Galen Shipman
e5c594c211 More updates for the async error handler for btl's
In order to provide backwards compatability the framework versions are bumped
and the handler registeration function is at the end of the btl struct.
Testing done on sm, openib, and gm.. 

This commit was SVN r11256.
2006-08-17 22:02:01 +00:00
Galen Shipman
3b49953ce2 Add error callback to the btl interface, this allows error to be delivered to
the upperlayer assynchronously although there are some issues with this.. such
as there are multiple consumers of the btl's.. who get's the

This commit was SVN r11232.
2006-08-16 20:21:38 +00:00
Jeff Squyres
474564a6b1 Bring over all the work from the /tmp/ib-hw-detect branch. In
addition to my design and testing, it was conceptually approved by
Gil, Gleb, Pasha, Brad, and Galen.  Functionally [probably somewhat
lightly] tested by Galen.  We may still have to shake out some bugs
during the next few months, but it seems to be working for all the
cases that I can throw at it.

Here's a summary of the changes from that branch: 

* Move MCA parameter registration to a new file (btl_openib_mca.c):
   * Properly check the retun status of registering MCA params
   * Check for valid values of MCA parameters
   * Make help strings better
   * Otherwise, the only default value of an MCA param that was
     changed was max_btls; it went from 4 to -1 (meaning: use all
     available)
 * Properly prototyped internal functions in _component.c
   * Made a bunch of functions static that didn't need to be public
   * Renamed to remove "mca_" prefix from static functions
   * Call new MCA param registration function
   * Call new INI file read/lookup/finalize functions
   * Updated a bunch of macros to be "BTL_" instead of "ORTE_"
   * Be a little more consistent with return values
   * Handle -1 for the max_btls MCA param
   * Fixed a free() that should have been an OBJ_RELEASE()
   * Some re-indenting
 * Added INI-file parsing
   * New flex file: btl_openib_ini.l
   * New default HCA params .ini file (probably to be expanded over
     time by other HCA vendors)
   * Added more show_help messages for parsing problems
   * Read in INI files and cache the values for later lookup
   * When component opens an HCA, lookup to see if any corresponding
     values were found in the INI files (ID'ed by the HCA vendor_id
     and vendor_part_id)
   * Added btl_openib_verbose MCA param that shows what the INI-file
     stuff does (e.g., shows which MTU your HCA ends up using)
   * Added btl_openib_hca_param_files as a colon-delimited list of INI
     files to check for values during startup (in order,
     left-to-right, just like the MCA base directory param).
   * MTU is currently the only value supported in this framework.
   * It is not a fatal error if we don't find params for the HCA in
     the INI file(s).  Instead, just print a warning.  New MCA param
     btl_openib_warn_no_hca_params_found can be used to disable
     printing the warning.
 * Add MTU to peer negotiation when making a connection
   * Exchange maximum MTU; select the lesser of the two

This commit was SVN r11182.
2006-08-14 19:30:37 +00:00
Galen Shipman
c9e0eda190 Initialize the completion queue to a reasonable size based on maximum number
of send/receives outstanding.

Use ibv_cq_resize if available after initial creation of completion queue if
cq_size is too small (based on number of peers). 

This commit was SVN r11053.
2006-07-30 00:58:40 +00:00
Gleb Natapov
72575d81d2 Create separate pool for control messages. It is unlimited, but the maximum number of element that are allocated from it is limited by number of connections.
This commit was SVN r11028.
2006-07-27 14:09:30 +00:00
Gleb Natapov
4b605295b3 remove unused field.
This commit was SVN r10965.
2006-07-24 06:12:16 +00:00
Gleb Natapov
383694c68d Add support to get alignemnt buffers from free_list_t. Convert openib BTL to new interface.
This commit was SVN r10899.
2006-07-20 14:39:05 +00:00
Gleb Natapov
704a5eb645 Support for LMC (lid mask count) and multiple QPs per port.
This commit was SVN r10536.
2006-06-28 07:23:08 +00:00
Jeff Squyres
df45221a3e Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10521.
2006-06-27 10:43:03 +00:00
Jeff Squyres
1d27ca5d0a Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10424.
2006-06-20 11:32:46 +00:00
Galen Shipman
218a438509 finished the ompi_free_list_t class nightmare..
This commit was SVN r10314.
2006-06-12 22:09:03 +00:00
Galen Shipman
0344ae4ac5 Fix to allow eager limit and max send size to be any size (within resource limitations). Instead of storing the ompi_free_list_t * in the fragment, we use the frag type enum, this tells us where the frag came from and where it should return.. This could also be done in mvapi but is not a high priority moving forward..
Review by Brian, needs to hit the trunk + 1.1 release.. 

This commit was SVN r10157.
2006-06-01 02:32:18 +00:00
Gleb Natapov
98282a3567 fix spelling. threashold -> threshold.
This commit was SVN r9577.
2006-04-08 08:13:37 +00:00
Gleb Natapov
590c992a7e fix recursive lock of openib_btl->ib_lock.
This commit was SVN r9427.
2006-03-26 15:02:43 +00:00