1
1
Граф коммитов

17 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
f2a2b2c0f9 A little more error checking; clean up the invalid MCA help message
This commit was SVN r15589.
2007-07-24 20:57:40 +00:00
Jeff Squyres
8ace07efed This commit brings in two major things:
1. Galen's fine-grain control of queue pair resources in the openib
   BTL.
1. Pasha's new implementation of asychronous HCA event handling.

Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.  

Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artifical changes).  :-(

== Fine-grain control of queue pair resources ==

Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).

Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments.  When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers.  One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.

The new design allows multiple QPs to be specified at runtime.  Each
QP can be setup to use PP or SRQ receive buffers as well as giving
fine-grained control over receive buffer size, number of receive
buffers to post, when to replenish the receive queue (low water mark)
and for SRQ QPs, the number of outstanding sends can also be
specified.  The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:

{{{
-mca btl_openib_receive_queues \
     "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}

Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma).  The above
example therefore describes 4 QPs.

The first QP is:

    P,128,16,4

Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes).  The third field indicates the number of receive buffers
to allocate to the QP (16).  The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).

The second QP is:

    S,1024,256,128,32

Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP.  The second, third and fourth fields are the same as in the
per-peer based QP.  The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32).  This provides a
"good enough" mechanism of flow control for some regular communication
patterns.

QPs MUST be specified in ascending receive buffer size order.  This
requirement may be removed prior to 1.3 release.

This commit was SVN r15474.
2007-07-18 01:15:59 +00:00
Jeff Squyres
1e18265c16 Bring over the functionality from the /tmp/jnysal-openib-wireup
branch:

 * Support btl_openib_if_include and btl_openib_if_exclude MCA
   parameters, similar to those supported by other BTLs.  Each take a
   comma-delimited lists of identifiers.  Identifiers can be HCA
   interface names (e.g., ipath0, mthca1, etc.)  or an HCA interface
   name and port numbers (e.g., ipath0:1, mthca1:2, etc.).  It is an
   error to specify both _include and _exclude.  If you specify a
   non-existant (or non-ACTIVE) HCA and/or port, you'll get a warning
   unless you disable the warning by setting the MCA parameter
   btl_openib_warn_nonexistent_if to 0.
 * Start updating to use BEGIN_C_DECLS and END_C_DECLS
 * A few other minor fixes that were picked up along the way.

This commit was SVN r15063.
2007-06-14 01:59:25 +00:00
Jeff Squyres
0ba47105ed Merge the /tmp/jms-installdirs-trunk branch into the trunk. This
finally brings in functionality that is already on the 1.2 branch, and
was developed and tested in the v1.2ofed branch (and other places).

Short version of new features:

 * Support for ibv_fork_init() 
 * Automatically fill in the openib BTL bandwidth value by 
   querying the HCA port 
 * Installdirs functionality 
 * Fixes to always use -I in the Fortran wrapper compilers (#924) 
 * Gleb's mpool updates 
 * Remove some kruft in btl/openib/configure.m4, therefore 
   fixing the harmless warnings noted in #665 
 * Bunches of updates to the Linux RPM spec file 

I.e., effectively the same thing that r14411 brought to the v1.2
branch.

Also effectively brought in r14432 and r14433 (some fixes on top of
the original r14411 commit to v1.2).  Still need to bring in the moral
equivalent of r14445 after this commit (fixes to installdirs).

This commit was SVN r14449.

The following SVN revision numbers were found above:
  r14411 --> open-mpi/ompi@83b31314ae
  r14432 --> open-mpi/ompi@a48f160595
  r14433 --> open-mpi/ompi@68f346d2bc
  r14445 --> open-mpi/ompi@13d366b827
2007-04-21 00:15:05 +00:00
Gleb Natapov
90fb58de4f When frags are allocated from mpool by free_list the frag structure is also
allocated from mpool memory (which is registered memory for RDMA transports)
This is not a problem for a small jobs, but for a big number of ranks an
amount of waisted memory is big.

This commit was SVN r13921.
2007-03-05 14:17:50 +00:00
Brian Barrett
ee753694e0 Print out the memlock limit when we can't allocate memory
This commit was SVN r13372.
2007-01-30 21:22:56 +00:00
Jeff Squyres
6b69ea664d Make a much, much better error message for a not-uncommon failure
scenario (user/sysadmin forgot to set the memlock limits high
enough).

This commit was SVN r13289.
2007-01-24 22:25:40 +00:00
Jeff Squyres
91b855c2f4 Minor fixes in the help messages
* If the text to cite where the problem occurred is "\n", prettyprint
   somethign a little nicer so that it's clear that we're talking
   about the end of line
 * Add a missing help message ("ini file:unknown field"), and display
   it a little better (i.e., show the erroneous field, not a
   misleading "end of line" marker)
 * It's "OpenIB", not "Open IB"

This commit was SVN r13241.
2007-01-22 18:45:43 +00:00
Jeff Squyres
e70ef98ea6 Update the help message to be a bit more specific and refer to the web
FAQ.

This commit was SVN r12812.
2006-12-09 15:13:03 +00:00
Gleb Natapov
7b1b4f95e3 Local GID table contains not what I thought it contains. It contains local HCA
GIDs (there can be more than one) and not GIDs of the HCA on the network. Entry
zero always have to be initialized so we use it, and warn user if there is more
then one port active and default subnet is configured on at least one of them.

This commit was SVN r11815.
2006-09-26 12:12:33 +00:00
Gleb Natapov
ac42284c16 Print more helpful message in case we can't find active port.
This commit was SVN r11706.
2006-09-19 08:56:32 +00:00
Jeff Squyres
474564a6b1 Bring over all the work from the /tmp/ib-hw-detect branch. In
addition to my design and testing, it was conceptually approved by
Gil, Gleb, Pasha, Brad, and Galen.  Functionally [probably somewhat
lightly] tested by Galen.  We may still have to shake out some bugs
during the next few months, but it seems to be working for all the
cases that I can throw at it.

Here's a summary of the changes from that branch: 

* Move MCA parameter registration to a new file (btl_openib_mca.c):
   * Properly check the retun status of registering MCA params
   * Check for valid values of MCA parameters
   * Make help strings better
   * Otherwise, the only default value of an MCA param that was
     changed was max_btls; it went from 4 to -1 (meaning: use all
     available)
 * Properly prototyped internal functions in _component.c
   * Made a bunch of functions static that didn't need to be public
   * Renamed to remove "mca_" prefix from static functions
   * Call new MCA param registration function
   * Call new INI file read/lookup/finalize functions
   * Updated a bunch of macros to be "BTL_" instead of "ORTE_"
   * Be a little more consistent with return values
   * Handle -1 for the max_btls MCA param
   * Fixed a free() that should have been an OBJ_RELEASE()
   * Some re-indenting
 * Added INI-file parsing
   * New flex file: btl_openib_ini.l
   * New default HCA params .ini file (probably to be expanded over
     time by other HCA vendors)
   * Added more show_help messages for parsing problems
   * Read in INI files and cache the values for later lookup
   * When component opens an HCA, lookup to see if any corresponding
     values were found in the INI files (ID'ed by the HCA vendor_id
     and vendor_part_id)
   * Added btl_openib_verbose MCA param that shows what the INI-file
     stuff does (e.g., shows which MTU your HCA ends up using)
   * Added btl_openib_hca_param_files as a colon-delimited list of INI
     files to check for values during startup (in order,
     left-to-right, just like the MCA base directory param).
   * MTU is currently the only value supported in this framework.
   * It is not a fatal error if we don't find params for the HCA in
     the INI file(s).  Instead, just print a warning.  New MCA param
     btl_openib_warn_no_hca_params_found can be used to disable
     printing the warning.
 * Add MTU to peer negotiation when making a connection
   * Exchange maximum MTU; select the lesser of the two

This commit was SVN r11182.
2006-08-14 19:30:37 +00:00
Jeff Squyres
df45221a3e Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10521.
2006-06-27 10:43:03 +00:00
Jeff Squyres
1d27ca5d0a Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10424.
2006-06-20 11:32:46 +00:00
Jeff Squyres
600bf4295a Update the help message to be slightly more concise and clear
This commit was SVN r10422.
2006-06-20 11:23:38 +00:00
Galen Shipman
cc54b07aa0 add better error messages for vapi retry exceeded errors.
This commit was SVN r10219.
2006-06-06 02:04:56 +00:00
Galen Shipman
9e6e7575b9 doh... add the file..
This commit was SVN r10210.
2006-06-05 21:24:42 +00:00