/* -*- Mode: C; c-basic-offset:4 ; -*- */
/*
 * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation.  All rights reserved.
 * Copyright (c) 2004-2007 The University of Tennessee and The University
 *                         of Tennessee Research Foundation.  All rights
 *                         reserved.
 * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 *                         University of Stuttgart.  All rights reserved.
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * Copyright (c) 2006-2007 Cisco Systems, Inc.  All rights reserved.
 * Copyright (c) 2006-2007 Mellanox Technologies. All rights reserved.
 * Copyright (c) 2006-2007 Los Alamos National Security, LLC.  All rights
 *                         reserved.
 * Copyright (c) 2006-2007 Voltaire All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */
#include "ompi_config.h"

#include <infiniband/verbs.h>
#include <errno.h>
#include <string.h> /* for strerror() */

#include "ompi/constants.h"
#include "opal/event/event.h"
#include "opal/include/opal/align.h"
#include "opal/util/if.h"
#include "opal/util/argv.h"
#include "opal/util/output.h"
#include "opal/util/show_help.h"
#include "opal/sys/timer.h"
#include "opal/sys/atomic.h"
#include "opal/mca/base/mca_base_param.h"

#include "orte/mca/errmgr/errmgr.h"
/*
 * Bring over all the work from the /tmp/ib-hw-detect branch. In
 * addition to my design and testing, it was conceptually approved by
 * Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat
 * lightly] tested by Galen. We may still have to shake out some bugs
 * during the next few months, but it seems to be working for all the
 * cases that I can throw at it.
 *
 * Here's a summary of the changes from that branch:
 *
 * - Move MCA parameter registration to a new file (btl_openib_mca.c):
 *   - Properly check the return status of registering MCA params
 *   - Check for valid values of MCA parameters
 *   - Make help strings better
 *   - Otherwise, the only default value of an MCA param that was
 *     changed was max_btls; it went from 4 to -1 (meaning: use all
 *     available)
 * - Properly prototyped internal functions in _component.c
 *   - Made a bunch of functions static that didn't need to be public
 *   - Renamed to remove "mca_" prefix from static functions
 *   - Call new MCA param registration function
 *   - Call new INI file read/lookup/finalize functions
 *   - Updated a bunch of macros to be "BTL_" instead of "ORTE_"
 *   - Be a little more consistent with return values
 *   - Handle -1 for the max_btls MCA param
 *   - Fixed a free() that should have been an OBJ_RELEASE()
 *   - Some re-indenting
 * - Added INI-file parsing
 *   - New flex file: btl_openib_ini.l
 *   - New default HCA params .ini file (probably to be expanded over
 *     time by other HCA vendors)
 *   - Added more show_help messages for parsing problems
 *   - Read in INI files and cache the values for later lookup
 *   - When the component opens an HCA, look up whether any corresponding
 *     values were found in the INI files (ID'ed by the HCA vendor_id
 *     and vendor_part_id)
 *   - Added btl_openib_verbose MCA param that shows what the INI-file
 *     stuff does (e.g., shows which MTU your HCA ends up using)
 *   - Added btl_openib_hca_param_files as a colon-delimited list of INI
 *     files to check for values during startup (in order,
 *     left-to-right, just like the MCA base directory param).
 *   - MTU is currently the only value supported in this framework.
 *   - It is not a fatal error if we don't find params for the HCA in
 *     the INI file(s). Instead, just print a warning. New MCA param
 *     btl_openib_warn_no_hca_params_found can be used to disable
 *     printing the warning.
 * - Add MTU to peer negotiation when making a connection
 *   - Exchange maximum MTU; select the lesser of the two
 *
 * This commit was SVN r11182.
 */
#include "orte/util/sys_info.h"

#include "ompi/proc/proc.h"
#include "ompi/mca/pml/pml.h"
#include "ompi/mca/btl/btl.h"
#include "ompi/mca/mpool/base/base.h"
#include "ompi/mca/mpool/rdma/mpool_rdma.h"
#include "ompi/mca/btl/base/base.h"
#include "ompi/datatype/convertor.h"
#include "ompi/mca/mpool/mpool.h"
#include "ompi/runtime/ompi_module_exchange.h"

#include "btl_openib.h"
#include "btl_openib_frag.h"
#include "btl_openib_endpoint.h"
#include "btl_openib_eager_rdma.h"
#include "btl_openib_proc.h"
#include "btl_openib_ini.h"
#include "btl_openib_mca.h"
#include "btl_openib_xrc.h"
/*
 * This commit brings in two major things:
 *
 * 1. Galen's fine-grain control of queue pair resources in the openib
 *    BTL.
 * 2. Pasha's new implementation of asynchronous HCA event handling.
 *
 * Pasha's new implementation doesn't take much explanation, but the new
 * "multifrag" stuff does.
 *
 * Note that "svn merge" was not used to bring this new code from the
 * /tmp/ib_multifrag branch -- something Bad happened in the periodic
 * trunk pulls on that branch, making an actual merge back to the trunk
 * effectively impossible (i.e., lots and lots of arbitrary conflicts and
 * artificial changes). :-(
 *
 * == Fine-grain control of queue pair resources ==
 *
 * Galen added fine-grain control of queue pair resources to the OpenIB
 * BTL (thanks to Gleb for fixing broken code and providing additional
 * functionality, Pasha for finding broken code, and Jeff for doing all
 * the svn work and regression testing).
 *
 * Prior to this commit, the OpenIB BTL created two queue pairs: one for
 * eager-size fragments and one for max-send-size fragments. When the
 * use of the shared receive queue (SRQ) was specified (via "-mca
 * btl_openib_use_srq 1"), these QPs would use a shared receive queue for
 * receive buffers instead of the default per-peer (PP) receive queues
 * and buffers. One consequence of this design is that receive buffer
 * utilization (the size of the data received as a percentage of the
 * receive buffer used for the data) was quite poor for a number of
 * applications.
 *
 * The new design allows multiple QPs to be specified at runtime. Each
 * QP can be set up to use PP or SRQ receive buffers, with fine-grained
 * control over the receive buffer size, the number of receive buffers
 * to post, and when to replenish the receive queue (low watermark); for
 * SRQ QPs, the number of outstanding sends can also be specified. The
 * following is an example of the syntax to describe QPs to the OpenIB
 * BTL using the new MCA parameter btl_openib_receive_queues:
 *
 *   -mca btl_openib_receive_queues \
 *     "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
 *
 * Each QP description is delimited by ";" (semicolon), and the
 * individual fields of a QP description are delimited by "," (comma).
 * The above example therefore describes 4 QPs.
 *
 * The first QP is:
 *
 *   P,128,16,4
 *
 * Per-peer receive buffer QPs are indicated by a starting field of "P";
 * the first QP (shown above) is therefore a per-peer based QP. The
 * second field indicates the size of the receive buffer in bytes
 * (128 bytes). The third field indicates the number of receive buffers
 * to allocate to the QP (16). The fourth field indicates the low
 * watermark for receive buffers, at which point the BTL will repost
 * receive buffers to the QP (4).
 *
 * The second QP is:
 *
 *   S,1024,256,128,32
 *
 * Shared receive queue based QPs are indicated by a starting field of
 * "S"; the second QP (shown above) is therefore a shared receive queue
 * based QP. The second, third and fourth fields are the same as in the
 * per-peer based QP. The fifth field is the number of outstanding sends
 * that are allowed at a given time on the QP (32). This provides a
 * "good enough" mechanism of flow control for some regular communication
 * patterns.
 *
 * QPs MUST be specified in ascending receive buffer size order. This
 * requirement may be removed prior to the 1.3 release.
 *
 * This commit was SVN r15474.
 */
#if OMPI_HAVE_THREADS
#include "btl_openib_async.h"
#endif
#include "connect/base.h"

/*
 * Local functions
 */
static int btl_openib_component_open(void);
static int btl_openib_component_close(void);
static mca_btl_base_module_t **btl_openib_component_init(int *, bool, bool);
static int btl_openib_component_progress(void);

mca_btl_openib_component_t mca_btl_openib_component = {
    {
        /* First, the mca_base_component_t struct containing meta information
           about the component itself */
        {
            /* Indicate that we are a BTL v1.0.1 component (which also implies a
               specific MCA version) */
            MCA_BTL_BASE_VERSION_1_0_1,

            "openib", /* MCA component name */
            /*
             * Major simplifications to component versioning:
             *
             * - After long discussions and ruminations on how we run
             *   components in LAM/MPI, made the decision that, by default,
             *   all components included in Open MPI will use the version
             *   number of their parent project (i.e., OMPI or ORTE). They
             *   are certainly free to use a different number, but this
             *   simplification makes the common cases easy:
             *   - components are only released when the parent project is
             *     released
             *   - it is easy (trivial?) to distinguish which version
             *     component goes with which version of the parent project
             * - removed all autogen/configure code for templating the
             *   version .h file in components
             * - made all ORTE components use ORTE_*_VERSION for version
             *   numbers
             * - made all OMPI components use OMPI_*_VERSION for version
             *   numbers
             * - removed all VERSION files from components
             * - configure now displays OPAL, ORTE, and OMPI version numbers
             * - ditto for ompi_info
             * - right now, faking it -- OPAL and ORTE and OMPI will always
             *   have the same version number (i.e., they all come from the
             *   same top-level VERSION file). But this paves the way for the
             *   Great Configure Reorganization, where, among other things,
             *   each project will have its own version number.
             *
             * So all in all, we went from a boatload of version numbers to
             * [effectively] three. That's pretty good. :-)
             *
             * This commit was SVN r6344.
             */
            OMPI_MAJOR_VERSION,   /* MCA component major version */
            OMPI_MINOR_VERSION,   /* MCA component minor version */
            OMPI_RELEASE_VERSION, /* MCA component release version */
            btl_openib_component_open,  /* component open */
            btl_openib_component_close  /* component close */
        },

        /* Next the MCA v1.0.0 component meta data */
        {
            /* The component is not checkpoint ready */
            MCA_BASE_METADATA_PARAM_NONE
        },

        btl_openib_component_init,
        btl_openib_component_progress,
    }
};

/*
 * Called by the MCA framework to open the component; registers
 * component parameters.
 */
int btl_openib_component_open(void)
{
    int ret;

    /* initialize state */
mca_btl_openib_component . ib_num_btls = 0 ;
mca_btl_openib_component . openib_btls = NULL ;
2007-12-21 09:02:00 +03:00
OBJ_CONSTRUCT ( & mca_btl_openib_component . hcas , opal_pointer_array_t ) ;
2007-08-22 13:31:12 +04:00
mca_btl_openib_component . hcas_count = 0 ;
2007-08-20 16:28:25 +04:00
2008-01-21 15:11:18 +03:00
/* initialize objects */
2005-07-03 20:22:16 +04:00
OBJ_CONSTRUCT ( & mca_btl_openib_component . ib_procs , opal_list_t ) ;
2005-07-01 01:28:35 +04:00
/* register IB component parameters */
2006-08-14 23:30:37 +04:00
ret = btl_openib_register_mca_params ( ) ;
This commit brings in two major things:
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
1. Pasha's new implementation of asynchronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be set up to use PP or SRQ receive buffers as well as giving
fine-grained control over receive buffer size, number of receive
buffers to post, when to replenish the receive queue (low water mark)
and for SRQ QPs, the number of outstanding sends can also be
specified. The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed prior to the 1.3 release.
This commit was SVN r15474.
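The `btl_openib_receive_queues` syntax described above can be parsed along the following lines. Entries are split on `;`. A `P` entry carries three numbers (buffer size, buffer count, low watermark), and an `S` entry adds a fourth (maximum outstanding sends). Entries must appear in ascending receive-buffer size. This is a minimal sketch, not the BTL's actual parser. The `sketch_*` names and the struct are hypothetical, and the sketch rejects only sizes that decrease, which is one reading of "ascending".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one receive-queue entry. */
typedef struct {
    char type;               /* 'P' = per-peer, 'S' = shared receive queue */
    unsigned long size;      /* receive buffer size in bytes */
    unsigned long num;       /* number of receive buffers to post */
    unsigned long low_wm;    /* repost buffers when this low watermark is hit */
    unsigned long max_sends; /* SRQ only: outstanding sends allowed (0 for P) */
} sketch_qp_spec_t;

/* Parse one "P,a,b,c" or "S,a,b,c,d" entry of length len. Returns 0 on success. */
static int sketch_parse_qp(const char *s, size_t len, sketch_qp_spec_t *qp)
{
    char buf[128];
    char *p, *end;
    unsigned long vals[4] = { 0, 0, 0, 0 };
    int nvals = 0, want;

    if (len < 2 || len >= sizeof(buf)) {
        return -1;
    }
    memcpy(buf, s, len);
    buf[len] = '\0';
    if (('P' != buf[0] && 'S' != buf[0]) || ',' != buf[1]) {
        return -1;
    }
    want = ('P' == buf[0]) ? 3 : 4;   /* numeric fields after the type letter */
    p = end = buf + 2;
    while (nvals < 4) {
        vals[nvals++] = strtoul(p, &end, 10);
        if (end == p) {
            return -1;                /* empty or non-numeric field */
        }
        if ('\0' == *end) {
            break;
        }
        if (',' != *end) {
            return -1;
        }
        p = end + 1;
    }
    if (nvals != want || '\0' != *end) {
        return -1;                    /* too few or too many fields */
    }
    qp->type = buf[0];
    qp->size = vals[0];
    qp->num = vals[1];
    qp->low_wm = vals[2];
    qp->max_sends = ('S' == buf[0]) ? vals[3] : 0;
    return 0;
}

/* Parse the whole ';'-separated spec into qps[0..max). Returns the number
 * of QPs, or -1 on a malformed entry, too many entries, or a size that
 * decreases relative to the previous entry. */
static int sketch_parse_receive_queues(const char *spec, sketch_qp_spec_t *qps, int max)
{
    const char *start = spec;
    int n = 0;

    assert(NULL != spec && NULL != qps);
    for (;;) {
        const char *semi = strchr(start, ';');
        size_t len = (NULL != semi) ? (size_t) (semi - start) : strlen(start);

        if (n >= max || 0 != sketch_parse_qp(start, len, &qps[n])) {
            return -1;
        }
        /* QPs must be listed in ascending receive-buffer-size order */
        if (n > 0 && qps[n].size < qps[n - 1].size) {
            return -1;
        }
        ++n;
        if (NULL == semi) {
            break;
        }
        start = semi + 1;
    }
    return n;
}
```

Applied to the example string from the commit message, this yields four entries: one `P` queue of 128-byte buffers and three `S` queues with 32 outstanding sends each.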
2007-07-18 05:15:59 +04:00
mca_btl_openib_component . max_send_size =
mca_btl_openib_module . super . btl_max_send_size ;
mca_btl_openib_component . eager_limit =
mca_btl_openib_module . super . btl_eager_limit ;
2006-08-14 23:30:37 +04:00
2007-07-18 05:15:59 +04:00
srand48 ( getpid ( ) * time ( NULL ) ) ;
2006-08-14 23:30:37 +04:00
return ret ;
2005-07-01 01:28:35 +04:00
}
/*
* component cleanup - sanity checking of queue lengths
*/
2006-08-14 23:30:37 +04:00
static int btl_openib_component_close ( void )
2005-07-01 01:28:35 +04:00
{
2007-10-04 21:36:12 +04:00
/* Close down the connect pseudo component */
2007-10-04 22:03:56 +04:00
    if (NULL != ompi_btl_openib_connect.bcf_finalize) {
        ompi_btl_openib_connect.bcf_finalize();
    }
2007-10-04 21:36:12 +04:00
2006-08-14 23:30:37 +04:00
ompi_btl_openib_ini_finalize ( ) ;
2005-07-01 01:28:35 +04:00
return OMPI_SUCCESS ;
}
2005-10-01 02:58:09 +04:00
/*
2005-10-02 22:58:57 +04:00
* Register OPENIB port information. The MCA framework
2005-10-01 02:58:09 +04:00
* will make this available to all peers.
*/
2006-08-14 23:30:37 +04:00
static int btl_openib_modex_send ( void )
2005-10-01 02:58:09 +04:00
{
2008-01-15 02:22:03 +03:00
    int rc, i;
    char *message, *offset;
    uint32_t size, size_save;
    size_t msg_size;
    /* The message is packed into 2 parts:
     * 1. a uint32_t indicating the number of ports in the message
     * 2. for each port:
     *    a. the port data
     *    b. a uint32_t indicating a string length
     *    c. the cpc list string for that port, with the length given by 2b.
     */
    msg_size = sizeof(uint32_t) + mca_btl_openib_component.ib_num_btls *
        (sizeof(uint32_t) + sizeof(mca_btl_openib_port_info_t));
    for (i = 0; i < mca_btl_openib_component.ib_num_btls; i++) {
        msg_size += strlen(mca_btl_openib_component.openib_btls[i]->port_info.cpclist);
    }
    if (0 == msg_size) {
        return 0;
    }
2005-10-01 02:58:09 +04:00
2008-01-15 02:22:03 +03:00
    message = malloc(msg_size);
    if (NULL == message) {
        BTL_ERROR(("Failed malloc: %s:%d\n", __FILE__, __LINE__));
        return OMPI_ERR_OUT_OF_RESOURCE;
2008-01-21 15:11:18 +03:00
    }
2008-01-15 02:22:03 +03:00
    /* Pack the number of ports */
    size = mca_btl_openib_component.ib_num_btls;
2007-01-13 02:14:45 +03:00
#if !defined(WORDS_BIGENDIAN) && OMPI_ENABLE_HETEROGENEOUS_SUPPORT
2008-01-15 02:22:03 +03:00
    size = htonl(size);
2007-01-13 02:14:45 +03:00
#endif
2008-01-15 02:22:03 +03:00
    memcpy(message, &size, sizeof(size));
    offset = message + sizeof(size);
    /* Pack each of the ports */
    for (i = 0; i < mca_btl_openib_component.ib_num_btls; i++) {
        /* Pack the port struct */
        memcpy(offset, &mca_btl_openib_component.openib_btls[i]->port_info,
               sizeof(mca_btl_openib_port_info_t));
#if !defined(WORDS_BIGENDIAN) && OMPI_ENABLE_HETEROGENEOUS_SUPPORT
        MCA_BTL_OPENIB_PORT_INFO_HTON(*(mca_btl_openib_port_info_t *) offset);
#endif
        offset += sizeof(mca_btl_openib_port_info_t);
        /* Pack the strlen of the cpclist */
        size = size_save =
            strlen(mca_btl_openib_component.openib_btls[i]->port_info.cpclist);
#if !defined(WORDS_BIGENDIAN) && OMPI_ENABLE_HETEROGENEOUS_SUPPORT
        size = htonl(size);
#endif
        memcpy(offset, &size, sizeof(size));
        offset += sizeof(size);
        /* Pack the string */
2008-01-21 15:11:18 +03:00
        memcpy(offset,
               mca_btl_openib_component.openib_btls[i]->port_info.cpclist,
2008-01-15 02:22:03 +03:00
               size_save);
        offset += size_save;
2005-10-01 02:58:09 +04:00
    }
2008-01-15 02:22:03 +03:00
2008-01-21 15:11:18 +03:00
    rc = ompi_modex_send(&mca_btl_openib_component.super.btl_version,
2008-01-15 02:22:03 +03:00
                         message, msg_size);
    free(message);
2005-10-01 02:58:09 +04:00
    return rc;
}
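The wire format documented in `btl_openib_modex_send()` above can be exercised end to end with a small pack/unpack pair. The layout is a `uint32_t` port count, then for each port the raw port struct, a `uint32_t` string length, and the cpc list bytes with no terminating NUL. This is a minimal sketch with assumptions. `sketch_port_info_t` is a hypothetical stand-in for `mca_btl_openib_port_info_t`, and the struct itself is copied as raw bytes as in a homogeneous build. Both length fields are always converted with `htonl`/`ntohl`, whereas the original converts only when heterogeneous support is enabled on a little-endian host.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mca_btl_openib_port_info_t. */
typedef struct {
    uint64_t subnet_id;
    uint32_t mtu;
} sketch_port_info_t;

typedef struct {
    sketch_port_info_t info;
    const char *cpclist;
} sketch_port_t;

/* Pack: uint32 count, then per port {info, uint32 strlen, cpclist bytes}. */
static char *sketch_pack(const sketch_port_t *ports, uint32_t n, size_t *out_len)
{
    size_t len = sizeof(uint32_t), i;
    char *msg, *off;
    uint32_t be;

    for (i = 0; i < n; ++i) {
        len += sizeof(sketch_port_info_t) + sizeof(uint32_t) + strlen(ports[i].cpclist);
    }
    msg = malloc(len);
    if (NULL == msg) {
        return NULL;
    }
    be = htonl(n);
    memcpy(msg, &be, sizeof(be));
    off = msg + sizeof(be);
    for (i = 0; i < n; ++i) {
        uint32_t slen = (uint32_t) strlen(ports[i].cpclist);
        memcpy(off, &ports[i].info, sizeof(sketch_port_info_t));
        off += sizeof(sketch_port_info_t);
        be = htonl(slen);
        memcpy(off, &be, sizeof(be));
        off += sizeof(be);
        memcpy(off, ports[i].cpclist, slen);   /* no NUL on the wire */
        off += slen;
    }
    assert((size_t) (off - msg) == len);
    *out_len = len;
    return msg;
}

/* Unpack port k: copy its info and a NUL-terminated cpclist into buf.
 * Returns 0 on success, -1 on a truncated message or a too-small buf. */
static int sketch_unpack_port(const char *msg, size_t len, uint32_t k,
                              sketch_port_info_t *info, char *buf, size_t buflen)
{
    const char *off = msg, *end = msg + len;
    uint32_t n, i, slen;

    if (len < sizeof(n)) {
        return -1;
    }
    memcpy(&n, off, sizeof(n));
    n = ntohl(n);
    off += sizeof(n);
    if (k >= n) {
        return -1;
    }
    for (i = 0; i <= k; ++i) {
        if ((size_t) (end - off) < sizeof(*info) + sizeof(slen)) {
            return -1;
        }
        memcpy(info, off, sizeof(*info));
        off += sizeof(*info);
        memcpy(&slen, off, sizeof(slen));
        slen = ntohl(slen);
        off += sizeof(slen);
        if ((size_t) (end - off) < slen) {
            return -1;
        }
        if (i == k) {
            if (slen >= buflen) {
                return -1;
            }
            memcpy(buf, off, slen);
            buf[slen] = '\0';
        }
        off += slen;
    }
    return 0;
}
```

Because each string is length-prefixed, a receiver must walk the ports in order to reach port `k`. The length field is what lets the sender omit the NUL terminator.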
2005-11-10 23:15:02 +03:00
/*
2007-07-18 05:15:59 +04:00
* Active Message Callback function on control message.
2005-11-10 23:15:02 +03:00
*/
2007-11-28 10:13:34 +03:00
static void btl_openib_control ( mca_btl_base_module_t * btl ,
mca_btl_base_tag_t tag , mca_btl_base_descriptor_t * des ,
void * cbdata )
2005-11-10 23:15:02 +03:00
{
2007-11-28 10:11:14 +03:00
/* don't return credits used for control messages */
2007-12-09 17:05:13 +03:00
mca_btl_openib_module_t * obtl = ( mca_btl_openib_module_t * ) btl ;
2007-11-28 10:13:34 +03:00
mca_btl_openib_endpoint_t * ep = to_com_frag ( des ) - > endpoint ;
2007-11-28 10:11:14 +03:00
mca_btl_openib_control_header_t * ctl_hdr =
to_base_frag ( des ) - > segment . seg_addr . pval ;
2006-03-26 12:30:50 +04:00
mca_btl_openib_eager_rdma_header_t * rdma_hdr ;
2007-12-09 17:05:13 +03:00
mca_btl_openib_header_coalesced_t * clsc_hdr =
( mca_btl_openib_header_coalesced_t * ) ( ctl_hdr + 1 ) ;
2008-01-15 08:32:53 +03:00
mca_btl_active_message_callback_t * reg ;
2007-12-09 17:05:13 +03:00
size_t len = des - > des_dst - > seg_len - sizeof ( * ctl_hdr ) ;
2008-01-21 15:11:18 +03:00
2006-03-26 12:30:50 +04:00
switch ( ctl_hdr - > type ) {
2006-09-05 20:02:09 +04:00
case MCA_BTL_OPENIB_CONTROL_CREDITS :
2007-11-28 10:13:34 +03:00
assert ( 0 ) ; /* Credit message is handled elsewhere */
2007-01-13 02:14:45 +03:00
break ;
2006-03-26 12:30:50 +04:00
case MCA_BTL_OPENIB_CONTROL_RDMA :
rdma_hdr = ( mca_btl_openib_eager_rdma_header_t * ) ctl_hdr ;
2008-01-21 15:11:18 +03:00
BTL_VERBOSE ( ( " prior to NTOH received rkey %lu, rdma_start.lval %llu, pval %p, ival %u \n " ,
rdma_hdr - > rkey ,
2007-01-13 02:14:45 +03:00
( unsigned long ) rdma_hdr - > rdma_start . lval ,
rdma_hdr - > rdma_start . pval ,
2007-03-05 17:17:50 +03:00
rdma_hdr - > rdma_start . ival
2007-01-13 02:14:45 +03:00
) ) ;
2008-01-21 15:11:18 +03:00
2007-11-28 10:13:34 +03:00
if ( ep - > nbo ) {
2007-11-28 10:12:44 +03:00
BTL_OPENIB_EAGER_RDMA_CONTROL_HEADER_NTOH ( * rdma_hdr ) ;
}
2008-01-21 15:11:18 +03:00
2007-11-28 10:12:44 +03:00
BTL_VERBOSE ( ( " received rkey %lu, rdma_start.lval %llu, pval %p, "
" ival %u \n " , rdma_hdr - > rkey ,
( unsigned long ) rdma_hdr - > rdma_start . lval ,
rdma_hdr - > rdma_start . pval , rdma_hdr - > rdma_start . ival ) ) ;
2008-01-21 15:11:18 +03:00
2007-11-28 10:13:34 +03:00
        if (ep->eager_rdma_remote.base.pval) {
            BTL_ERROR(("Got RDMA connect twice!"));
            return;
        }
        ep->eager_rdma_remote.rkey = rdma_hdr->rkey;
        ep->eager_rdma_remote.base.lval = rdma_hdr->rdma_start.lval;
        ep->eager_rdma_remote.tokens = mca_btl_openib_component.eager_rdma_num - 1;
        break;
    case MCA_BTL_OPENIB_CONTROL_COALESCED:
        while (len > 0) {
            size_t skip;
            mca_btl_base_descriptor_t tmp_des;
            mca_btl_base_segment_t tmp_seg;
            assert(len >= sizeof(*clsc_hdr));
            if (ep->nbo)
                BTL_OPENIB_HEADER_COALESCED_NTOH(*clsc_hdr);
            skip = (sizeof(*clsc_hdr) + clsc_hdr->alloc_size);
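            /* Worked example of the stride computation above.  The numbers are
             * illustrative only; the real header size comes from the
             * mca_btl_openib_header_coalesced_t definition, which is not shown
             * here.  Suppose sizeof(*clsc_hdr) were 8 and alloc_size were 64.
             * Then skip = 8 + 64 = 72, so the next coalesced header starts 72
             * bytes after this one.  Only clsc_hdr->size bytes of the 64-byte
             * slot are handed to the callback as payload, and len shrinks by
             * the full 72. */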
            tmp_des.des_dst = &tmp_seg;
            tmp_des.des_dst_cnt = 1;
            tmp_seg.seg_addr.pval = clsc_hdr + 1;
            tmp_seg.seg_len = clsc_hdr->size;
            /* call registered callback */
            reg = mca_btl_base_active_message_trigger + clsc_hdr->tag;
            reg->cbfunc(&obtl->super, clsc_hdr->tag, &tmp_des, reg->cbdata);
            len -= skip;
            clsc_hdr = (mca_btl_openib_header_coalesced_t*)
                (((unsigned char*)clsc_hdr) + skip);
        }
        break;
    default:
        BTL_ERROR(("Unknown message type received by BTL"));
        break;
    }
}

static int openib_reg_mr(void *reg_data, void *base, size_t size,
        mca_mpool_base_registration_t *reg)
{
    mca_btl_openib_hca_t *hca = (mca_btl_openib_hca_t*)reg_data;
    mca_btl_openib_reg_t *openib_reg = (mca_btl_openib_reg_t*)reg;

    openib_reg->mr = ibv_reg_mr(hca->ib_pd, base, size, IBV_ACCESS_LOCAL_WRITE |
            IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ);
    if (NULL == openib_reg->mr)
        return OMPI_ERR_OUT_OF_RESOURCE;
    return OMPI_SUCCESS;
}

static int openib_dereg_mr(void *reg_data, mca_mpool_base_registration_t *reg)
{
    mca_btl_openib_reg_t *openib_reg = (mca_btl_openib_reg_t*)reg;

    if (openib_reg->mr != NULL) {
        if (ibv_dereg_mr(openib_reg->mr)) {
            BTL_ERROR(("%s: error unpinning openib memory errno says %s\n",
                       __func__, strerror(errno)));
            return OMPI_ERROR;
        }
    }
    openib_reg->mr = NULL;
    return OMPI_SUCCESS;
}

static inline int param_register_int(const char *param_name, int default_value)
{
    int param_value = default_value;
    int id = mca_base_param_register_int("btl", "openib", param_name, NULL,
            default_value);
    mca_base_param_lookup_int(id, &param_value);
    return param_value;
}

#if OMPI_HAVE_THREADS
static int start_async_event_thread(void)
{
    /* Set the fatal counter to zero */
    mca_btl_openib_component.fatal_counter = 0;

    /* Create pipe for communication with async event thread */
    if (pipe(mca_btl_openib_component.async_pipe)) {
        BTL_ERROR(("Failed to create pipe for communication with "
                   "async event thread"));
        return OMPI_ERROR;
    }

    /* Starting async event thread for the component */
    if (pthread_create(&mca_btl_openib_component.async_thread, NULL,
                (void*(*)(void*))btl_openib_async_thread, NULL)) {
        BTL_ERROR(("Failed to create async event thread"));
        return OMPI_ERROR;
    }

    return OMPI_SUCCESS;
}
#endif

static int init_one_port(opal_list_t *btl_list, mca_btl_openib_hca_t *hca,
                         uint8_t port_num, uint16_t pkey_index,
                         struct ibv_port_attr *ib_port_attr)
{
    uint16_t lid, i, lmc, lmc_step;
    mca_btl_openib_module_t *openib_btl;
    mca_btl_base_selected_module_t *ib_selected;
    union ibv_gid gid;
    uint64_t subnet_id;

    ibv_query_gid(hca->ib_dev_context, port_num, 0, &gid);
    subnet_id = ntoh64(gid.global.subnet_prefix);
    /* subnet_id is 64 bits wide; print it with a matching specifier */
    BTL_VERBOSE(("my subnet_id is %016llx\n", (unsigned long long)subnet_id));

    if (mca_btl_openib_component.ib_num_btls > 0 &&
            IB_DEFAULT_GID_PREFIX == subnet_id &&
            mca_btl_openib_component.warn_default_gid_prefix) {
        opal_show_help("help-mpi-btl-openib.txt", "default subnet prefix",
                true, orte_system_info.nodename);
    }
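
    /* Illustration, assuming IB_DEFAULT_GID_PREFIX holds the IB default
     * subnet prefix fe80:0000:0000:0000.  On a port left at that default,
     * ntoh64() yields subnet_id == 0xfe80000000000000.  If BTLs already exist
     * on another port (ib_num_btls > 0) and this port also reports the
     * default, subnet_id alone may not reveal whether the ports share one
     * physical fabric.  That is presumably why the warning above is shown in
     * this case. */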

    lmc = (1 << ib_port_attr->lmc);
    lmc_step = 1;

    if (0 != mca_btl_openib_component.max_lmc &&
            mca_btl_openib_component.max_lmc < lmc) {
        lmc = mca_btl_openib_component.max_lmc;
    }
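
    /* Worked example of the LMC arithmetic (illustrative values only).
     * Note that lmc holds a count of LIDs, not the LMC exponent.  A port
     * whose LMC is 2 answers to 1 << 2 = 4 consecutive LIDs, e.g. base LID
     * 0x10 through 0x13.  With max_lmc set to 2, lmc is clamped to 2.  With
     * lmc_step left at 1 (no APM), the loop below visits only LIDs 0x10 and
     * 0x11.  Each BTL's src_path_bits is its LID minus the base LID, giving
     * 0 and 1 here. */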

#if OMPI_HAVE_THREADS
    /* APM support */
    if (lmc > 1) {
        if (-1 == mca_btl_openib_component.apm) {
            lmc_step = lmc;
        } else if (0 == lmc % (mca_btl_openib_component.apm + 1)) {
            lmc_step = mca_btl_openib_component.apm + 1;
        } else {
            opal_show_help("help-mpi-btl-openib.txt", "apm with wrong lmc", true,
                    mca_btl_openib_component.apm, lmc);
            return OMPI_ERROR;
        }
    } else {
        if (mca_btl_openib_component.apm) {
            /* Disable apm and report warning */
            mca_btl_openib_component.apm = 0;
            opal_show_help("help-mpi-btl-openib.txt", "apm without lmc", true);
        }
    }
#endif
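
    /* Worked example of the APM stepping above (illustrative values).
     *
     * lmc = 4 LIDs, apm = 1: 4 % (1 + 1) == 0, so lmc_step = 2.  The loop
     * below creates BTLs on LIDs base and base + 2, and each records
     * apm_lmc_max = lmc_step = 2.  That is presumably its own LID plus one
     * spare LID for an alternate path.
     *
     * lmc = 4, apm = -1: lmc_step = 4, so only the base LID gets BTLs, and
     * the other three LIDs are left (presumably) for path migration.
     *
     * lmc = 4, apm = 2: 4 % 3 != 0, so the "apm with wrong lmc" help message
     * is shown and OMPI_ERROR is returned. */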

    for (lid = ib_port_attr->lid;
         lid < ib_port_attr->lid + lmc; lid += lmc_step) {
        for (i = 0; i < mca_btl_openib_component.btls_per_lid; i++) {
            char param[40];
            int rc;

            openib_btl = malloc(sizeof(mca_btl_openib_module_t));
            if (NULL == openib_btl) {
                BTL_ERROR(("Failed malloc: %s:%d\n", __FILE__, __LINE__));
                return OMPI_ERR_OUT_OF_RESOURCE;
            }
            memcpy(openib_btl, &mca_btl_openib_module,
                    sizeof(mca_btl_openib_module));
            memcpy(&openib_btl->ib_port_attr, ib_port_attr,
                    sizeof(struct ibv_port_attr));
            ib_selected = OBJ_NEW(mca_btl_base_selected_module_t);
            ib_selected->btl_module = (mca_btl_base_module_t*) openib_btl;
            openib_btl->hca = hca;
            openib_btl->port_num = (uint8_t) port_num;
            openib_btl->pkey_index = pkey_index;
            openib_btl->lid = lid;
            openib_btl->apm_lmc_max = lmc_step;
            openib_btl->src_path_bits = lid - ib_port_attr->lid;
            /* store the subnet for multi-nic support */
            openib_btl->port_info.subnet_id = subnet_id;
            openib_btl->port_info.mtu = hca->mtu;
#if HAVE_XRC
            /* This code is protected with ifdef because we don't want to send
             * extra bytes during OOB */
            if (MCA_BTL_XRC_ENABLED) {
                openib_btl->port_info.lid = lid;
            }
#endif
            rc = ompi_btl_openib_connect_base_query(&openib_btl->port_info.cpclist, hca);
            if (OMPI_SUCCESS != rc) {
                continue;
            }
            mca_btl_base_active_message_trigger[MCA_BTL_TAG_IB].cbfunc = btl_openib_control;
            mca_btl_base_active_message_trigger[MCA_BTL_TAG_IB].cbdata = NULL;

            /* Check bandwidth configured for this HCA.  snprintf() bounds
             * every write to param[], so a long device name is truncated
             * instead of overflowing the 40-byte buffer. */
            snprintf(param, sizeof(param), "bandwidth_%s",
                     ibv_get_device_name(hca->ib_dev));
            openib_btl->super.btl_bandwidth =
                param_register_int(param, openib_btl->super.btl_bandwidth);
            /* Check bandwidth configured for this HCA/port */
            snprintf(param, sizeof(param), "bandwidth_%s:%d",
                     ibv_get_device_name(hca->ib_dev), port_num);
            openib_btl->super.btl_bandwidth =
                param_register_int(param, openib_btl->super.btl_bandwidth);
            /* Check bandwidth configured for this HCA/port/LID */
            snprintf(param, sizeof(param), "bandwidth_%s:%d:%d",
                     ibv_get_device_name(hca->ib_dev), port_num, lid);
            openib_btl->super.btl_bandwidth =
                param_register_int(param, openib_btl->super.btl_bandwidth);
            /* Check latency configured for this HCA */
            snprintf(param, sizeof(param), "latency_%s",
                     ibv_get_device_name(hca->ib_dev));
            openib_btl->super.btl_latency =
                param_register_int(param, openib_btl->super.btl_latency);
            /* Check latency configured for this HCA/port */
            snprintf(param, sizeof(param), "latency_%s:%d",
                     ibv_get_device_name(hca->ib_dev), port_num);
            openib_btl->super.btl_latency =
                param_register_int(param, openib_btl->super.btl_latency);
            /* Check latency configured for this HCA/port/LID */
            snprintf(param, sizeof(param), "latency_%s:%d:%d",
                     ibv_get_device_name(hca->ib_dev), port_num, lid);
            openib_btl->super.btl_latency =
                param_register_int(param, openib_btl->super.btl_latency);
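
            /* Worked example of the lookups above.  It uses a hypothetical
             * device "mthca0", port 1, LID 5 and a current bandwidth B.
             *
             * The names bandwidth_mthca0, bandwidth_mthca0:1 and
             * bandwidth_mthca0:1:5 are looked up in that order.  Each lookup
             * uses the previous result as its default, so the most specific
             * setting wins.
             *
             * If a user sets only bandwidth_mthca0=10000, all three lookups
             * yield 10000.  If bandwidth_mthca0:1:5=20000 is also set, this
             * BTL ends up with 20000.  Latency is resolved the same way.
             *
             * Given the "btl"/"openib" registration in param_register_int(),
             * these should appear as btl_openib_bandwidth_... parameters. */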
            /* Auto-detect the port bandwidth */
            if (0 == openib_btl->super.btl_bandwidth) {
                /* To calculate the bandwidth available on this port,
                   we have to look up the values corresponding to
                   ib_port_attr->active_speed and
                   ib_port_attr->active_width.  These are enums
                   corresponding to the IB spec.  The overall formula is
                   80% of the reported signaling rate (8b/10b encoding
                   leaves 80% for data) times the link width (number
                   of lanes). */
                switch (ib_port_attr->active_speed) {
                case 1:
                    /* 2.5Gbps * 0.8, in megabits */
                    openib_btl->super.btl_bandwidth = 2000;
                    break;
                case 2:
                    /* 5.0Gbps * 0.8, in megabits */
                    openib_btl->super.btl_bandwidth = 4000;
                    break;
                case 4:
                    /* 10.0Gbps * 0.8, in megabits */
                    openib_btl->super.btl_bandwidth = 8000;
                    break;
                default:
                    /* Unknown active_speed value */
                    return OMPI_ERR_VALUE_OUT_OF_BOUNDS;
                }
                switch (ib_port_attr->active_width) {
                case 1:
                    /* 1x: no multiplier */
                    break;
                case 2:
                    /* 4x */
                    openib_btl->super.btl_bandwidth *= 4;
                    break;
                case 4:
                    /* 8x */
                    openib_btl->super.btl_bandwidth *= 8;
                    break;
                case 8:
                    /* 12x */
                    openib_btl->super.btl_bandwidth *= 12;
                    break;
                default:
                    /* Unknown active_width value */
                    return OMPI_ERR_VALUE_OUT_OF_BOUNDS;
                }
            }
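
            /* Worked examples of the auto-detected value, in Mbps.  These
             * follow from the two switch statements above, taking
             * active_speed 1/2/4 as SDR/DDR/QDR:
             *   SDR 4x:   2000 * 4  =  8000
             *   DDR 4x:   4000 * 4  = 16000
             *   QDR 4x:   8000 * 4  = 32000
             *   SDR 12x:  2000 * 12 = 24000 */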
            opal_list_append(btl_list, (opal_list_item_t*) ib_selected);
            hca->btls++;
            ++mca_btl_openib_component.ib_num_btls;
            if (-1 != mca_btl_openib_component.ib_max_btls &&
                    mca_btl_openib_component.ib_num_btls >=
                    mca_btl_openib_component.ib_max_btls) {
                return OMPI_ERR_VALUE_OUT_OF_BOUNDS;
            }
        }
    }
    return OMPI_SUCCESS;
}

static void hca_construct(mca_btl_openib_hca_t *hca)
{
    int i;

    hca->ib_dev = NULL;
    hca->ib_dev_context = NULL;
    hca->mpool = NULL;
#if OMPI_ENABLE_PROGRESS_THREADS == 1
    hca->ib_channel = NULL;
#endif
    hca->btls = 0;
    hca->ib_cq[BTL_OPENIB_HP_CQ] = NULL;
    hca->ib_cq[BTL_OPENIB_LP_CQ] = NULL;
    hca->cq_size[BTL_OPENIB_HP_CQ] = 0;
    hca->cq_size[BTL_OPENIB_LP_CQ] = 0;
    hca->non_eager_rdma_endpoints = 0;
    hca->hp_cq_polls = mca_btl_openib_component.cq_poll_ratio;
    hca->eager_rdma_polls = mca_btl_openib_component.eager_rdma_poll_ratio;
    hca->pollme = true;
    hca->eager_rdma_buffers_count = 0;
    hca->eager_rdma_buffers = NULL;
#if HAVE_XRC
    hca->xrc_fd = -1;
#endif
    hca->qps = (mca_btl_openib_hca_qp_t*)calloc(mca_btl_openib_component.num_qps,
            sizeof(mca_btl_openib_hca_qp_t));
    OBJ_CONSTRUCT(&hca->hca_lock, opal_mutex_t);
    for (i = 0; i < mca_btl_openib_component.num_qps; i++) {
        OBJ_CONSTRUCT(&hca->qps[i].send_free, ompi_free_list_t);
        OBJ_CONSTRUCT(&hca->qps[i].recv_free, ompi_free_list_t);
    }
    OBJ_CONSTRUCT(&hca->send_free_control, ompi_free_list_t);
}

static void hca_destruct(mca_btl_openib_hca_t *hca)
{
    int i;

    if (hca->eager_rdma_buffers) {
        /* reuse the function-scope counter instead of shadowing it */
        for (i = 0; i < hca->eager_rdma_buffers_count; i++)
            if (hca->eager_rdma_buffers[i])
                OBJ_RELEASE(hca->eager_rdma_buffers[i]);
        free(hca->eager_rdma_buffers);
    }
    OBJ_DESTRUCT(&hca->hca_lock);
    for (i = 0; i < mca_btl_openib_component.num_qps; i++) {
        OBJ_DESTRUCT(&hca->qps[i].send_free);
        OBJ_DESTRUCT(&hca->qps[i].recv_free);
    }
    OBJ_DESTRUCT(&hca->send_free_control);
    if (hca->qps)
        free(hca->qps);
}

OBJ_CLASS_INSTANCE(mca_btl_openib_hca_t, opal_object_t, hca_construct,
        hca_destruct);

static int prepare_hca_for_use(mca_btl_openib_hca_t *hca)
{
    mca_btl_openib_frag_init_data_t *init_data;
    int qp, length;

#if OMPI_HAVE_THREADS
    if (mca_btl_openib_component.use_async_event_thread) {
        if (0 == mca_btl_openib_component.async_thread) {
            /* async thread is not yet started, so start it here */
            if (start_async_event_thread() != OMPI_SUCCESS)
                return OMPI_ERROR;
        }
        hca->got_fatal_event = false;
        if (write(mca_btl_openib_component.async_pipe[1],
                    &hca->ib_dev_context->async_fd, sizeof(int)) < 0) {
            BTL_ERROR(("Failed to write to pipe [%d]", errno));
            return OMPI_ERROR;
        }
    }
# if OMPI_ENABLE_PROGRESS_THREADS == 1
/* Prepare data for thread, but not starting it */
OBJ_CONSTRUCT ( & hca - > thread , opal_thread_t ) ;
hca - > thread . t_run = mca_btl_openib_progress_thread ;
hca - > thread . t_arg = hca ;
hca - > progress = false ;
# endif
# endif
hca - > endpoints = OBJ_NEW ( opal_pointer_array_t ) ;
opal_pointer_array_init ( hca - > endpoints , 10 , INT_MAX , 10 ) ;
opal_pointer_array_add ( & mca_btl_openib_component . hcas , hca ) ;
if ( mca_btl_openib_component . max_eager_rdma > 0 & &
mca_btl_openib_component . use_eager_rdma & &
hca - > use_eager_rdma ) {
hca - > eager_rdma_buffers =
calloc ( mca_btl_openib_component . max_eager_rdma * hca - > btls ,
sizeof ( mca_btl_openib_endpoint_t * ) ) ;
if ( NULL = = hca - > eager_rdma_buffers ) {
BTL_ERROR ( ( " Memory allocation fails \n " ) ) ;
return OMPI_ERR_OUT_OF_RESOURCE ;
}
}

    init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));
    if (NULL == init_data) {
        return OMPI_ERR_OUT_OF_RESOURCE;
    }
    length = sizeof(mca_btl_openib_header_t) +
        sizeof(mca_btl_openib_footer_t) +
        sizeof(mca_btl_openib_eager_rdma_header_t);

    init_data->order = MCA_BTL_NO_ORDER;
    init_data->list = &hca->send_free_control;

    if (OMPI_SUCCESS != ompi_free_list_init_ex_new(
                &hca->send_free_control,
                sizeof(mca_btl_openib_send_control_frag_t), CACHE_LINE_SIZE,
                OBJ_CLASS(mca_btl_openib_send_control_frag_t), length,
                mca_btl_openib_component.buffer_alignment,
                mca_btl_openib_component.ib_free_list_num, -1,
                mca_btl_openib_component.ib_free_list_inc,
                hca->mpool, mca_btl_openib_frag_init,
                init_data)) {
        return OMPI_ERROR;
    }

    /* Set up all the QPs */
    for (qp = 0; qp < mca_btl_openib_component.num_qps; qp++) {
        /* Initialize pool of send fragments */
        init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));
        if (NULL == init_data) {
            return OMPI_ERR_OUT_OF_RESOURCE;
        }
        length = sizeof(mca_btl_openib_header_t) +
            sizeof(mca_btl_openib_header_coalesced_t) +
            sizeof(mca_btl_openib_control_header_t) +
            sizeof(mca_btl_openib_footer_t) +
            mca_btl_openib_component.qp_infos[qp].size;

        init_data->order = qp;
        init_data->list = &hca->qps[qp].send_free;

        if (OMPI_SUCCESS != ompi_free_list_init_ex_new(init_data->list,
                    sizeof(mca_btl_openib_send_frag_t), CACHE_LINE_SIZE,
                    OBJ_CLASS(mca_btl_openib_send_frag_t), length,
                    mca_btl_openib_component.buffer_alignment,
                    mca_btl_openib_component.ib_free_list_num,
                    mca_btl_openib_component.ib_free_list_max,
                    mca_btl_openib_component.ib_free_list_inc,
                    hca->mpool, mca_btl_openib_frag_init,
                    init_data)) {
            return OMPI_ERROR;
        }

        /* Initialize pool of receive fragments */
        init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));
        if (NULL == init_data) {
            return OMPI_ERR_OUT_OF_RESOURCE;
        }
        length = sizeof(mca_btl_openib_header_t) +
            sizeof(mca_btl_openib_header_coalesced_t) +
            sizeof(mca_btl_openib_control_header_t) +
            sizeof(mca_btl_openib_footer_t) +
            mca_btl_openib_component.qp_infos[qp].size;

        init_data->order = qp;
        init_data->list = &hca->qps[qp].recv_free;

        if (OMPI_SUCCESS != ompi_free_list_init_ex_new(init_data->list,
                    sizeof(mca_btl_openib_recv_frag_t), CACHE_LINE_SIZE,
                    OBJ_CLASS(mca_btl_openib_recv_frag_t),
                    length, mca_btl_openib_component.buffer_alignment,
                    mca_btl_openib_component.ib_free_list_num,
                    mca_btl_openib_component.ib_free_list_max,
                    mca_btl_openib_component.ib_free_list_inc,
                    hca->mpool, mca_btl_openib_frag_init,
                    init_data)) {
            return OMPI_ERROR;
        }
    }

    mca_btl_openib_component.hcas_count++;
    return OMPI_SUCCESS;
}
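
/* Illustrative sketch, not part of the openib BTL. Each per-QP free list
   above is created with a buffer length equal to the BTL header,
   coalesced header, control header and footer sizes plus that QP's
   payload size, while buffer_alignment is passed as a separate argument.
   The sketch repeats that sum with invented demo_* header structs (their
   layouts are assumptions, not the real openib headers) and shows the
   usual power-of-two round-up; whether ompi_free_list_init_ex_new()
   rounds buffers in exactly this way is not shown in this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for the real openib header types; the field
// layouts are invented for illustration only.
typedef struct { uint8_t tag; uint8_t cm_seen; uint16_t credits; } demo_header_t;
typedef struct {
    uint8_t tag;
    uint8_t pad;
    uint16_t alloc_size;
    uint32_t size;
} demo_header_coalesced_t;
typedef struct { uint8_t type; } demo_control_header_t;
typedef struct { uint32_t seq; } demo_footer_t;

// Bytes needed for one fragment buffer on a QP whose payload is qp_size.
static size_t demo_frag_length(size_t qp_size)
{
    return sizeof(demo_header_t) +
        sizeof(demo_header_coalesced_t) +
        sizeof(demo_control_header_t) +
        sizeof(demo_footer_t) +
        qp_size;
}

// Round len up to the next multiple of align (align must be a power of two).
static size_t demo_align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}
```

   With these demo headers a 256-byte QP payload needs 273 bytes, which a
   64-byte alignment rounds up to 320. */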
static int
get_port_list(mca_btl_openib_hca_t *hca, int *allowed_ports)
{
    int i, j, k, num_ports = 0;
    const char *dev_name;
    char *name;
    size_t name_len;

    dev_name = ibv_get_device_name(hca->ib_dev);
    /* Room for "<dev_name>:<port>": phys_port_cnt is 8 bits wide, so the
       port number needs at most 3 digits, plus ':' and the trailing NUL */
    name_len = strlen(dev_name) + 5;
    name = (char*) malloc(name_len);
    if (NULL == name) {
        return 0;
    }

    /* Assume that all ports are allowed.  num_ports will be adjusted
       below to reflect whether this is true or not. */
    for (i = 1; i <= hca->ib_dev_attr.phys_port_cnt; ++i) {
        allowed_ports[num_ports++] = i;
    }
    num_ports = 0;
    if (NULL != mca_btl_openib_component.if_include_list) {
        /* If only the HCA name is given (e.g., mthca0,mthca1), use all
           ports */
        i = 0;
        while (mca_btl_openib_component.if_include_list[i]) {
            if (0 == strcmp(dev_name,
                            mca_btl_openib_component.if_include_list[i])) {
                num_ports = hca->ib_dev_attr.phys_port_cnt;
                goto done;
            }
            ++i;
        }
        /* Include only requested ports on the HCA */
        for (i = 1; i <= hca->ib_dev_attr.phys_port_cnt; ++i) {
            snprintf(name, name_len, "%s:%d", dev_name, i);
            for (j = 0;
                 NULL != mca_btl_openib_component.if_include_list[j]; ++j) {
                if (0 == strcmp(name,
                                mca_btl_openib_component.if_include_list[j])) {
                    allowed_ports[num_ports++] = i;
                    break;
                }
            }
        }
    } else if (NULL != mca_btl_openib_component.if_exclude_list) {
        /* If only the HCA name is given (e.g., mthca0,mthca1), exclude
           all ports */
        i = 0;
        while (mca_btl_openib_component.if_exclude_list[i]) {
            if (0 == strcmp(dev_name,
                            mca_btl_openib_component.if_exclude_list[i])) {
                num_ports = 0;
                goto done;
            }
            ++i;
        }
        /* Exclude the specified ports on this HCA */
        for (i = 1; i <= hca->ib_dev_attr.phys_port_cnt; ++i) {
            snprintf(name, name_len, "%s:%d", dev_name, i);
            for (j = 0;
                 NULL != mca_btl_openib_component.if_exclude_list[j]; ++j) {
                if (0 == strcmp(name,
                                mca_btl_openib_component.if_exclude_list[j])) {
                    /* If found, set a sentinel value */
                    j = -1;
                    break;
                }
            }
            /* If we didn't find it, it's ok to include in the list */
            if (-1 != j) {
                allowed_ports[num_ports++] = i;
            }
        }
    } else {
        num_ports = hca->ib_dev_attr.phys_port_cnt;
    }

done:
    /* Remove the following from the error-checking if_list:
       - bare device name
       - device name suffixed with port number */
    if (NULL != mca_btl_openib_component.if_list) {
        for (i = 0; NULL != mca_btl_openib_component.if_list[i]; ++i) {
            /* Look for raw device name */
            if (0 == strcmp(mca_btl_openib_component.if_list[i], dev_name)) {
                j = opal_argv_count(mca_btl_openib_component.if_list);
                opal_argv_delete(&j, &(mca_btl_openib_component.if_list),
                                 i, 1);
                --i;
            }
        }
        for (i = 1; i <= hca->ib_dev_attr.phys_port_cnt; ++i) {
            snprintf(name, name_len, "%s:%d", dev_name, i);
            for (j = 0; NULL != mca_btl_openib_component.if_list[j]; ++j) {
                if (0 == strcmp(mca_btl_openib_component.if_list[j], name)) {
                    k = opal_argv_count(mca_btl_openib_component.if_list);
                    opal_argv_delete(&k, &(mca_btl_openib_component.if_list),
                                     j, 1);
                    --j;
                    break;
                }
            }
        }
    }

    free(name);
    return num_ports;
}
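
/* Illustrative sketch, not part of the openib BTL. get_port_list() treats
   each if_include/if_exclude entry either as a bare device name, which
   matches every port, or as "device:port", which matches one port; the
   include list is consulted first and the exclude list only when no
   include list is given. The hypothetical demo_port_allowed() below
   restates those rules as a per-port predicate over NULL-terminated
   string lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// True if entry names this device as a whole ("mthca0") or this exact
// port ("mthca0:1"). Names longer than the buffer would be truncated,
// which is acceptable for a sketch.
static bool demo_entry_matches(const char *entry, const char *dev, int port)
{
    char name[256];
    if (0 == strcmp(entry, dev)) {
        return true;
    }
    snprintf(name, sizeof(name), "%s:%d", dev, port);
    return 0 == strcmp(entry, name);
}

// True if any entry of a NULL-terminated list matches dev/port.
static bool demo_list_matches(const char *const *list, const char *dev, int port)
{
    assert(NULL != list);
    for (int i = 0; NULL != list[i]; ++i) {
        if (demo_entry_matches(list[i], dev, port)) {
            return true;
        }
    }
    return false;
}

// Decide whether a given port of device dev may be used. Either list may
// be NULL; when both are given, only the include list is consulted.
static bool demo_port_allowed(const char *dev, int port,
                              const char *const *include,
                              const char *const *exclude)
{
    if (NULL != include) {
        return demo_list_matches(include, dev, port);
    }
    if (NULL != exclude) {
        return !demo_list_matches(exclude, dev, port);
    }
    return true;   // no filtering requested: every port is allowed
}
```

   For example, with an include list of { "mthca0:2", NULL }, port 1 of
   mthca0 is rejected and port 2 is accepted. */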

static void merge_values(ompi_btl_openib_ini_values_t *target,
                         ompi_btl_openib_ini_values_t *src)
{
    if (!target->mtu_set && src->mtu_set) {
        target->mtu = src->mtu;
        target->mtu_set = true;
    }

    if (!target->use_eager_rdma_set && src->use_eager_rdma_set) {
        target->use_eager_rdma = src->use_eager_rdma;
        target->use_eager_rdma_set = true;
    }
}

static inline bool is_credit_message(const mca_btl_openib_recv_frag_t *frag)
{
    mca_btl_openib_control_header_t *chdr =
        to_base_frag(frag)->segment.seg_addr.pval;
    return (MCA_BTL_TAG_BTL == frag->hdr->tag) &&
        (MCA_BTL_OPENIB_CONTROL_CREDITS == chdr->type);
}
static int init_one_hca(opal_list_t *btl_list, struct ibv_device *ib_dev)
{
    struct mca_mpool_base_resources_t mpool_resources;
    mca_btl_openib_hca_t *hca;
    uint8_t i, k = 0;
    int ret = -1, port_cnt;
    ompi_btl_openib_ini_values_t values, default_values;
    int *allowed_ports = NULL;

    hca = OBJ_NEW(mca_btl_openib_hca_t);
    if (NULL == hca) {
        BTL_ERROR(("Failed malloc: %s:%d\n", __FILE__, __LINE__));
        return OMPI_ERR_OUT_OF_RESOURCE;
    }

    hca->ib_dev = ib_dev;
    hca->ib_dev_context = ibv_open_device(ib_dev);
    if (NULL == hca->ib_dev_context) {
        BTL_ERROR(("error obtaining device context for %s errno says %s\n",
                   ibv_get_device_name(ib_dev), strerror(errno)));
        goto error;
    }

    if (ibv_query_device(hca->ib_dev_context, &hca->ib_dev_attr)) {
        BTL_ERROR(("error obtaining device attributes for %s errno says %s\n",
                   ibv_get_device_name(ib_dev), strerror(errno)));
        goto error;
    }

    /* Work out which ports are usable, honoring the if_include and
       if_exclude lists if they were specified */
    allowed_ports = (int*) malloc(hca->ib_dev_attr.phys_port_cnt * sizeof(int));
    if (NULL == allowed_ports) {
        ret = OMPI_ERR_OUT_OF_RESOURCE;
        goto error;
    }

    port_cnt = get_port_list(hca, allowed_ports);
    if (0 == port_cnt) {
        ret = OMPI_SUCCESS;
        goto error;
    }

#if HAVE_XRC
    /* If the user configured XRC QPs but this device does not support
     * XRC, ignore this HCA; another HCA may have XRC support.
     */
    if (!(hca->ib_dev_attr.device_cap_flags & IBV_DEVICE_XRC) &&
            mca_btl_openib_component.num_xrc_qps > 0) {
        opal_show_help("help-mpi-btl-openib.txt",
                       "XRC on device without XRC support", true,
                       mca_btl_openib_component.num_xrc_qps,
                       ibv_get_device_name(ib_dev),
                       orte_system_info.nodename);
        ret = OMPI_SUCCESS;
        goto error;
    }
#endif

    /* Load in vendor/part-specific HCA parameters.  Note that even if
       we don't find values for this vendor/part, "values" will be set
       indicating that it does not have good values */
    ret = ompi_btl_openib_ini_query(hca->ib_dev_attr.vendor_id,
                                    hca->ib_dev_attr.vendor_part_id,
                                    &values);
    if (OMPI_SUCCESS != ret && OMPI_ERR_NOT_FOUND != ret) {
        /* If we get a serious error, propagate it upwards */
        goto error;
    }
    if (OMPI_ERR_NOT_FOUND == ret) {
        /* If we didn't find a matching HCA in the INI files, output a
           warning that we're using default values (unless the user
           disabled it via btl_openib_warn_no_hca_params_found) */
        if (mca_btl_openib_component.warn_no_hca_params_found) {
            opal_show_help("help-mpi-btl-openib.txt",
                           "no hca params found", true,
                           orte_system_info.nodename,
                           hca->ib_dev_attr.vendor_id,
                           hca->ib_dev_attr.vendor_part_id);
        }
    }

    /* Note that even if we don't find default values, "default_values"
       will be set indicating that it does not have good values */
    ret = ompi_btl_openib_ini_query(0, 0, &default_values);
    if (OMPI_SUCCESS != ret && OMPI_ERR_NOT_FOUND != ret) {
        /* If we get a serious error, propagate it upwards */
        goto error;
    }

    /* If we did find values for this HCA (or in the defaults
       section), handle them */
    merge_values(&values, &default_values);
    if (values.mtu_set) {
        switch (values.mtu) {
        case 256:
            hca->mtu = IBV_MTU_256;
            break;
        case 512:
            hca->mtu = IBV_MTU_512;
            break;
        case 1024:
            hca->mtu = IBV_MTU_1024;
            break;
        case 2048:
            hca->mtu = IBV_MTU_2048;
            break;
        case 4096:
            hca->mtu = IBV_MTU_4096;
            break;
        default:
            BTL_ERROR(("invalid MTU value specified in INI file (%d); ignored\n",
                       values.mtu));
            hca->mtu = mca_btl_openib_component.ib_mtu;
            break;
        }
    } else {
        hca->mtu = mca_btl_openib_component.ib_mtu;
    }
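
/* Illustrative sketch, not part of the openib BTL. The switch above maps
   an INI-file MTU in bytes onto the verbs MTU enumeration and falls back
   to the ib_mtu MCA value for anything else. Assuming the libibverbs
   numbering IBV_MTU_256 = 1 through IBV_MTU_4096 = 5, where each step
   doubles the size, the mapping can also be computed; the hypothetical
   demo_mtu_code() returns plain ints so it compiles without the verbs
   header.

```c
#include <assert.h>

// Map an MTU in bytes to the code used by the verbs MTU enumeration,
// assuming 256 -> 1, 512 -> 2, 1024 -> 3, 2048 -> 4, 4096 -> 5.
// Any other value yields fallback, mirroring the switch's default case.
static int demo_mtu_code(int bytes, int fallback)
{
    for (int code = 1; code <= 5; ++code) {
        if ((128 << code) == bytes) {
            return code;
        }
    }
    return fallback;
}
```

   The explicit switch is arguably easier to audit; the loop mainly makes
   the doubling relationship visible. */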

    /* If the INI files set "use eager rdma", apply that setting to this
       HCA */
    if (values.use_eager_rdma_set) {
        hca->use_eager_rdma = values.use_eager_rdma;
    }

    hca->ib_pd = ibv_alloc_pd(hca->ib_dev_context);
    if (NULL == hca->ib_pd) {
        BTL_ERROR(("error allocating pd for %s errno says %s\n",
                   ibv_get_device_name(ib_dev), strerror(errno)));
        goto error;
    }

    if (MCA_BTL_XRC_ENABLED) {
        if (OMPI_SUCCESS != mca_btl_openib_open_xrc_domain(hca)) {
            BTL_ERROR(("XRC Internal error. Failed to open xrc domain"));
            goto error;
        }
    }

    mpool_resources.reg_data = (void*)hca;
    mpool_resources.sizeof_reg = sizeof(mca_btl_openib_reg_t);
    mpool_resources.register_mem = openib_reg_mr;
    mpool_resources.deregister_mem = openib_dereg_mr;
    hca->mpool =
        mca_mpool_base_module_create(mca_btl_openib_component.ib_mpool_name,
                                     hca, &mpool_resources);
    if (NULL == hca->mpool) {
        BTL_ERROR(("error creating IB memory pool for %s errno says %s\n",
                   ibv_get_device_name(ib_dev), strerror(errno)));
        goto error;
    }

#if OMPI_ENABLE_PROGRESS_THREADS == 1
    hca->ib_channel = ibv_create_comp_channel(hca->ib_dev_context);
    if (NULL == hca->ib_channel) {
        BTL_ERROR(("error creating channel for %s errno says %s\n",
                   ibv_get_device_name(hca->ib_dev),
                   strerror(errno)));
        goto error;
    }
#endif

    ret = OMPI_SUCCESS;

    /* Note ports are 1 based (i >= 1) */
    for (k = 0; k < port_cnt; k++) {
        struct ibv_port_attr ib_port_attr;
        i = allowed_ports[k];
        if (ibv_query_port(hca->ib_dev_context, i, &ib_port_attr)) {
            BTL_ERROR(("error getting port attributes for device %s "
                       "port number %d errno says %s",
                       ibv_get_device_name(ib_dev), i, strerror(errno)));
            break;
        }
        if (IBV_PORT_ACTIVE == ib_port_attr.state) {
            if (0 == mca_btl_openib_component.ib_pkey_val) {
                ret = init_one_port(btl_list, hca, i,
                                    mca_btl_openib_component.ib_pkey_ix,
                                    &ib_port_attr);
            } else {
                uint16_t pkey, j;
                for (j = 0; j < hca->ib_dev_attr.max_pkeys; j++) {
                    ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
                    pkey = ntohs(pkey);
                    if (pkey == mca_btl_openib_component.ib_pkey_val) {
                        ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
                        break;
                    }
                }
            }
            if (OMPI_SUCCESS != ret) {
                /* An out-of-bounds error means we hit the maximum number
                   of BTLs; don't propagate that error to the caller */
                if (OMPI_ERR_VALUE_OUT_OF_BOUNDS == ret)
                    ret = OMPI_SUCCESS;
                break;
            }
        }
    }
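
/* Illustrative sketch, not part of the openib BTL. When ib_pkey_val is
   non-zero, the loop above walks the port's P_Key table with
   ibv_query_pkey(), converts each entry with ntohs() (so entries are
   treated as network byte order), and initializes the port with the
   index of the first match. The hypothetical demo_find_pkey_index() runs
   the same search over an in-memory table standing in for the hardware
   table.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

// Return the index of the first entry in a P_Key table (stored in network
// byte order) whose host-order value equals wanted, or -1 if none does.
static int demo_find_pkey_index(const uint16_t *table, int n, uint16_t wanted)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        if (ntohs(table[i]) == wanted) {
            return i;
        }
    }
    return -1;
}
```

   If no entry matches, the component's loop above simply does not call
   init_one_port() for that port. */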

    /* The list of allowed ports is no longer needed */
    free(allowed_ports);
    allowed_ports = NULL;

    /* If we made a BTL, we're done.  Otherwise, fall through and
       destroy everything */
    if (hca->btls > 0) {
        ret = prepare_hca_for_use(hca);
        if (OMPI_SUCCESS == ret)
            return OMPI_SUCCESS;
If you have an HCA with no active ports, we still create an mpool.
This mpool will have no btl module owner (no btl was created for
the HCA with no ports), but it will still be tracked in the mpool
framework (i.e., it's available).
If MPI_ALLOC_MEM is called by the app, one of two things will happen:
1. If there's an HCA on the host with some active ports, the openib
btl component will still be in the process space, and therefore
the "mpool with no btl" (MWNB) module will still be able to call
the reg/dereg functions, and all will be fine. However, if
MPI_FREE_MEM is never invoked to free the memory, bad things will
happen during MPI_FINALIZE. The pml is finalized, which finalizes
all the btls. The btls finalize all their mpools and all is fine.
But later we close down the mpool framework, which then finalizes
any leftover mpool modules, such as MWNB. However, the openib
BTL module functions that the MWNB was registered with are no
longer in the process space, and it segv's while trying to deregister
the memory.
2. If there are *no* HCAs on the host with active ports, then the
openib btl will have been unloaded, and when the MWNB tries to
register the memory, the functions it tries to call (in the openib
btl) are no longer there, and we segv.
    }

error:
#if defined(OMPI_HAVE_THREADS) && OMPI_ENABLE_PROGRESS_THREADS == 1
    if (hca->ib_channel)
        ibv_destroy_comp_channel(hca->ib_channel);
#endif
    if (hca->mpool)
        mca_mpool_base_module_destroy(hca->mpool);

    if (MCA_BTL_XRC_ENABLED) {
        if (OMPI_SUCCESS != mca_btl_openib_close_xrc_domain(hca)) {
            BTL_ERROR(("XRC Internal error. Failed to close xrc domain"));
        }
    }

    if (hca->ib_pd)
        ibv_dealloc_pd(hca->ib_pd);
    if (hca->ib_dev_context)
        ibv_close_device(hca->ib_dev_context);

    OBJ_RELEASE(hca);
    return ret;
}
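The `error:` label above follows the usual C single-exit cleanup pattern. Resources are acquired in order, and any failure jumps to one label. That label releases whatever exists, newest first, and skips handles that were never created.

This is a minimal sketch of the same pattern, with `malloc()` standing in for the verbs and mpool calls. `struct fake_hca`, `open_hca()` and `release_hca()` are hypothetical. `fail_at` only simulates a failure at a chosen step.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an HCA's resources, in acquisition order. */
struct fake_hca {
    void *dev_context;   /* opened first */
    void *pd;            /* then the protection domain */
    void *mpool;         /* then the memory pool */
};

/* Release in reverse order of acquisition. free(NULL) is a no-op, which
   plays the role of the "if (hca->x)" checks in the error: path. */
static void release_hca(struct fake_hca *hca)
{
    free(hca->mpool);
    free(hca->pd);
    free(hca->dev_context);
    hca->mpool = hca->pd = hca->dev_context = NULL;
}

/* Acquire resources in order; fail_at (1..3) simulates a failure at that
   step, 0 means no simulated failure. Returns 0 on success, -1 on error. */
static int open_hca(struct fake_hca *hca, int fail_at)
{
    assert(NULL != hca);
    hca->dev_context = hca->pd = hca->mpool = NULL;

    if (1 == fail_at || NULL == (hca->dev_context = malloc(16)))
        goto error;
    if (2 == fail_at || NULL == (hca->pd = malloc(16)))
        goto error;
    if (3 == fail_at || NULL == (hca->mpool = malloc(16)))
        goto error;
    return 0;

error:
    release_hca(hca);   /* undoes only what was actually acquired */
    return -1;
}
```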
static int finish_btl_init(mca_btl_openib_module_t *openib_btl)
{
    int qp;

    openib_btl->num_peers = 0;

    /* Initialize module state */
    OBJ_CONSTRUCT(&openib_btl->ib_lock, opal_mutex_t);

    /* set up the qp structure */
    openib_btl->qps = (mca_btl_openib_module_qp_t *)
        calloc(mca_btl_openib_component.num_qps,
               sizeof(mca_btl_openib_module_qp_t));
    if (NULL == openib_btl->qps) {
        return OMPI_ERR_OUT_OF_RESOURCE;
    }

    /* set up all the qps */
    for (qp = 0; qp < mca_btl_openib_component.num_qps; qp++) {
        if (!BTL_OPENIB_QP_TYPE_PP(qp)) {
            OBJ_CONSTRUCT(&openib_btl->qps[qp].u.srq_qp.pending_frags[0],
                          opal_list_t);
            OBJ_CONSTRUCT(&openib_btl->qps[qp].u.srq_qp.pending_frags[1],
                          opal_list_t);
            openib_btl->qps[qp].u.srq_qp.sd_credits =
                mca_btl_openib_component.qp_infos[qp].u.srq_qp.sd_max;
        }
    }

    /* initialize the memory pool using the hca */
    openib_btl->super.btl_mpool = openib_btl->hca->mpool;

    openib_btl->eager_rdma_channels = 0;
    openib_btl->eager_rdma_frag_size = OPAL_ALIGN(
            sizeof(mca_btl_openib_header_t) +
            sizeof(mca_btl_openib_header_coalesced_t) +
            sizeof(mca_btl_openib_control_header_t) +
            sizeof(mca_btl_openib_footer_t) +
            openib_btl->super.btl_eager_limit,
            mca_btl_openib_component.buffer_alignment, size_t);

    return OMPI_SUCCESS;
}
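`eager_rdma_frag_size` adds up five values and then rounds the sum up to `buffer_alignment`. The five values are:

- the header size
- the coalesced-header size
- the control-header size
- the footer size
- the eager limit

The sketch below shows that arithmetic. It assumes `OPAL_ALIGN(x, a, t)` rounds `x` up to a power-of-two multiple of `a`. `align_up()` and `eager_frag_size()` are hypothetical. The sizes in the test are made-up numbers, not the real struct sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a (a must be a power of two);
   this is what OPAL_ALIGN(x, a, size_t) is assumed to compute. */
static size_t align_up(size_t x, size_t a)
{
    assert(0 != a && 0 == (a & (a - 1)));
    return (x + (a - 1)) & ~(a - 1);
}

/* Hypothetical frag-size calculation: total header bytes, footer bytes
   and the eager limit are passed in instead of taken from real structs. */
static size_t eager_frag_size(size_t headers, size_t footer,
                              size_t eager_limit, size_t alignment)
{
    return align_up(headers + footer + eager_limit, alignment);
}
```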
static struct ibv_device **ibv_get_device_list_compat(int *num_devs)
{
    struct ibv_device **ib_devs;

#ifdef HAVE_IBV_GET_DEVICE_LIST
    ib_devs = ibv_get_device_list(num_devs);
#else
    struct dlist *dev_list;
    struct ibv_device *ib_dev;
    int i = 0;

    *num_devs = 0;

    /* Determine the number of HCAs available on the host */
    dev_list = ibv_get_devices();
    if (NULL == dev_list)
        return NULL;

    dlist_start(dev_list);
    dlist_for_each_data(dev_list, ib_dev, struct ibv_device)
        (*num_devs)++;

    /* Allocate space for the ib devices */
    ib_devs = (struct ibv_device **) malloc(*num_devs * sizeof(struct ibv_device *));
    if (NULL == ib_devs) {
        *num_devs = 0;
        BTL_ERROR(("Failed malloc: %s:%d\n", __FILE__, __LINE__));
        return NULL;
    }

    /* Fill the array by index so that ib_devs keeps pointing at the
       start of the allocation: it is returned and later passed to free() */
    dlist_start(dev_list);
    dlist_for_each_data(dev_list, ib_dev, struct ibv_device)
        ib_devs[i++] = ib_dev;
#endif

    return ib_devs;
}

static void ibv_free_device_list_compat(struct ibv_device **ib_devs)
{
#ifdef HAVE_IBV_GET_DEVICE_LIST
    ibv_free_device_list(ib_devs);
#else
    free(ib_devs);
#endif
}
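The fallback path in `ibv_get_device_list_compat()` walks the list twice: once to count the entries, then, after allocating, once to copy them. Below is a minimal, library-free sketch of that pattern.

`struct node` stands in for libibverbs' old `dlist`, and `list_to_array()` is a hypothetical name. The sketch fills the array by index, so the returned pointer is still the start of the allocation. It also adds a terminating NULL slot, which matches how `ibv_get_device_list()` documents its result.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list standing in for the old verbs dlist. */
struct node {
    void *data;
    struct node *next;
};

/* Pass 1 counts, pass 2 copies by index. The result is NULL-terminated
   and can be released with a plain free(). */
static void **list_to_array(const struct node *head, int *count)
{
    const struct node *n;
    void **arr;
    int i = 0;

    assert(NULL != count);
    *count = 0;
    for (n = head; NULL != n; n = n->next)
        (*count)++;

    arr = malloc((*count + 1) * sizeof(*arr));
    if (NULL == arr) {
        *count = 0;
        return NULL;
    }
    for (n = head; NULL != n; n = n->next)
        arr[i++] = n->data;
    arr[i] = NULL;
    return arr;
}
```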
/*
 * IB component initialization:
 * (1) read interface list from kernel and compare against component parameters
 *     then create a BTL instance for selected interfaces
 * (2) set up IB listen socket for incoming connection attempts
 * (3) register BTL parameters with the MCA
 */

static mca_btl_base_module_t **
btl_openib_component_init(int *num_btl_modules,
                          bool enable_progress_threads,
                          bool enable_mpi_threads)
{
    struct ibv_device **ib_devs;
    mca_btl_base_module_t **btls;
    int i, ret, num_devs, length;
    opal_list_t btl_list;
    mca_btl_openib_module_t *openib_btl;
    mca_btl_base_selected_module_t *ib_selected;
    opal_list_item_t *item;
    unsigned short seedv[3];
    mca_btl_openib_frag_init_data_t *init_data;

    /* initialization */
    *num_btl_modules = 0;
    num_devs = 0;

    seedv[0] = orte_process_info.my_name->vpid;
    seedv[1] = opal_sys_timer_get_cycles();
    seedv[2] = opal_sys_timer_get_cycles();
    seed48(seedv);
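The `seed48()` call gives each process its own `drand48()`-family stream. It builds the seed from the process's vpid and two cycle-counter readings.

This small sketch does the same seeding but takes the cycle readings as parameters, so the effect of the seed on `lrand48()` can be checked. The function name `seed_from()` is hypothetical.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stdlib.h>

/* Build the 3 x 16-bit seed like the code above: the vpid in the low
   word, then two cycle-counter readings (each truncated to 16 bits). */
static void seed_from(unsigned int vpid, unsigned long c1, unsigned long c2)
{
    unsigned short seedv[3];

    seedv[0] = (unsigned short) vpid;
    seedv[1] = (unsigned short) c1;
    seedv[2] = (unsigned short) c2;
    seed48(seedv);   /* also resets the LCG multiplier and addend */
}
```

The same seed always reproduces the same sequence. Processes with different vpids start from different 48-bit states, so their streams differ.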
    /* Read in INI files with HCA-specific parameters */
    if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_init())) {
        goto no_btls;
    }

    if (MCA_BTL_XRC_ENABLED) {
        OBJ_CONSTRUCT(&mca_btl_openib_component.ib_addr_table,
                      opal_hash_table_t);
    }

    OBJ_CONSTRUCT(&mca_btl_openib_component.send_free_coalesced, ompi_free_list_t);
    OBJ_CONSTRUCT(&mca_btl_openib_component.send_user_free, ompi_free_list_t);
    OBJ_CONSTRUCT(&mca_btl_openib_component.recv_user_free, ompi_free_list_t);

    init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));
    if (NULL == init_data) {
        goto no_btls;
    }
    init_data->order = mca_btl_openib_component.rdma_qp;
    init_data->list = &mca_btl_openib_component.send_user_free;

    if (OMPI_SUCCESS != ompi_free_list_init_ex_new(
                &mca_btl_openib_component.send_user_free,
                sizeof(mca_btl_openib_put_frag_t), 2,
                OBJ_CLASS(mca_btl_openib_put_frag_t),
                0, 0,
                mca_btl_openib_component.ib_free_list_num,
                mca_btl_openib_component.ib_free_list_max,
                mca_btl_openib_component.ib_free_list_inc,
                NULL, mca_btl_openib_frag_init, init_data)) {
        goto no_btls;
    }

    init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));
    if (NULL == init_data) {
        goto no_btls;
    }
    init_data->order = mca_btl_openib_component.rdma_qp;
    init_data->list = &mca_btl_openib_component.recv_user_free;

    if (OMPI_SUCCESS != ompi_free_list_init_ex_new(
                &mca_btl_openib_component.recv_user_free,
                sizeof(mca_btl_openib_get_frag_t), 2,
                OBJ_CLASS(mca_btl_openib_get_frag_t),
                0, 0,
                mca_btl_openib_component.ib_free_list_num,
                mca_btl_openib_component.ib_free_list_max,
                mca_btl_openib_component.ib_free_list_inc,
                NULL, mca_btl_openib_frag_init, init_data)) {
        goto no_btls;
    }

    init_data = malloc(sizeof(mca_btl_openib_frag_init_data_t));
    if (NULL == init_data) {
        goto no_btls;
    }
    length = sizeof(mca_btl_openib_coalesced_frag_t);
    init_data->list = &mca_btl_openib_component.send_free_coalesced;
    if (OMPI_SUCCESS != ompi_free_list_init_ex(
                &mca_btl_openib_component.send_free_coalesced,
                length, 2, OBJ_CLASS(mca_btl_openib_coalesced_frag_t),
                mca_btl_openib_component.ib_free_list_num,
                mca_btl_openib_component.ib_free_list_max,
                mca_btl_openib_component.ib_free_list_inc,
                NULL, mca_btl_openib_frag_init, init_data)) {
        goto no_btls;
    }
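Each free list above gets its own `init_data` record, holding a QP `order` and the owning `list`. `mca_btl_openib_frag_init` presumably receives that record for every fragment it constructs.

The sketch below is a much-simplified, single-threaded illustration of that idea, not the `ompi_free_list_t` API:

- Elements are preallocated in one block.
- An init callback stamps each element with its owner and order.
- Returning an element uses the stamped owner.

All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct frag {
    struct frag *next;
    void *owner;          /* list the frag must be returned to */
    int order;            /* QP index stamped at init time */
};

struct frag_list {
    struct frag *head;    /* LIFO stack of free frags */
    struct frag *block;   /* single allocation backing every frag */
    int count;
};

struct init_data {
    int order;
    struct frag_list *list;
};

typedef void (*frag_init_fn)(struct frag *f, void *ctx);

/* Per-element hook: copy the owner and order out of the context. */
static void frag_init(struct frag *f, void *ctx)
{
    struct init_data *d = ctx;
    f->owner = d->list;
    f->order = d->order;
}

/* Preallocate n frags, run the init hook on each, push them all. */
static int frag_list_init(struct frag_list *fl, int n,
                          frag_init_fn init, void *ctx)
{
    int i;

    assert(n > 0);
    fl->head = NULL;
    fl->count = 0;
    fl->block = calloc((size_t) n, sizeof(struct frag));
    if (NULL == fl->block)
        return -1;
    for (i = 0; i < n; i++) {
        struct frag *f = &fl->block[i];
        if (NULL != init)
            init(f, ctx);
        f->next = fl->head;
        fl->head = f;
        fl->count++;
    }
    return 0;
}

static struct frag *frag_get(struct frag_list *fl)
{
    struct frag *f = fl->head;
    if (NULL != f) {
        fl->head = f->next;
        fl->count--;
    }
    return f;
}

/* No list argument needed: the owner was recorded by the init hook. */
static void frag_return(struct frag *f)
{
    struct frag_list *fl = f->owner;
    f->next = fl->head;
    fl->head = f;
    fl->count++;
}
```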
    /* If we want fork support, try to enable it */
#ifdef HAVE_IBV_FORK_INIT
    if (0 != mca_btl_openib_component.want_fork_support) {
        if (0 != ibv_fork_init()) {
            /* If the want_fork_support MCA parameter is >0, then the
               user was specifically asking for fork support and we
               couldn't provide it.  So print an error and deactivate
               this BTL. */
            if (mca_btl_openib_component.want_fork_support > 0) {
                opal_show_help("help-mpi-btl-openib.txt",
                               "ibv_fork_init fail", true,
                               orte_system_info.nodename);
                goto no_btls;
            }
        }
    }
#endif
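Read together, the two tests above give `want_fork_support` three meanings:

- 0 skips `ibv_fork_init()` entirely.
- A negative value tries it but tolerates failure.
- A positive value makes failure fatal for this BTL.

A small sketch of that decision table follows. `decide_fork()` and the enum names are hypothetical, and `init_rc` stands for the value `ibv_fork_init()` returned.

```c
#include <assert.h>

/* Hypothetical outcomes of the fork-support step. */
enum fork_outcome {
    FORK_NOT_REQUESTED,    /* want == 0: ibv_fork_init() never called */
    FORK_ENABLED,          /* call succeeded */
    FORK_UNAVAILABLE_OK,   /* call failed, but fork support was optional */
    FORK_FATAL             /* call failed and the user insisted */
};

/* want: the want_fork_support value; init_rc: ibv_fork_init()'s return
   value (0 on success), ignored when want == 0. */
static enum fork_outcome decide_fork(int want, int init_rc)
{
    if (0 == want)
        return FORK_NOT_REQUESTED;
    if (0 == init_rc)
        return FORK_ENABLED;
    /* only an explicit positive request turns failure into an error */
    return (want > 0) ? FORK_FATAL : FORK_UNAVAILABLE_OK;
}
```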
    /* Parse the include and exclude lists, checking for errors */
    mca_btl_openib_component.if_include_list =
        mca_btl_openib_component.if_exclude_list =
        mca_btl_openib_component.if_list = NULL;
    if (NULL != mca_btl_openib_component.if_include &&
        NULL != mca_btl_openib_component.if_exclude) {
        opal_show_help("help-mpi-btl-openib.txt",
                       "specified include and exclude", true,
                       mca_btl_openib_component.if_include,
                       mca_btl_openib_component.if_exclude, NULL);
        goto no_btls;
    } else if (NULL != mca_btl_openib_component.if_include) {
        mca_btl_openib_component.if_include_list =
            opal_argv_split(mca_btl_openib_component.if_include, ',');
        mca_btl_openib_component.if_list =
            opal_argv_copy(mca_btl_openib_component.if_include_list);
    } else if (NULL != mca_btl_openib_component.if_exclude) {
        mca_btl_openib_component.if_exclude_list =
            opal_argv_split(mca_btl_openib_component.if_exclude, ',');
        mca_btl_openib_component.if_list =
            opal_argv_copy(mca_btl_openib_component.if_exclude_list);
    }
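The include/exclude handling comes down to two rules. If both lists are set, the configuration is rejected. Otherwise, whichever list is set is split on commas.

This is a minimal sketch without the OPAL argv helpers. `count_csv()` and `select_if_list()` are hypothetical, and the interface names in the test are made up. Unlike a real splitter, this count does not skip empty entries.

```c
#include <assert.h>
#include <stddef.h>

/* Number of comma-separated entries in s (0 for NULL or ""). */
static int count_csv(const char *s)
{
    const char *p;
    int n;

    if (NULL == s || '\0' == *s)
        return 0;
    n = 1;
    for (p = s; '\0' != *p; p++)
        if (',' == *p)
            n++;
    return n;
}

/* -1 if both lists are given (they are mutually exclusive), otherwise
   the entry count of whichever list is set (0 when neither is). */
static int select_if_list(const char *include, const char *exclude)
{
    if (NULL != include && NULL != exclude)
        return -1;
    return count_csv(NULL != include ? include : exclude);
}
```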

    ib_devs = ibv_get_device_list_compat(&num_devs);

    if (0 == num_devs || NULL == ib_devs) {
        mca_btl_base_error_no_nics("OpenIB", "HCA");
        goto no_btls;
    }
    /* We must loop through all the HCA IDs, get their handles and,
       for each HCA, query the number of ports on the HCA and set up
       a distinct btl module for each HCA port */

    OBJ_CONSTRUCT(&btl_list, opal_list_t);
    OBJ_CONSTRUCT(&mca_btl_openib_component.ib_lock, opal_mutex_t);
This commit brings in two major things:
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
1. Pasha's new implementation of asynchronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be set up to use PP or SRQ receive buffers as well as giving
fine-grained control over receive buffer size, number of receive
buffers to post, when to replenish the receive queue (low water mark)
and for SRQ QPs, the number of outstanding sends can also be
specified. The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed before the 1.3 release.
This commit was SVN r15474.
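The `btl_openib_receive_queues` syntax described above is regular enough to parse with `sscanf()`. This is a minimal sketch, not the BTL's own parser. `struct qp_spec` and `parse_receive_queues()` are hypothetical. The sketch accepts only the `P` (4 fields) and `S` (5 fields) forms shown in the example. It enforces the ascending-size rule by rejecting any entry smaller than the one before it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical representation of one receive-queue description. */
struct qp_spec {
    char type;          /* 'P' per-peer, 'S' shared receive queue */
    int size;           /* receive buffer size in bytes */
    int num_bufs;       /* number of receive buffers */
    int low_water;      /* repost threshold */
    int max_pending;    /* outstanding sends ('S' only, else 0) */
};

/* Parse "P,...;S,...;..." into out[0..max-1]. Returns the number of QPs,
   or -1 on a malformed entry, too many entries, or a size that is
   smaller than the previous one. */
static int parse_receive_queues(const char *spec, struct qp_spec *out, int max)
{
    const char *p = spec;
    int n = 0;

    while ('\0' != *p) {
        struct qp_spec q = {0};
        int used = 0;

        if (n >= max)
            return -1;
        if ('P' == *p) {
            if (3 != sscanf(p, "P,%d,%d,%d%n", &q.size, &q.num_bufs,
                            &q.low_water, &used))
                return -1;
        } else if ('S' == *p) {
            if (4 != sscanf(p, "S,%d,%d,%d,%d%n", &q.size, &q.num_bufs,
                            &q.low_water, &q.max_pending, &used))
                return -1;
        } else {
            return -1;
        }
        q.type = *p;
        if (n > 0 && q.size < out[n - 1].size)
            return -1;              /* QPs must be in ascending size order */
        out[n++] = q;
        p += used;                  /* %n: characters consumed so far */
        if (';' == *p)
            p++;
        else if ('\0' != *p)
            return -1;
    }
    return n;
}
```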
#if OMPI_HAVE_THREADS
    mca_btl_openib_component.async_thread = 0;
#endif

    for (i = 0; i < num_devs && (-1 == mca_btl_openib_component.ib_max_btls ||
                mca_btl_openib_component.ib_num_btls <
                mca_btl_openib_component.ib_max_btls); i++) {
        if (OMPI_SUCCESS != (ret = init_one_hca(&btl_list, ib_devs[i])))
            break;
    }
    if (ret != OMPI_SUCCESS) {
        opal_show_help("help-mpi-btl-openib.txt",
                       "error in hca init", true, orte_system_info.nodename);
    }
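The loop condition above treats `ib_max_btls == -1` as "no limit". Otherwise it stops opening further HCAs once `ib_num_btls` reaches the cap.

Below is a simplified sketch of that condition. `count_btls()` is hypothetical, and `ports_per_dev` stands in for the number of BTL modules each `init_one_hca()` call would add. In this sketch the cap is checked only between devices, so a multi-port device could overshoot it. Whether `init_one_hca()` also enforces the cap per port is not visible in this excerpt.

```c
#include <assert.h>

/* Sum BTL modules device by device, honoring max_btls (-1 = unlimited). */
static int count_btls(const int *ports_per_dev, int num_devs, int max_btls)
{
    int i, num_btls = 0;

    assert(max_btls >= -1);
    for (i = 0; i < num_devs && (-1 == max_btls || num_btls < max_btls); i++)
        num_btls += ports_per_dev[i];
    return num_btls;
}
```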

    /* If we got back from checking all the HCAs and find that there
       are still items in the component.if_list, that means that they
       didn't exist.  Show an appropriate warning if the warning was
       not disabled. */
    if (0 != opal_argv_count(mca_btl_openib_component.if_list) &&
        mca_btl_openib_component.warn_nonexistent_if) {
        char *str = opal_argv_join(mca_btl_openib_component.if_list, ',');
        opal_show_help("help-mpi-btl-openib.txt", "nonexistent port",
                       true, orte_system_info.nodename,
                       ((NULL != mca_btl_openib_component.if_include) ?
                        "in" : "ex"), str);
        free(str);
    }
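The `nonexistent port` warning relies on `if_list` shrinking as matching HCAs and ports are found. Anything left over afterwards was never matched.

The sketch below shows that bookkeeping on a NULL-terminated argv-style array. `remove_name()` stands in for whatever removes matched entries, and `join_names()` stands in for `opal_argv_join()`. Both are hypothetical, and the interface names in the test are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove the first entry equal to found, shifting later entries
   (including the terminating NULL) one slot to the left. */
static void remove_name(char **names, const char *found)
{
    int i, j;

    for (i = 0; NULL != names[i]; i++) {
        if (0 == strcmp(names[i], found)) {
            for (j = i; NULL != names[j]; j++)
                names[j] = names[j + 1];
            return;
        }
    }
}

/* Join the remaining entries with ',' into out (truncating rather than
   overflowing); an empty list yields "". */
static void join_names(char *const *names, char *out, size_t outlen)
{
    size_t used = 0;
    int i;

    assert(outlen > 0);
    out[0] = '\0';
    for (i = 0; NULL != names[i]; i++) {
        size_t len = strlen(names[i]);
        if (used + len + (i > 0) + 1 > outlen)
            break;
        if (i > 0)
            out[used++] = ',';
        memcpy(out + used, names[i], len);
        used += len;
        out[used] = '\0';
    }
}
```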

    if (0 == mca_btl_openib_component.ib_num_btls) {
        opal_show_help("help-mpi-btl-openib.txt",
                       "no active ports found", true, orte_system_info.nodename);
        return NULL;
    }

    /* Allocate space for btl modules */
    mca_btl_openib_component.openib_btls =
        malloc(sizeof(mca_btl_openib_module_t *) *
               mca_btl_openib_component.ib_num_btls);
    if (NULL == mca_btl_openib_component.openib_btls) {
2006-08-14 23:30:37 +04:00
        BTL_ERROR(("Failed malloc: %s:%d\n", __FILE__, __LINE__));
2005-07-01 01:28:35 +04:00
        return NULL;
    }
2008-02-03 18:16:24 +03:00
    btls = (struct mca_btl_base_module_t **)
        malloc(mca_btl_openib_component.ib_num_btls *
               sizeof(struct mca_btl_base_module_t*));
2005-07-01 01:28:35 +04:00
    if (NULL == btls) {
2006-08-14 23:30:37 +04:00
        BTL_ERROR(("Failed malloc: %s:%d\n", __FILE__, __LINE__));
2005-07-01 01:28:35 +04:00
        return NULL;
    }
2006-08-14 23:30:37 +04:00
    /* Copy the btl module structs into a contiguous array and fully
       initialize them */
2005-07-01 01:28:35 +04:00
    for (i = 0; i < mca_btl_openib_component.ib_num_btls; i++) {
2008-01-21 15:11:18 +03:00
        item = opal_list_remove_first(&btl_list);
        ib_selected = (mca_btl_base_selected_module_t*)item;
2007-10-23 16:57:45 +04:00
        mca_btl_openib_component.openib_btls[i] =
2008-01-21 15:11:18 +03:00
            (mca_btl_openib_module_t*)ib_selected->btl_module;
        OBJ_RELEASE(ib_selected);
2007-05-09 01:47:21 +04:00
        openib_btl = mca_btl_openib_component.openib_btls[i];
2007-10-23 15:10:52 +04:00
        btls[i] = &openib_btl->super;
2007-10-23 16:57:45 +04:00
        if (finish_btl_init(openib_btl) != OMPI_SUCCESS)
            return NULL;
2007-10-23 15:10:52 +04:00
    }
2005-07-01 01:28:35 +04:00
2006-08-14 23:30:37 +04:00
    btl_openib_modex_send();
2005-07-01 01:28:35 +04:00
    *num_btl_modules = mca_btl_openib_component.ib_num_btls;
2008-02-03 18:16:24 +03:00
    ibv_free_device_list_compat(ib_devs);
2007-06-14 05:59:25 +04:00
    if (NULL != mca_btl_openib_component.if_include_list) {
        opal_argv_free(mca_btl_openib_component.if_include_list);
        mca_btl_openib_component.if_include_list = NULL;
    }
    if (NULL != mca_btl_openib_component.if_exclude_list) {
        opal_argv_free(mca_btl_openib_component.if_exclude_list);
        mca_btl_openib_component.if_exclude_list = NULL;
    }
2005-07-01 01:28:35 +04:00
    return btls;
2007-06-14 05:59:25 +04:00
no_btls:
    /* If we fail early enough in the setup, we just publish (via the
       modex) that there are no openib BTLs in this process and return
       NULL. */
2007-11-28 10:18:59 +03:00
    if (MCA_BTL_XRC_ENABLED)
        OBJ_DESTRUCT(&mca_btl_openib_component.ib_addr_table);
2007-06-14 05:59:25 +04:00
    mca_btl_openib_component.ib_num_btls = 0;
    btl_openib_modex_send();
    return NULL;
2005-07-01 01:28:35 +04:00
}
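The nonexistent-port warning above relies on a simple invariant: per the comment, any name still left in `if_list` after every HCA has been checked never matched a real device, and the leftovers are joined with `','` for the help message. The standalone sketch below models that with plain C string arrays instead of OPAL's argv helpers; `consume_name()` and `join_names()` are hypothetical helpers, not Open MPI functions, and the device names in the usage checks are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: drop every entry of `list` (length *n) equal to
 * `name`, compacting the array in place. Models consuming if_list
 * entries as matching HCAs/ports are found. */
static void consume_name(const char **list, size_t *n, const char *name)
{
    size_t w = 0;
    for (size_t r = 0; r < *n; r++) {
        if (strcmp(list[r], name) != 0) {
            list[w++] = list[r];   /* keep names that did not match */
        }
    }
    *n = w;
}

/* Hypothetical helper: join the leftover names with `sep`, similar in
 * spirit to opal_argv_join(list, ','). The caller frees the result. */
static char *join_names(const char **list, size_t n, char sep)
{
    size_t len = 1;                          /* terminating NUL */
    for (size_t i = 0; i < n; i++) {
        len += strlen(list[i]) + (i > 0 ? 1 : 0);
    }
    char *out = malloc(len);
    if (NULL == out) {
        return NULL;
    }
    char *p = out;
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            *p++ = sep;
        }
        size_t l = strlen(list[i]);
        memcpy(p, list[i], l);
        p += l;
    }
    *p = '\0';
    return out;
}
```

Consuming names in place keeps the final check cheap: the warning needs only a count and a join, with no second pass over the devices.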
2008-01-09 13:27:15 +03:00
static void progress_pending_eager_rdma(mca_btl_base_endpoint_t *ep)
2006-08-31 00:21:47 +04:00
{
2008-01-09 13:27:15 +03:00
    int qp;
    opal_list_item_t *frag;
2006-12-14 18:52:13 +03:00
2008-01-09 13:27:15 +03:00
    /* Go over all QPs and try to send high prio packets over eager rdma
     * channel */
    OPAL_THREAD_LOCK(&ep->endpoint_lock);
    for (qp = 0; qp < mca_btl_openib_component.num_qps; qp++) {
        while (ep->qps[qp].qp->sd_wqe > 0 && ep->eager_rdma_remote.tokens > 0) {
            frag = opal_list_remove_first(&ep->qps[qp].pending_frags[0]);
            if (NULL == frag)
                break;
            mca_btl_openib_endpoint_post_send(ep, to_send_frag(frag));
        }
        if (ep->eager_rdma_remote.tokens == 0)
            break;
2006-12-14 18:52:13 +03:00
    }
2008-01-09 13:27:15 +03:00
    OPAL_THREAD_UNLOCK(&ep->endpoint_lock);
2006-08-31 00:21:47 +04:00
}
2008-01-09 13:27:15 +03:00
static inline int
get_enpoint_credits(mca_btl_base_endpoint_t *ep, const int qp)
2007-11-28 10:13:34 +03:00
{
2008-01-09 13:27:15 +03:00
    return BTL_OPENIB_QP_TYPE_PP(qp) ? ep->qps[qp].u.pp_qp.sd_credits : 1;
}

static void progress_pending_frags_pp(mca_btl_base_endpoint_t *ep, const int qp)
{
    int i;
    opal_list_item_t *frag;
2008-01-21 15:11:18 +03:00
2008-01-09 13:27:15 +03:00
    OPAL_THREAD_LOCK(&ep->endpoint_lock);
    for (i = 0; i < 2; i++) {
        while ((get_enpoint_credits(ep, qp) +
                (1 - i) * ep->eager_rdma_remote.tokens) > 0) {
            frag = opal_list_remove_first(&ep->qps[qp].pending_frags[i]);
            if (NULL == frag)
                break;
            mca_btl_openib_endpoint_post_send(ep, to_send_frag(frag));
        }
    }
    OPAL_THREAD_UNLOCK(&ep->endpoint_lock);
}
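`progress_pending_frags_pp()` drains two pending lists per QP. The `(1 - i)` factor makes eager RDMA tokens count as available only for list 0, the high-priority list (the same list `progress_pending_eager_rdma()` drains), while list 1 waits for ordinary send credits. A counter-only sketch of that gating follows; `struct ep_model`, `post_one()` and the rule for which resource a fragment consumes are assumptions for illustration, since `mca_btl_openib_endpoint_post_send()` makes that decision in the real code.

```c
#include <assert.h>

/* Hypothetical endpoint model: counters only, no real queues. */
struct ep_model {
    int sd_credits;   /* per-peer send credits */
    int rdma_tokens;  /* eager RDMA tokens */
    int pending[2];   /* [0] = high priority, [1] = low priority */
    int sent[2];      /* fragments posted from each list */
};

/* Assumption: a high-priority fragment prefers an eager RDMA token and
 * falls back to a send credit; a low-priority one needs a send credit. */
static void post_one(struct ep_model *ep, int prio)
{
    if (0 == prio && ep->rdma_tokens > 0) {
        ep->rdma_tokens--;
    } else {
        assert(ep->sd_credits > 0);
        ep->sd_credits--;
    }
    ep->pending[prio]--;
    ep->sent[prio]++;
}

static void progress_pending(struct ep_model *ep)
{
    for (int i = 0; i < 2; i++) {
        /* (1 - i) is 1 for list 0 and 0 for list 1, so only
         * high-priority traffic may count tokens as resources. */
        while (ep->sd_credits + (1 - i) * ep->rdma_tokens > 0) {
            if (0 == ep->pending[i]) {
                break;  /* like opal_list_remove_first() returning NULL */
            }
            post_one(ep, i);
        }
    }
}
```

Draining list 0 first means high-priority fragments get the first pick of send credits as well as exclusive use of the tokens.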
void mca_btl_openib_frag_progress_pending_put_get(mca_btl_base_endpoint_t *ep,
                                                  const int qp)
2008-01-21 15:11:18 +03:00
{
2008-01-09 13:27:15 +03:00
    mca_btl_openib_module_t *openib_btl = ep->endpoint_btl;
    opal_list_item_t *frag;
    size_t i, len = opal_list_get_size(&ep->pending_get_frags);
    for (i = 0; i < len && ep->qps[qp].qp->sd_wqe > 0 && ep->get_tokens > 0; i++)
    {
        OPAL_THREAD_LOCK(&ep->endpoint_lock);
        frag = opal_list_remove_first(&(ep->pending_get_frags));
        OPAL_THREAD_UNLOCK(&ep->endpoint_lock);
        if (NULL == frag)
            break;
        if (mca_btl_openib_get((mca_btl_base_module_t *)openib_btl, ep,
                               &to_base_frag(frag)->base) == OMPI_ERR_OUT_OF_RESOURCE)
            break;
    }
2008-01-21 15:11:18 +03:00
2008-01-09 13:27:15 +03:00
    len = opal_list_get_size(&ep->pending_put_frags);
    for (i = 0; i < len && ep->qps[qp].qp->sd_wqe > 0; i++) {
        OPAL_THREAD_LOCK(&ep->endpoint_lock);
        frag = opal_list_remove_first(&(ep->pending_put_frags));
        OPAL_THREAD_UNLOCK(&ep->endpoint_lock);
        if (NULL == frag)
            break;
        if (mca_btl_openib_put((mca_btl_base_module_t *)openib_btl, ep,
                               &to_base_frag(frag)->base) == OMPI_ERR_OUT_OF_RESOURCE)
            break;
    }
2007-11-28 10:13:34 +03:00
}
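Both loops in `mca_btl_openib_frag_progress_pending_put_get()` are bounded by the list size read before the loop, not by the list becoming empty. A useful property of that pattern is that the number of attempts per call never exceeds the number of fragments queued on entry, even if an attempt puts a fragment back on the same list. The sketch below shows the pattern on a hypothetical fixed-size FIFO; `try_op()` and its "negative item means not ready" rule are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 16

/* Hypothetical fixed-capacity FIFO standing in for an opal_list_t. */
struct fifo {
    int items[QCAP];
    size_t head;
    size_t count;
};

static int fifo_push(struct fifo *q, int v)
{
    if (QCAP == q->count) {
        return -1;
    }
    q->items[(q->head + q->count) % QCAP] = v;
    q->count++;
    return 0;
}

static int fifo_pop(struct fifo *q, int *v)
{
    if (0 == q->count) {
        return -1;
    }
    *v = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 0;
}

/* Hypothetical operation. Returns 1 when the item was handled, 0 when it
 * was deferred (pushed back; negative items model "not ready"), and -1
 * when resources ran out (item pushed back, caller should stop). */
static int try_op(struct fifo *pending, int *resources, int item)
{
    if (*resources <= 0) {
        fifo_push(pending, item);
        return -1;
    }
    if (item < 0) {
        fifo_push(pending, item);
        return 0;
    }
    (*resources)--;
    return 1;
}

/* Make at most `len` attempts, where len is the queue size on entry.
 * Items pushed back during this pass wait for the next call instead of
 * being retried in a tight loop. */
static size_t drain_pending(struct fifo *pending, int *resources)
{
    size_t done = 0;
    size_t len = pending->count;   /* snapshot, like opal_list_get_size() */

    for (size_t i = 0; i < len; i++) {
        int item;
        if (fifo_pop(pending, &item) != 0) {
            break;                 /* queue emptied early */
        }
        int rc = try_op(pending, resources, item);
        if (rc < 0) {
            break;                 /* like OMPI_ERR_OUT_OF_RESOURCE */
        }
        done += (size_t)rc;
    }
    return done;
}
```

Looping "until the list is empty" instead would spin forever on the deferred item in the first usage check.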
2006-08-31 00:21:47 +04:00
2006-09-12 13:17:59 +04:00
static int btl_openib_handle_incoming(mca_btl_openib_module_t *openib_btl,
2007-11-28 10:13:34 +03:00
                                      mca_btl_openib_endpoint_t *ep,
2008-01-21 15:11:18 +03:00
                                      mca_btl_openib_recv_frag_t *frag,
2007-11-28 10:12:44 +03:00
                                      size_t byte_len)
2006-03-26 12:30:50 +04:00
{
2007-11-28 10:11:14 +03:00
    mca_btl_base_descriptor_t *des = &to_base_frag(frag)->base;
    mca_btl_openib_header_t *hdr = frag->hdr;
2007-11-28 10:13:34 +03:00
    int rqp = to_base_frag(frag)->base.order, cqp;
    uint16_t rcredits = 0, credits;
    bool is_credit_msg;
2007-11-28 10:11:14 +03:00
2007-11-28 10:13:34 +03:00
    if (ep->nbo) {
2007-11-28 10:11:14 +03:00
        BTL_OPENIB_HEADER_NTOH(*hdr);
2007-01-13 02:14:45 +03:00
    }
2007-03-14 17:36:03 +03:00
2006-09-12 13:17:59 +04:00
    /* advance the segment address past the header and subtract from the
2007-11-28 10:13:34 +03:00
     * length. */
2007-11-28 10:11:14 +03:00
    des->des_dst->seg_len = byte_len - sizeof(mca_btl_openib_header_t);
2006-03-26 12:30:50 +04:00
2007-11-28 10:13:34 +03:00
    if (OPAL_LIKELY(!(is_credit_msg = is_credit_message(frag)))) {
        /* call registered callback */
2008-01-15 08:32:53 +03:00
        mca_btl_active_message_callback_t *reg;
        reg = mca_btl_base_active_message_trigger + hdr->tag;
        reg->cbfunc(&openib_btl->super, hdr->tag, des, reg->cbdata);
2007-12-09 16:56:13 +03:00
        if (MCA_BTL_OPENIB_RDMA_FRAG(frag)) {
            cqp = (hdr->credits >> 11) & 0x0f;
            hdr->credits &= 0x87ff;
        } else {
            cqp = rqp;
        }
2007-11-28 10:13:34 +03:00
        if (BTL_OPENIB_IS_RDMA_CREDITS(hdr->credits)) {
            rcredits = BTL_OPENIB_CREDITS(hdr->credits);
            hdr->credits = 0;
2007-11-28 10:12:44 +03:00
        }
2007-07-24 17:23:08 +04:00
    } else {
2007-11-28 10:13:34 +03:00
        mca_btl_openib_rdma_credits_header_t *chdr = des->des_dst->seg_addr.pval;
        if (ep->nbo) {
            BTL_OPENIB_RDMA_CREDITS_HEADER_NTOH(*chdr);
        }
        cqp = chdr->qpn;
        rcredits = chdr->rdma_credits;
    }
    credits = hdr->credits;
2007-11-28 10:12:44 +03:00
2007-11-28 10:13:34 +03:00
    if (hdr->cm_seen)
        OPAL_THREAD_ADD32(&ep->qps[cqp].u.pp_qp.cm_sent, -hdr->cm_seen);
    /* Now return fragment. Don't touch hdr after this point! */
    if (MCA_BTL_OPENIB_RDMA_FRAG(frag)) {
        mca_btl_openib_eager_rdma_local_t *erl = &ep->eager_rdma_local;
        OPAL_THREAD_LOCK(&erl->lock);
        MCA_BTL_OPENIB_RDMA_MAKE_REMOTE(frag->ftr);
        while (erl->tail != erl->head) {
            mca_btl_openib_recv_frag_t *tf;
            tf = MCA_BTL_OPENIB_GET_LOCAL_RDMA_FRAG(ep, erl->tail);
            if (MCA_BTL_OPENIB_RDMA_FRAG_LOCAL(tf))
                break;
            OPAL_THREAD_ADD32(&erl->credits, 1);
            MCA_BTL_OPENIB_RDMA_NEXT_INDEX(erl->tail);
2007-07-18 05:15:59 +04:00
        }
2007-11-28 10:13:34 +03:00
        OPAL_THREAD_UNLOCK(&erl->lock);
    } else {
        MCA_BTL_IB_FRAG_RETURN(frag);
2007-11-28 10:20:26 +03:00
        if (BTL_OPENIB_QP_TYPE_PP(rqp)) {
2007-11-28 10:13:34 +03:00
            if (OPAL_UNLIKELY(is_credit_msg))
                OPAL_THREAD_ADD32(&ep->qps[cqp].u.pp_qp.cm_received, 1);
            else
                OPAL_THREAD_ADD32(&ep->qps[rqp].u.pp_qp.rd_posted, -1);
            mca_btl_openib_endpoint_post_rr(ep, cqp);
2007-11-28 10:20:26 +03:00
        } else {
            mca_btl_openib_module_t *btl = ep->endpoint_btl;
            OPAL_THREAD_ADD32(&btl->qps[rqp].u.srq_qp.rd_posted, -1);
2007-11-28 17:52:31 +03:00
            mca_btl_openib_post_srr(btl, rqp);
2007-11-28 10:13:34 +03:00
        }
    }
    if (rcredits > 0) {
        OPAL_THREAD_ADD32(&ep->eager_rdma_remote.tokens, rcredits);
        progress_pending_eager_rdma(ep);
2006-08-14 23:30:37 +04:00
    }
2006-03-26 12:30:50 +04:00
2007-11-28 10:13:34 +03:00
    assert((cqp != MCA_BTL_NO_ORDER && BTL_OPENIB_QP_TYPE_PP(cqp)) || !credits);
    if (credits) {
        OPAL_THREAD_ADD32(&ep->qps[cqp].u.pp_qp.sd_credits, credits);
        progress_pending_frags_pp(ep, cqp);
    }
2007-12-09 16:56:13 +03:00
    send_credits(ep, cqp);
2007-11-28 10:13:34 +03:00
2006-03-26 12:30:50 +04:00
    return OMPI_SUCCESS;
}
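For eager RDMA fragments, the receive path recovers the credit QP index from bits 11 through 14 of `hdr->credits` (`>> 11`, then `& 0x0f`) and clears those bits with `0x87ff`, which keeps bits 0 through 10 and bit 15. The sketch below packs and unpacks a 16-bit field with that layout. The sending side is not part of this section, so `credits_pack_qp()` is a hypothetical inverse, and treating bit 15 as an RDMA-credits flag (`CREDITS_FLAG_BIT`) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Layout implied by the receive path: bits 11..14 hold a 4-bit QP index
 * (shift 11, mask 0x0f); 0x87ff keeps every other bit. */
#define CREDITS_QP_SHIFT   11
#define CREDITS_QP_MASK    0x0fu
#define CREDITS_KEEP_MASK  0x87ffu
#define CREDITS_FLAG_BIT   0x8000u   /* hypothetical: bit 15 as a flag */

/* Hypothetical sender-side helper: fold a QP index into the field.
 * Requires the QP bits to be clear and the index to fit in 4 bits. */
static uint16_t credits_pack_qp(uint16_t credits, unsigned qp)
{
    assert(qp <= CREDITS_QP_MASK);
    assert((credits & ~CREDITS_KEEP_MASK) == 0);
    return (uint16_t)(credits | (qp << CREDITS_QP_SHIFT));
}

/* Receiver side, mirroring the two statements in the RDMA branch:
 * extract the QP index, then clear bits 11..14 in place. */
static unsigned credits_unpack_qp(uint16_t *credits)
{
    unsigned qp = (*credits >> CREDITS_QP_SHIFT) & CREDITS_QP_MASK;
    *credits &= CREDITS_KEEP_MASK;
    return qp;
}
```

Reusing spare header bits this way avoids growing the header, at the cost of limiting the QP index to 16 values and the plain credit count to 11 bits.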
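When an eager RDMA fragment is returned, the receive path marks its slot remote and then advances `erl->tail` only across slots that are no longer local, adding one credit per slot. Slots returned out of order therefore earn credits only after every older slot has come back. A single-threaded sketch of that reclamation rule follows; `struct rdma_ring` and its functions are hypothetical, and the real code performs the walk under `erl->lock` with `OPAL_THREAD_ADD32()` for the credit count. The model also ignores the full-ring case (head catching up with tail), which the peer's credit limit is assumed to prevent.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8

/* Hypothetical model of the local eager RDMA ring. in_use[i] is true
 * while slot i holds a fragment that has not been returned yet. */
struct rdma_ring {
    bool in_use[RING_SIZE];
    unsigned head;   /* next slot an incoming message will occupy */
    unsigned tail;   /* oldest slot not yet credited back */
    int credits;     /* slots that may be advertised to the peer again */
};

/* Model a message landing in slot `head`; returns the slot index. */
static unsigned ring_receive(struct rdma_ring *r)
{
    unsigned slot = r->head;
    assert(!r->in_use[slot]);   /* assumed: peer never overwrites a busy slot */
    r->in_use[slot] = true;
    r->head = (r->head + 1) % RING_SIZE;
    return slot;
}

/* Return one slot, then walk tail forward over every returned slot,
 * granting one credit each. Stop at the first slot still in use, so an
 * out-of-order return is credited only once all older slots are back. */
static void ring_return(struct rdma_ring *r, unsigned slot)
{
    r->in_use[slot] = false;
    while (r->tail != r->head) {
        if (r->in_use[r->tail]) {
            break;
        }
        r->credits++;
        r->tail = (r->tail + 1) % RING_SIZE;
    }
}
```

Crediting strictly in ring order keeps the sender's view simple: it can always reuse the oldest slots first without tracking which individual slots were freed.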
2006-08-14 23:30:37 +04:00
static char *btl_openib_component_status_to_string(enum ibv_wc_status status)
2008-01-21 15:11:18 +03:00
{
    switch (status) {
2006-08-14 23:30:37 +04:00
    case IBV_WC_SUCCESS:
2008-01-21 15:11:18 +03:00
        return "SUCCESS";
2006-08-14 23:30:37 +04:00
        break;
    case IBV_WC_LOC_LEN_ERR:
2008-01-21 15:11:18 +03:00
        return "LOCAL LENGTH ERROR";
        break;
    case IBV_WC_LOC_QP_OP_ERR:
        return "LOCAL QP OPERATION ERROR";
        break;
    case IBV_WC_LOC_EEC_OP_ERR:
        return "LOCAL EEC OPERATION ERROR";
        break;
    case IBV_WC_LOC_PROT_ERR:
        return "LOCAL PROTECTION ERROR";
        break;
    case IBV_WC_WR_FLUSH_ERR:
        return "WORK REQUEST FLUSHED ERROR";
        break;
    case IBV_WC_MW_BIND_ERR:
        return "MEMORY WINDOW BIND ERROR";
        break;
    case IBV_WC_BAD_RESP_ERR:
        return "BAD RESPONSE ERROR";
        break;
    case IBV_WC_LOC_ACCESS_ERR:
        return "LOCAL ACCESS ERROR";
        break;
    case IBV_WC_REM_INV_REQ_ERR:
        return "INVALID REQUEST ERROR";
        break;
    case IBV_WC_REM_ACCESS_ERR:
        return "REMOTE ACCESS ERROR";
        break;
    case IBV_WC_REM_OP_ERR:
        return "REMOTE OPERATION ERROR";
        break;
    case IBV_WC_RETRY_EXC_ERR:
        return "RETRY EXCEEDED ERROR";
        break;
    case IBV_WC_RNR_RETRY_EXC_ERR:
        return "RECEIVER NOT READY RETRY EXCEEDED ERROR";
        break;
    case IBV_WC_LOC_RDD_VIOL_ERR:
        return "LOCAL RDD VIOLATION ERROR";
        break;
    case IBV_WC_REM_INV_RD_REQ_ERR:
        return "INVALID READ REQUEST ERROR";
        break;
    case IBV_WC_REM_ABORT_ERR:
        return "REMOTE ABORT ERROR";
        break;
    case IBV_WC_INV_EECN_ERR:
        return "INVALID EECN ERROR";
        break;
    case IBV_WC_INV_EEC_STATE_ERR:
        return "INVALID EEC STATE ERROR";
        break;
    case IBV_WC_FATAL_ERR:
        return "FATAL ERROR";
        break;
    case IBV_WC_RESP_TIMEOUT_ERR:
        return "RESPONSE TIMEOUT ERROR";
        break;
    case IBV_WC_GENERAL_ERR:
        return "GENERAL ERROR";
        break;
    default:
        return "STATUS UNDEFINED";
        break;
    }
}

static void
progress_pending_frags_wqe(mca_btl_base_endpoint_t *ep, const int qpn)
{
    int i;
    opal_list_item_t *frag;
    mca_btl_openib_qp_t *qp = ep->qps[qpn].qp;

    OPAL_THREAD_LOCK(&ep->endpoint_lock);
    for (i = 0; i < 2; i++) {
        while (qp->sd_wqe > 0) {
            mca_btl_base_endpoint_t *ep;
            OPAL_THREAD_LOCK(&qp->lock);
            frag = opal_list_remove_first(&qp->pending_frags[i]);
            OPAL_THREAD_UNLOCK(&qp->lock);
            if (NULL == frag)
                break;
            ep = to_com_frag(frag)->endpoint;
            mca_btl_openib_endpoint_post_send(ep, to_send_frag(frag));
        }
    }
    OPAL_THREAD_UNLOCK(&ep->endpoint_lock);
}

static void progress_pending_frags_srq(mca_btl_openib_module_t *openib_btl,
                                       const int qp)
{
    opal_list_item_t *frag;
    int i;

    assert(BTL_OPENIB_QP_TYPE_SRQ(qp) || BTL_OPENIB_QP_TYPE_XRC(qp));

    for (i = 0; i < 2; i++) {
        while (openib_btl->qps[qp].u.srq_qp.sd_credits > 0) {
            OPAL_THREAD_LOCK(&openib_btl->ib_lock);
            frag = opal_list_remove_first(
                    &openib_btl->qps[qp].u.srq_qp.pending_frags[i]);
            OPAL_THREAD_UNLOCK(&openib_btl->ib_lock);
            if (NULL == frag)
                break;
            mca_btl_openib_endpoint_send(to_com_frag(frag)->endpoint,
                                         to_send_frag(frag));
        }
    }
}
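progress_pending_frags_srq() drains two pending lists in order (list 0 before list 1) and keeps going only while the SRQ still has send credits. The sketch below isolates that credit-gated draining. It is a minimal model under stated assumptions: demo_fifo_t, demo_fifo_pop() and demo_drain_with_credits() are hypothetical names, and the credit is decremented inside the loop here, whereas the BTL is assumed to consume sd_credits on its send path and return them in handle_wc().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one pending-fragment list: a FIFO of ids. */
typedef struct {
    int items[16];
    size_t head, tail;              /* items[head..tail) are still pending */
} demo_fifo_t;

static int demo_fifo_pop(demo_fifo_t *q, int *out)
{
    if (q->head == q->tail)
        return 0;                   /* list is empty */
    *out = q->items[q->head++];
    return 1;
}

/* Send pending fragments while credits remain, list 0 before list 1.
 * Assumption: each send consumes one credit right here; the real BTL
 * consumes sd_credits on its send path instead. */
static size_t demo_drain_with_credits(demo_fifo_t pending[2], int *credits,
                                      int *sent, size_t sent_cap)
{
    size_t n = 0;
    for (int i = 0; i < 2; i++) {
        while (*credits > 0 && n < sent_cap) {
            int frag;
            if (!demo_fifo_pop(&pending[i], &frag))
                break;              /* this list is drained; try the next */
            (*credits)--;
            sent[n++] = frag;
        }
    }
    return n;
}
```

With 4 credits, three fragments on list 0 and two on list 1, all of list 0 goes out first and the last credit is spent on the head of list 1; the second fragment of list 1 waits for the next completion to return a credit.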

static char *cq_name[] = {"HP CQ", "LP CQ"};
static void handle_wc(mca_btl_openib_hca_t *hca, const uint32_t cq,
                      struct ibv_wc *wc)
{
    static int flush_err_printed[] = {0, 0};
    mca_btl_openib_com_frag_t *frag;
    mca_btl_base_descriptor_t *des;
    mca_btl_openib_endpoint_t *endpoint;
    mca_btl_openib_module_t *openib_btl = NULL;
    ompi_proc_t *remote_proc = NULL;
    int qp;

    des = (mca_btl_base_descriptor_t *)(uintptr_t)wc->wr_id;
    frag = to_com_frag(des);

    /* For receive fragments, "order" holds the index of the QP the fragment
     * was posted to; for send fragments, it holds the index of the QP the
     * fragment was sent through. */
    qp = des->order;
    endpoint = frag->endpoint;

    if (endpoint)
        openib_btl = endpoint->endpoint_btl;

    if (wc->status != IBV_WC_SUCCESS)
        goto error;

    /* Handle work completions */
    switch (wc->opcode) {
    case IBV_WC_RDMA_READ:
        OPAL_THREAD_ADD32(&endpoint->get_tokens, 1);
        /* fall through */
    case IBV_WC_RDMA_WRITE:
    case IBV_WC_SEND:
        if (openib_frag_type(des) == MCA_BTL_OPENIB_FRAG_SEND) {
            opal_list_item_t *i;
            while ((i = opal_list_remove_first(&to_send_frag(des)->coalesced_frags)))
                to_base_frag(i)->base.des_cbfunc(&openib_btl->super, endpoint,
                        &to_base_frag(i)->base, OMPI_SUCCESS);
        }
        /* Process a completed send/put/get */
        des->des_cbfunc(&openib_btl->super, endpoint, des, OMPI_SUCCESS);

        /* return send wqe */
        qp_put_wqe(endpoint, qp);

        if (IBV_WC_SEND == wc->opcode && !BTL_OPENIB_QP_TYPE_PP(qp)) {
            OPAL_THREAD_ADD32(&openib_btl->qps[qp].u.srq_qp.sd_credits, 1);
            /* A new SRQ credit is available; try to progress pending frags */
            progress_pending_frags_srq(openib_btl, qp);
        }
        /* A new WQE and/or get token is available; try to progress pending frags */
        progress_pending_frags_wqe(endpoint, qp);
        mca_btl_openib_frag_progress_pending_put_get(endpoint, qp);
        break;
    case IBV_WC_RECV:
        if (wc->wc_flags & IBV_WC_WITH_IMM) {
            endpoint = (mca_btl_openib_endpoint_t *)
                opal_pointer_array_get_item(hca->endpoints, wc->imm_data);
            frag->endpoint = endpoint;
            openib_btl = endpoint->endpoint_btl;
        }

        /* Process a RECV */
        if (btl_openib_handle_incoming(openib_btl, endpoint, to_recv_frag(frag),
                    wc->byte_len) != OMPI_SUCCESS) {
            openib_btl->error_cb(&openib_btl->super, MCA_BTL_ERROR_FLAGS_FATAL);
            break;
        }

        /* Decide whether it is time to set up an eager RDMA channel */
        if (!endpoint->eager_rdma_local.base.pval && endpoint->use_eager_rdma &&
                wc->byte_len < mca_btl_openib_component.eager_limit &&
                openib_btl->eager_rdma_channels <
                mca_btl_openib_component.max_eager_rdma &&
                OPAL_THREAD_ADD32(&endpoint->eager_recv_count, 1) ==
                mca_btl_openib_component.eager_rdma_threshold) {
            mca_btl_openib_endpoint_connect_eager_rdma(endpoint);
        }
        break;
    default:
        BTL_ERROR(("Unhandled work completion opcode is %d", wc->opcode));
        if (openib_btl)
            openib_btl->error_cb(&openib_btl->super, MCA_BTL_ERROR_FLAGS_FATAL);
        break;
    }

    return;

error:
    if (endpoint && endpoint->endpoint_proc && endpoint->endpoint_proc->proc_ompi)
        remote_proc = endpoint->endpoint_proc->proc_ompi;
    if (wc->status != IBV_WC_WR_FLUSH_ERR || !flush_err_printed[cq]++) {
        BTL_PEER_ERROR(remote_proc, ("error polling %s with status %s "
                    "status number %d for wr_id %llu opcode %d qp_idx %d",
                    cq_name[cq], btl_openib_component_status_to_string(wc->status),
                    wc->status, (unsigned long long)wc->wr_id, wc->opcode, qp));
    }
    if (IBV_WC_RETRY_EXC_ERR == wc->status)
        opal_show_help("help-mpi-btl-openib.txt", "btl_openib:retry-exceeded", true);
    if (openib_btl)
        openib_btl->error_cb(&openib_btl->super, MCA_BTL_ERROR_FLAGS_FATAL);
}
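The error path of handle_wc() reports every failed completion except IBV_WC_WR_FLUSH_ERR, which it reports only once per CQ (flushes typically arrive in bursts once a QP enters the error state). The sketch below restates that "report the first flush per queue" rule as a standalone predicate. The enum and function names are hypothetical, and this variant sets a flag instead of post-incrementing a counter, so it is equivalent in effect but cannot overflow on a very long stream of flush errors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ibv_wc_status values. */
enum demo_status { DEMO_SUCCESS, DEMO_FLUSH_ERR, DEMO_OTHER_ERR };

/* Returns true if an error with this status on queue `cq` should be
 * reported: non-flush errors always, flush errors only the first time
 * they are seen on that queue. */
static bool demo_should_report(enum demo_status status, unsigned cq,
                               int flush_err_printed[2])
{
    if (status != DEMO_FLUSH_ERR)
        return true;              /* non-flush errors are always reported */
    if (flush_err_printed[cq])
        return false;             /* already reported a flush on this CQ */
    flush_err_printed[cq] = 1;    /* remember it and report this first one */
    return true;
}
```

The flag array is per CQ, matching flush_err_printed[] being indexed by cq above, so a flush on the LP CQ is still reported after one was seen on the HP CQ.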

static int poll_hca(mca_btl_openib_hca_t *hca, int count)
{
    int ne = 0, cq;
    uint32_t hp_iter = 0;
    struct ibv_wc wc;

    hca->pollme = false;
    for (cq = 0; cq < 2 && hp_iter < mca_btl_openib_component.cq_poll_progress;)
    {
        ne = ibv_poll_cq(hca->ib_cq[cq], 1, &wc);
        if (0 == ne) {
            /* don't check low prio cq if there was something in high prio cq,
             * but for each cq_poll_ratio hp cq polls poll lp cq once */
            if (count && hca->hp_cq_polls)
                break;
            cq++;
            hca->hp_cq_polls = mca_btl_openib_component.cq_poll_ratio;
            continue;
        }
        if (ne < 0)
            goto error;

        count++;

        if (BTL_OPENIB_HP_CQ == cq) {
            hca->pollme = true;
            hp_iter++;
            hca->hp_cq_polls--;
        }

        handle_wc(hca, cq, &wc);
    }

    return count;

error:
    BTL_ERROR(("error polling %s with %d errno says %s\n", cq_name[cq], ne,
                strerror(errno)));
    return count;
}
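poll_hca() drains the high-priority CQ first and only moves on to the low-priority CQ when nothing has been handled yet in this call, or when hp_cq_polls has been counted down to exactly zero at the moment the HP CQ runs dry; moving to the LP CQ resets hp_cq_polls to cq_poll_ratio. The sketch below is a hypothetical model of that control flow only: each CQ is reduced to a count of waiting completions, "polling" decrements it, and hp_cq_polls is modeled as a plain int (an assumption about its type). demo_hca_t and demo_poll() are illustrative names, not BTL code.

```c
#include <assert.h>

/* Hypothetical model of one HCA: completions waiting on the HP CQ
 * (index 0) and LP CQ (index 1), plus the persistent hp_cq_polls. */
typedef struct {
    int pending[2];
    int hp_cq_polls;
} demo_hca_t;

/* Same loop shape as poll_hca(): `progress` caps HP completions per call,
 * `ratio` plays the role of cq_poll_ratio. handled[cq] counts completions
 * taken from each CQ; returns the updated count. */
static int demo_poll(demo_hca_t *hca, int count, int progress, int ratio,
                     int handled[2])
{
    int cq = 0, hp_iter = 0;
    while (cq < 2 && hp_iter < progress) {
        if (hca->pending[cq] == 0) {           /* models ibv_poll_cq() == 0 */
            if (count && hca->hp_cq_polls)
                break;                         /* skip the LP CQ this time */
            cq++;
            hca->hp_cq_polls = ratio;
            continue;
        }
        hca->pending[cq]--;                    /* models handle_wc() */
        handled[cq]++;
        count++;
        if (cq == 0) {
            hp_iter++;
            hca->hp_cq_polls--;
        }
    }
    return count;
}
```

In this model, an HCA with 2 HP and 5 LP completions and a ratio of 2 handles all 7 in one call, while one with a single HP completion stops after it and leaves the LP CQ for a later call.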

#if OMPI_ENABLE_PROGRESS_THREADS == 1
void *mca_btl_openib_progress_thread(opal_object_t *arg)
{
    opal_thread_t *thread = (opal_thread_t *)arg;
    mca_btl_openib_hca_t *hca = thread->t_arg;
    struct ibv_cq *ev_cq;
    void *ev_ctx;

    /* This thread runs with asynchronous cancellation enabled */
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

    opal_output(0, "WARNING: the openib btl progress thread code *does not yet work*. Your run is likely to hang, crash, break the kitchen sink, and/or eat your cat. You have been warned.");
    while (hca->progress) {
        while (opal_progress_threads()) {
            while (opal_progress_threads())
                sched_yield();
            usleep(100); /* give app a chance to re-enter library */
        }

        if (ibv_get_cq_event(hca->ib_channel, &ev_cq, &ev_ctx)) {
            BTL_ERROR(("Failed to get CQ event with error %s",
                        strerror(errno)));
            continue; /* ev_cq is not valid when no event was retrieved */
        }
        if (ibv_req_notify_cq(ev_cq, 0)) {
            BTL_ERROR(("Couldn't request CQ notification with error %s",
                        strerror(errno)));
        }

        ibv_ack_cq_events(ev_cq, 1);

        while (poll_hca(hca, 0));
    }

    return PTHREAD_CANCELED;
}
#endif

static int progress_one_hca(mca_btl_openib_hca_t *hca)
{
    int i, c, count = 0, ret;
    mca_btl_openib_recv_frag_t *frag;
    mca_btl_openib_endpoint_t *endpoint;
    uint32_t non_eager_rdma_endpoints = 0;

    c = hca->eager_rdma_buffers_count;
    non_eager_rdma_endpoints += (hca->non_eager_rdma_endpoints + hca->pollme);

    for (i = 0; i < c; i++) {
        endpoint = hca->eager_rdma_buffers[i];

        if (!endpoint)
            continue;

        OPAL_THREAD_LOCK(&endpoint->eager_rdma_local.lock);
        frag = MCA_BTL_OPENIB_GET_LOCAL_RDMA_FRAG(endpoint,
                endpoint->eager_rdma_local.head);

        if (MCA_BTL_OPENIB_RDMA_FRAG_LOCAL(frag)) {
            uint32_t size;
            mca_btl_openib_module_t *btl = endpoint->endpoint_btl;

            opal_atomic_rmb();

            if (endpoint->nbo) {
                BTL_OPENIB_FOOTER_NTOH(*frag->ftr);
            }
            size = MCA_BTL_OPENIB_RDMA_FRAG_GET_SIZE(frag->ftr);
#if OMPI_ENABLE_DEBUG
            if (frag->ftr->seq != endpoint->eager_rdma_local.seq)
                BTL_ERROR(("Eager RDMA wrong SEQ: received %d expected %d",
                            frag->ftr->seq,
                            endpoint->eager_rdma_local.seq));
            endpoint->eager_rdma_local.seq++;
#endif
            MCA_BTL_OPENIB_RDMA_NEXT_INDEX(endpoint->eager_rdma_local.head);

            OPAL_THREAD_UNLOCK(&endpoint->eager_rdma_local.lock);
            frag->hdr = (mca_btl_openib_header_t *)(((char *)frag->ftr) -
                    size + sizeof(mca_btl_openib_footer_t));
            to_base_frag(frag)->segment.seg_addr.pval =
                ((unsigned char *)frag->hdr) + sizeof(mca_btl_openib_header_t);

            ret = btl_openib_handle_incoming(btl, to_com_frag(frag)->endpoint,
                    frag, size - sizeof(mca_btl_openib_footer_t));
            if (ret != MPI_SUCCESS) {
                btl->error_cb(&btl->super, MCA_BTL_ERROR_FLAGS_FATAL);
                return 0;
            }

            count++;
        } else
            OPAL_THREAD_UNLOCK(&endpoint->eager_rdma_local.lock);
    }

    hca->eager_rdma_polls--;

    if (0 == count || non_eager_rdma_endpoints != 0 || !hca->eager_rdma_polls) {
        count += poll_hca(hca, count);
        hca->eager_rdma_polls = mca_btl_openib_component.eager_rdma_poll_ratio;
    }

    return count;
}
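In the eager RDMA path above, the footer sits at the fixed end of each receive slot and records the total message size, so the header is found by stepping back from the footer: hdr = ftr - (size - sizeof(footer)), and the user data starts sizeof(header) bytes later. The sketch below reproduces only that pointer arithmetic. demo_header_t and demo_footer_t are hypothetical layouts, not the real mca_btl_openib_header_t and mca_btl_openib_footer_t. Also, demo_locate() returns the data-only length, whereas the code above passes size - sizeof(footer), i.e. header plus data, to btl_openib_handle_incoming().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts standing in for the BTL's header and footer. */
typedef struct { uint8_t tag; } demo_header_t;
typedef struct { uint32_t size; uint8_t local; } demo_footer_t;

/* `size` counts header + data + footer; the message ends exactly where
 * the footer begins. Locates header and data, returns the data length. */
static size_t demo_locate(demo_footer_t *ftr, size_t size,
                          demo_header_t **hdr, unsigned char **data)
{
    /* Subtract (size - footer) in one step so the intermediate pointer
     * never points before the start of the slot. */
    *hdr = (demo_header_t *)((char *)ftr - (size - sizeof(demo_footer_t)));
    *data = (unsigned char *)*hdr + sizeof(demo_header_t);
    return size - sizeof(demo_footer_t) - sizeof(demo_header_t);
}
```

For a 64-byte slot with the footer in its last bytes and 10 bytes of data, the computed data region ends exactly at the footer, which is the invariant the eager RDMA sender has to maintain.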
/*
 * IB component progress.
 */
static int btl_openib_component_progress(void)
{
    int i;
    int count = 0;

#if OMPI_HAVE_THREADS
    if (OPAL_UNLIKELY(mca_btl_openib_component.use_async_event_thread &&
                mca_btl_openib_component.fatal_counter)) {
        goto error;
    }
#endif

    for (i = 0; i < mca_btl_openib_component.hcas_count; i++) {
        mca_btl_openib_hca_t *hca =
            opal_pointer_array_get_item(&mca_btl_openib_component.hcas, i);
        count += progress_one_hca(hca);
    }

    return count;

#if OMPI_HAVE_THREADS
error:
    /* Reset the fatal counter */
    mca_btl_openib_component.fatal_counter = 0;
    /* Report every BTL whose HCA received a fatal event */
    for (i = 0; i < mca_btl_openib_component.ib_num_btls; i++) {
        mca_btl_openib_module_t *openib_btl =
            mca_btl_openib_component.openib_btls[i];
        if (openib_btl->hca->got_fatal_event) {
            openib_btl->error_cb(&openib_btl->super, MCA_BTL_ERROR_FLAGS_FATAL);
        }
    }
    return count;
#endif
}

int mca_btl_openib_post_srr(mca_btl_openib_module_t *openib_btl, const int qp)
{
    int rd_low = mca_btl_openib_component.qp_infos[qp].rd_low;
    int rd_num = mca_btl_openib_component.qp_infos[qp].rd_num;
    int num_post, i, rc;
    struct ibv_recv_wr *bad_wr, *wr_list = NULL, *wr = NULL;

    assert(!BTL_OPENIB_QP_TYPE_PP(qp));

    OPAL_THREAD_LOCK(&openib_btl->ib_lock);
    if (openib_btl->qps[qp].u.srq_qp.rd_posted > rd_low) {
        OPAL_THREAD_UNLOCK(&openib_btl->ib_lock);
        return OMPI_SUCCESS;
    }
    num_post = rd_num - openib_btl->qps[qp].u.srq_qp.rd_posted;

    for (i = 0; i < num_post; i++) {
        ompi_free_list_item_t *item;
        OMPI_FREE_LIST_WAIT(&openib_btl->hca->qps[qp].recv_free, item, rc);
        to_base_frag(item)->base.order = qp;
        to_com_frag(item)->endpoint = NULL;
        if (NULL == wr)
            wr = wr_list = &to_recv_frag(item)->rd_desc;
        else
            wr = wr->next = &to_recv_frag(item)->rd_desc;
    }

    wr->next = NULL;

    rc = ibv_post_srq_recv(openib_btl->qps[qp].u.srq_qp.srq, wr_list, &bad_wr);
    if (OPAL_LIKELY(0 == rc)) {
        OPAL_THREAD_ADD32(&openib_btl->qps[qp].u.srq_qp.rd_posted, num_post);
        OPAL_THREAD_UNLOCK(&openib_btl->ib_lock);
        return OMPI_SUCCESS;
    }

    for (i = 0; wr_list && wr_list != bad_wr; i++, wr_list = wr_list->next);

    BTL_ERROR(("error posting receive descriptors to shared receive "
                "queue %d (%d from %d)", qp, i, num_post));

    OPAL_THREAD_UNLOCK(&openib_btl->ib_lock);
    return OMPI_ERROR;
}
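mca_btl_openib_post_srr() does nothing while more than rd_low receives are still posted; otherwise it tops the SRQ back up to rd_num with one chained ibv_post_srq_recv() call. On failure it walks the chain up to bad_wr to report how many descriptors came before the failing one. The sketch below reproduces those two calculations with a hypothetical demo_wr_t that models only the next link of struct ibv_recv_wr; the helper names are illustrative, not BTL or libibverbs functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ibv_recv_wr: only the chain link. */
typedef struct demo_wr { struct demo_wr *next; } demo_wr_t;

/* Receives to post: refill to rd_num, or 0 while above the low watermark. */
static int demo_num_to_post(int rd_posted, int rd_low, int rd_num)
{
    return rd_posted > rd_low ? 0 : rd_num - rd_posted;
}

/* Link wrs[0..n-1] into a NULL-terminated chain and return its head. */
static demo_wr_t *demo_chain(demo_wr_t *wrs, int n)
{
    for (int i = 0; i < n; i++)
        wrs[i].next = (i + 1 < n) ? &wrs[i + 1] : NULL;
    return n > 0 ? &wrs[0] : NULL;
}

/* Count the requests ahead of bad_wr in the chain (same loop shape as
 * the error path above); with bad_wr == NULL this is the chain length. */
static int demo_count_before(const demo_wr_t *list, const demo_wr_t *bad_wr)
{
    int i;
    for (i = 0; list && list != bad_wr; i++, list = list->next)
        ;
    return i;
}
```

Posting the whole chain in one call keeps the number of verbs calls per refill at one; the price is that a failure has to be located afterwards by walking the list, which is what the final loop in mca_btl_openib_post_srr() does.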