# -*- text -*-
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
#                         University Research and Technology
#                         Corporation.  All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
#                         of Tennessee Research Foundation.  All rights
#                         reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
#                         University of Stuttgart.  All rights reserved.
# Copyright (c) 2004-2006 The Regents of the University of California.
#                         All rights reserved.
# Copyright (c) 2006-2011 Cisco Systems, Inc.  All rights reserved.
# Copyright (c) 2007-2009 Mellanox Technologies.  All rights reserved.
# Copyright (c) 2009      Sun Microsystems, Inc.  All rights reserved.
# Copyright (c) 2013-2014 NVIDIA Corporation.  All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for Open MPI's OpenFabrics support
# (the openib BTL).
#
Bring over all the work from the /tmp/ib-hw-detect branch. In
addition to my design and testing, it was conceptually approved by
Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat
lightly] tested by Galen. We may still have to shake out some bugs
during the next few months, but it seems to be working for all the
cases that I can throw at it.
Here's a summary of the changes from that branch:
* Move MCA parameter registration to a new file (btl_openib_mca.c):
* Properly check the return status of registering MCA params
* Check for valid values of MCA parameters
* Make help strings better
* Otherwise, the only default value of an MCA param that was
changed was max_btls; it went from 4 to -1 (meaning: use all
available)
* Properly prototyped internal functions in _component.c
* Made a bunch of functions static that didn't need to be public
* Renamed to remove "mca_" prefix from static functions
* Call new MCA param registration function
* Call new INI file read/lookup/finalize functions
* Updated a bunch of macros to be "BTL_" instead of "ORTE_"
* Be a little more consistent with return values
* Handle -1 for the max_btls MCA param
* Fixed a free() that should have been an OBJ_RELEASE()
* Some re-indenting
* Added INI-file parsing
* New flex file: btl_openib_ini.l
* New default HCA params .ini file (probably to be expanded over
time by other HCA vendors)
* Added more show_help messages for parsing problems
* Read in INI files and cache the values for later lookup
* When component opens an HCA, lookup to see if any corresponding
values were found in the INI files (ID'ed by the HCA vendor_id
and vendor_part_id)
* Added btl_openib_verbose MCA param that shows what the INI-file
stuff does (e.g., shows which MTU your HCA ends up using)
* Added btl_openib_hca_param_files as a colon-delimited list of INI
files to check for values during startup (in order,
left-to-right, just like the MCA base directory param).
* MTU is currently the only value supported in this framework.
* It is not a fatal error if we don't find params for the HCA in
the INI file(s). Instead, just print a warning. New MCA param
btl_openib_warn_no_hca_params_found can be used to disable
printing the warning.
* Add MTU to peer negotiation when making a connection
* Exchange maximum MTU; select the lesser of the two
This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
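The parse errors reported by the help topics in this file can be illustrated with a small line classifier. This is only a sketch under stated assumptions: the real parser is generated by flex from btl_openib_ini.l and is not reproduced here. The names `ini_status_t`, `classify_ini_line`, `is_known_field`, and `negotiate_mtu` are hypothetical, the list of known fields is taken from the commit message above (vendor_id, vendor_part_id, mtu), and treating `#` as a comment marker is an assumption. The last function shows the MTU rule the commit message describes: each side sends its maximum and the lesser value is used.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical status codes, named after the help topics in this file. */
typedef enum {
    INI_OK_BLANK,          /* empty line or "#" comment (assumed syntax) */
    INI_OK_SECTION,        /* "[name]" section header */
    INI_OK_FIELD,          /* "key = value" inside a section */
    INI_NOT_IN_SECTION,    /* field seen before any section header */
    INI_UNEXPECTED_TOKEN,  /* line does not start with a field name */
    INI_EXPECTED_EQUALS,   /* field name not followed by "=" */
    INI_UNKNOWN_FIELD      /* field name is not recognized */
} ini_status_t;

/* Fields mentioned in the commit message; the real list may differ. */
static const char *known_fields[] = { "vendor_id", "vendor_part_id", "mtu" };

static int is_known_field(const char *key, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof(known_fields) / sizeof(known_fields[0]); ++i) {
        if (strlen(known_fields[i]) == len &&
            0 == strncmp(known_fields[i], key, len)) {
            return 1;
        }
    }
    return 0;
}

/* Classify one line; *in_section records whether a header was seen. */
static ini_status_t classify_ini_line(const char *line, int *in_section)
{
    const char *key;
    size_t len = 0;

    assert(NULL != line && NULL != in_section);

    while (isspace((unsigned char) *line)) ++line;
    if ('\0' == *line || '#' == *line) return INI_OK_BLANK;

    if ('[' == *line) {
        if (NULL == strchr(line, ']')) return INI_UNEXPECTED_TOKEN;
        *in_section = 1;
        return INI_OK_SECTION;
    }

    /* A field name is a run of letters, digits, and underscores. */
    key = line;
    while (isalnum((unsigned char) key[len]) || '_' == key[len]) ++len;
    if (0 == len) return INI_UNEXPECTED_TOKEN;
    if (!*in_section) return INI_NOT_IN_SECTION;

    line = key + len;
    while (isspace((unsigned char) *line)) ++line;
    if ('=' != *line) return INI_EXPECTED_EQUALS;

    return is_known_field(key, len) ? INI_OK_FIELD : INI_UNKNOWN_FIELD;
}

/* Each peer advertises its maximum MTU; the connection uses the lesser. */
static unsigned int negotiate_mtu(unsigned int local_max,
                                  unsigned int remote_max)
{
    return local_max < remote_max ? local_max : remote_max;
}
```

With this sketch, a line such as `mtu = 2048` placed before any `[section]` header yields `INI_NOT_IN_SECTION`, and `vendor_id 0x2c9` inside a section yields `INI_EXPECTED_EQUALS`, mirroring the "not in a section" and "expected equals" topics.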
[ini file:file not found]
The Open MPI OpenFabrics (openib) BTL component was unable to find or
read an INI file that was requested via the
btl_openib_device_param_files MCA parameter.  Please check this file
and/or modify the btl_openib_device_param_files MCA parameter:

    %s
#
[ini file:not in a section]
In parsing the OpenFabrics (openib) BTL parameter file, values were
found that were not in a valid INI section.  These values will be
ignored.  Please re-check this file:

    %s

At line %d, near the following text:

    %s
#
[ini file:unexpected token]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored).  Please re-check this file:

    %s

At line %d, near the following text:

    %s
#
[ini file:expected equals]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored).  An equals sign ("=") was expected but was not found.
Please re-check this file:

    %s

At line %d, near the following text:

    %s
#
[ini file:expected newline]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored).  A newline was expected but was not found.  Please
re-check this file:

    %s

At line %d, near the following text:

    %s
#
[ini file:unknown field]
In parsing the OpenFabrics (openib) BTL parameter file, an
unrecognized field name was found.  Please re-check this file:

    %s

At line %d, the field named:

    %s

This field, and any other unrecognized fields, will be skipped.
#
[no device params found]
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            %s
  Device name:           %s
  Device vendor ID:      0x%04x
  Device vendor part ID: %d

Default device parameters will be used, which may result in lower
Bring over all the work from the /tmp/ib-hw-detect branch. In
addition to my design and testing, it was conceptually approved by
Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat
lightly] tested by Galen. We may still have to shake out some bugs
during the next few months, but it seems to be working for all the
cases that I can throw at it.
Here's a summary of the changes from that branch:
* Move MCA parameter registration to a new file (btl_openib_mca.c):
* Properly check the retun status of registering MCA params
* Check for valid values of MCA parameters
* Make help strings better
* Otherwise, the only default value of an MCA param that was
changed was max_btls; it went from 4 to -1 (meaning: use all
available)
* Properly prototyped internal functions in _component.c
* Made a bunch of functions static that didn't need to be public
* Renamed to remove "mca_" prefix from static functions
* Call new MCA param registration function
* Call new INI file read/lookup/finalize functions
* Updated a bunch of macros to be "BTL_" instead of "ORTE_"
* Be a little more consistent with return values
* Handle -1 for the max_btls MCA param
* Fixed a free() that should have been an OBJ_RELEASE()
* Some re-indenting
* Added INI-file parsing
* New flex file: btl_openib_ini.l
* New default HCA params .ini file (probably to be expanded over
time by other HCA vendors)
* Added more show_help messages for parsing problems
* Read in INI files and cache the values for later lookup
* When component opens an HCA, lookup to see if any corresponding
values were found in the INI files (ID'ed by the HCA vendor_id
and vendor_part_id)
* Added btl_openib_verbose MCA param that shows what the INI-file
stuff does (e.g., shows which MTU your HCA ends up using)
* Added btl_openib_hca_param_files as a colon-delimited list of INI
files to check for values during startup (in order,
left-to-right, just like the MCA base directory param).
* MTU is currently the only value supported in this framework.
* It is not a fatal error if we don't find params for the HCA in
the INI file(s). Instead, just print a warning. New MCA param
btl_openib_warn_no_hca_params_found can be used to disable
printing the warning.
* Add MTU to peer negotiation when making a connection
* Exchange maximum MTU; select the lesser of the two
This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
|
|
|
performance. You can edit any of the files specified by the
|
2008-07-23 04:28:59 +04:00
|
|
|
btl_openib_device_param_files MCA parameter to set values for your
|
|
|
|
device.
|
Bring over all the work from the /tmp/ib-hw-detect branch. In
addition to my design and testing, it was conceptually approved by
Gil, Gleb, Pasha, Brad, and Galen. Functionally [probably somewhat
lightly] tested by Galen. We may still have to shake out some bugs
during the next few months, but it seems to be working for all the
cases that I can throw at it.
Here's a summary of the changes from that branch:
* Move MCA parameter registration to a new file (btl_openib_mca.c):
* Properly check the retun status of registering MCA params
* Check for valid values of MCA parameters
* Make help strings better
* Otherwise, the only default value of an MCA param that was
changed was max_btls; it went from 4 to -1 (meaning: use all
available)
* Properly prototyped internal functions in _component.c
* Made a bunch of functions static that didn't need to be public
* Renamed to remove "mca_" prefix from static functions
* Call new MCA param registration function
* Call new INI file read/lookup/finalize functions
* Updated a bunch of macros to be "BTL_" instead of "ORTE_"
* Be a little more consistent with return values
* Handle -1 for the max_btls MCA param
* Fixed a free() that should have been an OBJ_RELEASE()
* Some re-indenting
* Added INI-file parsing
* New flex file: btl_openib_ini.l
* New default HCA params .ini file (probably to be expanded over
time by other HCA vendors)
* Added more show_help messages for parsing problems
* Read in INI files and cache the values for later lookup
* When component opens an HCA, lookup to see if any corresponding
values were found in the INI files (ID'ed by the HCA vendor_id
and vendor_part_id)
* Added btl_openib_verbose MCA param that shows what the INI-file
stuff does (e.g., shows which MTU your HCA ends up using)
* Added btl_openib_hca_param_files as a colon-delimited list of INI
files to check for values during startup (in order,
left-to-right, just like the MCA base directory param).
* MTU is currently the only value supported in this framework.
* It is not a fatal error if we don't find params for the HCA in
the INI file(s). Instead, just print a warning. New MCA param
btl_openib_warn_no_hca_params_found can be used to disable
printing the warning.
* Add MTU to peer negotiation when making a connection
* Exchange maximum MTU; select the lesser of the two
This commit was SVN r11182.
2006-08-14 23:30:37 +04:00
|
|
|
|
|
|
|
NOTE: You can turn off this warning by setting the MCA parameter
|
2008-07-23 04:28:59 +04:00
|
|
|
btl_openib_warn_no_device_params_found to 0.
|
2007-06-14 05:59:25 +04:00
|
|
|
#
[init-fail-no-mem]
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory. This typically indicates that the
memlock limits are set too low. For most HPC installations, the
memlock limits should be set to "unlimited". The failure occurred
here:

  Local host:    %s
  OMPI source:   %s:%d
  Function:      %s()
  Device:        %s
  Memlock limit: %s

You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
#
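The [init-fail-no-mem] message above reports a memlock limit. As a rough
illustration only, and not how Open MPI itself obtains the value, the
limit of the current process could be inspected on a POSIX system as
sketched below. The helper names format_limit and memlock_limits are
hypothetical.

```python
import resource


def format_limit(value):
    """Render an rlimit value the way a human would read it."""
    # RLIM_INFINITY is the sentinel the OS uses for "no limit".
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"memlock soft limit: {soft}")
    print(f"memlock hard limit: {hard}")
```

If the soft limit printed here is a small number of bytes rather than
"unlimited", that is consistent with the condition the message describes.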
[init-fail-create-q]
The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue. This typically indicates a failed
OpenFabrics installation, faulty hardware, or that Open MPI is
attempting to use a feature that is not supported on your hardware
(e.g., is a shared receive queue specified in the
btl_openib_receive_queues MCA parameter with a device that does not
support it?). The failure occurred here:

  Local host:  %s
  OMPI source: %s:%d
  Function:    %s()
  Error:       %s (errno=%d)
  Device:      %s

You may need to consult with your system administrator to get this
problem fixed.
#
[pp rnr retry exceeded]
The OpenFabrics "receiver not ready" retry count on a per-peer
connection between two MPI processes has been exceeded. In general,
this should not happen because Open MPI uses flow control on per-peer
connections to ensure that receivers are always ready when data is
sent.

This error usually means one of two things:

1. There is something awry within the network fabric itself.
2. A bug in Open MPI has caused flow control to malfunction.

The first cause is usually more likely. You should note the hosts on
which this error has occurred; it has been observed that rebooting or
removing a particular host from the job can sometimes resolve this
issue.

Below is some information about the host that raised the error and the
peer to which it was connected:

  Local host:   %s
  Local device: %s
  Peer host:    %s

You may need to consult with your system administrator to get this
problem fixed.
#
[srq rnr retry exceeded]
The OpenFabrics "receiver not ready" retry count on a shared receive
queue or XRC receive queue has been exceeded. This error can occur if
the mca_btl_openib_ib_rnr_retry parameter is set to a value less than 7
(where 7 is the default value and effectively means "infinite retry").
If your rnr_retry value is 7, there might be something awry within the
network fabric itself. In this case, you should note the hosts on which
this error has occurred; it has been observed that rebooting or removing
a particular host from the job can sometimes resolve this issue.

Below is some information about the host that raised the error and the
peer to which it was connected:

  Local host:   %s
  Local device: %s
  Peer host:    %s

You may need to consult with your system administrator to get this
problem fixed.
#
[pp retry exceeded]
The InfiniBand retry count between two MPI processes has been
exceeded. "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):

    The total number of times that the sender wishes the receiver to
    retry timeout, packet sequence, etc. errors before posting a
    completion error.

This error typically means that there is something awry within the
InfiniBand fabric itself. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.

Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:

* btl_openib_ib_retry_count - The number of times the sender will
  attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
  to 20). The actual timeout value used is calculated as:

     4.096 microseconds * (2^btl_openib_ib_timeout)

  See the InfiniBand spec 1.2 (section 12.7.34) for more details.

Below is some information about the host that raised the error and the
peer to which it was connected:

  Local host:   %s
  Local device: %s
  Peer host:    %s

You may need to consult with your system administrator to get this
problem fixed.
#
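The local ACK timeout formula quoted in [pp retry exceeded] is easy to
misjudge by eye, so a small worked calculation may help. The sketch
below only evaluates the formula from the message text; the function
name local_ack_timeout_seconds is made up for illustration and is not
part of Open MPI.

```python
def local_ack_timeout_seconds(ib_timeout):
    """Evaluate 4.096 microseconds * 2^ib_timeout and return seconds."""
    base_seconds = 4.096e-6  # 4.096 microseconds, per the message text
    return base_seconds * (2 ** ib_timeout)


if __name__ == "__main__":
    # 20 is the default value stated in the help message.
    for exponent in (10, 14, 20):
        seconds = local_ack_timeout_seconds(exponent)
        print(f"btl_openib_ib_timeout={exponent}: {seconds:.6f} s")
```

With the default value of 20 the formula gives roughly 4.3 seconds per
timeout, and each increment of the parameter doubles that figure.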
[no active ports found]
WARNING: There is at least one non-excluded OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.

  Local host: %s
#
[error in device init]
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   %s
  Local device: %s
#
[no devices right type]
WARNING: No OpenFabrics devices of the right type were found within
the requested bus distance. The OpenFabrics BTL will be ignored for
this run.

  Local host:     %s
  Requested type: %s

If the "requested type" is "<any>", this usually means that *no*
OpenFabrics devices were found within the requested bus distance.
#
[default subnet prefix]
WARNING: There is more than one active port on host '%s', but the
default subnet GID prefix was detected on more than one of these
ports. If these ports are connected to different physical IB
networks, this configuration will fail in Open MPI. This version of
Open MPI requires that every physically separate IB subnet that is
used between connected MPI processes have a different subnet ID
value.

Please see this FAQ entry for more details:

  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_default_gid_prefix to 0.
#
[ibv_fork requested but not supported]
WARNING: fork() support was requested for the OpenFabrics (openib)
BTL, but it is not supported on the host %s. Deactivating the
OpenFabrics BTL.
#
[ibv_fork_init fail]
WARNING: fork() support was requested for the OpenFabrics (openib)
BTL, but the library call ibv_fork_init() failed on the host %s.
Deactivating the OpenFabrics BTL.
#
[wrong buffer alignment]
Invalid buffer alignment %d configured on host '%s'. The value must be
greater than zero and a power of two. Using the default value %d instead.
#
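The rule stated in [wrong buffer alignment] (greater than zero and a
power of two, otherwise fall back to the default) can be sketched as
follows. This is an illustrative check under those stated rules, not the
BTL's actual code; both function names are hypothetical.

```python
def is_valid_alignment(value):
    """True when value is positive and has exactly one bit set."""
    # For a power of two, clearing the lowest set bit leaves zero.
    return value > 0 and (value & (value - 1)) == 0


def effective_alignment(configured, default):
    """Return the configured alignment, or the default if it is invalid."""
    return configured if is_valid_alignment(configured) else default


if __name__ == "__main__":
    for candidate in (0, 48, 64, -8):
        chosen = effective_alignment(candidate, 64)
        print(f"configured={candidate} -> using {chosen}")
```

The default of 64 used in the demonstration loop is an arbitrary value
chosen for the example, not a documented Open MPI default.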
[of error event]
The OpenFabrics stack has reported a network error event. Open MPI
will try to continue, but your job may end up failing.
This commit brings in two major things:
1. Galen's fine-grain control of queue pair resources in the openib
BTL.
1. Pasha's new implementation of asynchronous HCA event handling.
Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.
Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(
== Fine-grain control of queue pair resources ==
Galen's fine-grain control of queue pair resources to the OpenIB BTL
(thanks to Gleb for fixing broken code and providing additional
functionality, Pasha for finding broken code, and Jeff for doing all
the svn work and regression testing).
Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.
The new design allows multiple QPs to be specified at runtime. Each
QP can be set up to use PP or SRQ receive buffers as well as giving
fine-grained control over receive buffer size, number of receive
buffers to post, when to replenish the receive queue (low water mark)
and for SRQ QPs, the number of outstanding sends can also be
specified. The following is an example of the syntax to describe QPs
to the OpenIB BTL using the new MCA parameter btl_openib_receive_queues:
{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}
Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.
The first QP is:
P,128,16,4
Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers at which time the BTL will repost
receive buffers to the QP (4).
The second QP is:
S,1024,256,128,32
Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.
QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed prior to 1.3 release.
This commit was SVN r15474.
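The btl_openib_receive_queues syntax described in the commit notes above
can be illustrated with a small parser. This is a minimal sketch that
assumes exactly the layout given in those notes (four fields for "P"
entries, five for "S" entries, ascending buffer sizes); the real BTL
parser may accept other forms. QueueSpec and parse_receive_queues are
hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueSpec:
    kind: str                         # "P" (per-peer) or "S" (shared receive queue)
    buffer_size: int                  # receive buffer size in bytes
    num_buffers: int                  # receive buffers allocated to the QP
    low_watermark: int                # repost buffers when this level is reached
    max_pending_sends: Optional[int] = None  # only given for "S" entries


def parse_receive_queues(spec: str) -> List[QueueSpec]:
    """Split on ';' into QP descriptions, then on ',' into fields."""
    queues = []
    for entry in spec.split(";"):
        fields = entry.split(",")
        kind = fields[0]
        if kind == "P" and len(fields) == 4:
            size, count, low = (int(f) for f in fields[1:])
            queues.append(QueueSpec(kind, size, count, low))
        elif kind == "S" and len(fields) == 5:
            size, count, low, sends = (int(f) for f in fields[1:])
            queues.append(QueueSpec(kind, size, count, low, sends))
        else:
            raise ValueError(f"unrecognized queue description: {entry!r}")
    # The notes require ascending receive buffer size order.
    sizes = [q.buffer_size for q in queues]
    if sizes != sorted(sizes):
        raise ValueError("queues must be in ascending receive buffer size order")
    return queues


if __name__ == "__main__":
    example = "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
    for queue in parse_receive_queues(example):
        print(queue)
```

Running it on the example string from the notes yields four queue
descriptions: one per-peer queue followed by three shared receive queues.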

  Local host:      %s
  MPI process PID: %d
  Error number:    %d (%s)

This error may indicate connectivity problems within the fabric;
please contact your system administrator.
#
[of unknown event]
The OpenFabrics stack has reported an unknown network error event.
Open MPI will try to continue, but the job may end up failing.

  Local host:      %s
  MPI process PID: %d
  Error number:    %d

This error may indicate that you are using an OpenFabrics library
version that is not currently supported by Open MPI. You might try
recompiling Open MPI against your OpenFabrics library installation to
get more information.
#
[specified include and exclude]
ERROR: You have specified more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters. These four parameters are mutually exclusive; you can only
specify one.

For reference, the values that you specified are:

  btl_openib_if_include:     %s
  btl_openib_if_exclude:     %s
  btl_openib_ipaddr_include: %s
  btl_openib_ipaddr_exclude: %s
#
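The mutual-exclusion rule in [specified include and exclude] amounts to
"at most one of four values may be set". A sketch of that check is shown
below; it treats a missing or empty string as "not specified", which is
an assumption, and the function name conflicting_params is hypothetical.

```python
EXCLUSIVE_PARAMS = (
    "btl_openib_if_include",
    "btl_openib_if_exclude",
    "btl_openib_ipaddr_include",
    "btl_openib_ipaddr_exclude",
)


def conflicting_params(settings):
    """Return the exclusive parameters that are set, if more than one is."""
    # Assumption: an absent key or empty string means "not specified".
    specified = [name for name in EXCLUSIVE_PARAMS if settings.get(name)]
    return specified if len(specified) > 1 else []


if __name__ == "__main__":
    settings = {
        "btl_openib_if_include": "mthca0",
        "btl_openib_if_exclude": "mthca1",
    }
    conflict = conflicting_params(settings)
    if conflict:
        print("ERROR: mutually exclusive parameters set:", ", ".join(conflict))
```

The device names mthca0 and mthca1 are placeholder values for the
demonstration only.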
|
2007-06-14 05:59:25 +04:00
|
|
|
[nonexistent port]
|
2008-07-23 04:28:59 +04:00
|
|
|
WARNING: One or more nonexistent OpenFabrics devices/ports were
|
|
|
|
specified:
|
2007-06-14 05:59:25 +04:00
|
|
|
|
|
|
|
Host: %s
|
|
|
|
MCA parameter: mca_btl_if_%sclude
|
|
|
|
Nonexistent entities: %s
|
|
|
|
|
|
|
|
These entities will be ignored. You can disable this warning by
|
|
|
|
setting the btl_openib_warn_nonexistent_if MCA parameter to 0.
|
This commit brings in two major things:

 1. Galen's fine-grain control of queue pair resources in the openib
    BTL.
 2. Pasha's new implementation of asynchronous HCA event handling.

Pasha's new implementation doesn't take much explanation, but the new
"multifrag" stuff does.

Note that "svn merge" was not used to bring this new code from the
/tmp/ib_multifrag branch -- something Bad happened in the periodic
trunk pulls on that branch, making an actual merge back to the trunk
effectively impossible (i.e., lots and lots of arbitrary conflicts and
artificial changes). :-(

== Fine-grain control of queue pair resources ==

This adds Galen's fine-grain control of queue pair resources to the
OpenIB BTL (thanks to Gleb for fixing broken code and providing
additional functionality, Pasha for finding broken code, and Jeff for
doing all the svn work and regression testing).

Prior to this commit, the OpenIB BTL created two queue pairs: one for
eager size fragments and one for max send size fragments. When the
use of the shared receive queue (SRQ) was specified (via "-mca
btl_openib_use_srq 1"), these QPs would use a shared receive queue for
receive buffers instead of the default per-peer (PP) receive queues
and buffers. One consequence of this design is that receive buffer
utilization (the size of the data received as a percentage of the
receive buffer used for the data) was quite poor for a number of
applications.

The new design allows multiple QPs to be specified at runtime. Each
QP can be set up to use PP or SRQ receive buffers, with fine-grained
control over the receive buffer size, the number of receive buffers to
post, and when to replenish the receive queue (low water mark). For
SRQ QPs, the number of outstanding sends can also be specified. The
following is an example of the syntax to describe QPs to the OpenIB
BTL using the new MCA parameter btl_openib_receive_queues:

{{{
-mca btl_openib_receive_queues \
"P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
}}}

Each QP description is delimited by ";" (semicolon) with individual
fields of the QP description delimited by "," (comma). The above
example therefore describes 4 QPs.

The first QP is:

  P,128,16,4

Meaning: per-peer receive buffer QPs are indicated by a starting field
of "P"; the first QP (shown above) is therefore a per-peer based QP.
The second field indicates the size of the receive buffer in bytes
(128 bytes). The third field indicates the number of receive buffers
to allocate to the QP (16). The fourth field indicates the low
watermark for receive buffers, at which the BTL will repost receive
buffers to the QP (4).

The second QP is:

  S,1024,256,128,32

Shared receive queue based QPs are indicated by a starting field of
"S"; the second QP (shown above) is therefore a shared receive queue
based QP. The second, third and fourth fields are the same as in the
per-peer based QP. The fifth field is the number of outstanding sends
that are allowed at a given time on the QP (32). This provides a
"good enough" mechanism of flow control for some regular communication
patterns.

QPs MUST be specified in ascending receive buffer size order. This
requirement may be removed prior to the 1.3 release.

This commit was SVN r15474.
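
The field layout described in this commit message can be illustrated with a small parser. This is only a sketch in Python of the syntax as the commit message describes it (four fields for "P", five for "S", ascending buffer sizes); it is not the openib BTL's actual parsing code, and the names QueuePairSpec and parse_receive_queues are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuePairSpec:
    kind: str                                 # "P" (per-peer) or "S" (shared receive queue)
    buffer_size: int                          # second field: receive buffer size in bytes
    num_buffers: int                          # third field: number of receive buffers
    low_watermark: int                        # fourth field: repost threshold
    max_pending_sends: Optional[int] = None   # fifth field, "S" only


def parse_receive_queues(value: str) -> List[QueuePairSpec]:
    """Split on ';' into QP descriptions, then on ',' into fields."""
    specs = []
    for chunk in value.split(";"):
        fields = chunk.split(",")
        kind = fields[0]
        if kind == "P" and len(fields) == 4:
            size, num, low = (int(f) for f in fields[1:])
            specs.append(QueuePairSpec(kind, size, num, low))
        elif kind == "S" and len(fields) == 5:
            size, num, low, sends = (int(f) for f in fields[1:])
            specs.append(QueuePairSpec(kind, size, num, low, sends))
        else:
            raise ValueError(f"bad queue pair description: {chunk!r}")
    # The commit message requires ascending receive buffer size order.
    sizes = [s.buffer_size for s in specs]
    if sizes != sorted(sizes):
        raise ValueError("queue pairs must be in ascending buffer size order")
    return specs


EXAMPLE = "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32"
qps = parse_receive_queues(EXAMPLE)
```

Applied to the example string from the commit message, the sketch yields four queue pair descriptions: one per-peer QP followed by three shared receive queue QPs.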
#
[invalid mca param value]
WARNING: An invalid MCA parameter value was found for the OpenFabrics
(openib) BTL.

  Problem:    %s
  Resolution: %s
#
[no qps in receive_queues]
WARNING: No queue pairs were defined in the btl_openib_receive_queues
MCA parameter. At least one queue pair must be defined. The
OpenFabrics (openib) BTL will therefore be deactivated for this run.

  Local host: %s
#
[invalid qp type in receive_queues]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.

Valid queue pair types are "P" for per-peer and "S" for shared receive
queue.

  Local host:                %s
  btl_openib_receive_queues: %s
  Bad specification:         %s
#
[invalid pp qp specification]
WARNING: An invalid per-peer receive queue specification was detected
as part of the btl_openib_receive_queues MCA parameter. The
OpenFabrics (openib) BTL will therefore be deactivated for this run.

Per-peer receive queues require between 2 and 5 parameters:

  1. Buffer size in bytes (mandatory)
  2. Number of buffers (mandatory)
  3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
  4. Credit window size (optional; defaults to (low_watermark / 2))
  5. Number of buffers reserved for credit messages (optional;
     defaults to (num_buffers*2-1)/credit_window)

Example: P,128,256,128,16
  - 128 byte buffers
  - 256 buffers to receive incoming MPI messages
  - When the number of available buffers reaches 128, re-post 128 more
    buffers to reach a total of 256
  - If the number of available credits reaches 16, send an explicit
    credit message to the sender
  - Defaulting to ((256 * 2) - 1) / 16 = 31; this many buffers are
    reserved for explicit credit messages

  Local host:              %s
  Bad queue specification: %s
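
The per-peer defaults listed in this message can be worked through in a few lines. The following Python sketch assumes integer (truncating) division, which matches the "= 31" result in the example; the function name per_peer_defaults is hypothetical and this is not the BTL's own code.

```python
def per_peer_defaults(num_buffers, low_watermark=None, credit_window=None,
                      reserved=None):
    """Fill in the optional per-peer fields using the documented defaults.

    Assumption: all divisions truncate toward zero (integer division).
    Very small num_buffers values would make credit_window 0; this sketch
    does not guard against that case.
    """
    if low_watermark is None:
        low_watermark = num_buffers // 2
    if credit_window is None:
        credit_window = low_watermark // 2
    if reserved is None:
        reserved = (num_buffers * 2 - 1) // credit_window
    return low_watermark, credit_window, reserved


# "P,128,256,128,16": watermark and credit window given, fifth field defaulted.
example = per_peer_defaults(256, low_watermark=128, credit_window=16)

# "P,128,256": only the two mandatory fields given.
minimal = per_peer_defaults(256)
```

For the documented example the reserved buffer count comes out as 31; with only the mandatory fields, each default is derived from the one before it.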
#
[invalid srq specification]
WARNING: An invalid shared receive queue specification was detected as
part of the btl_openib_receive_queues MCA parameter. The OpenFabrics
(openib) BTL will therefore be deactivated for this run.

Shared receive queues can take between 2 and 6 parameters:

  1. Buffer size in bytes (mandatory)
  2. Number of buffers (mandatory)
  3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
  4. Maximum number of outstanding sends a sender can have (optional;
     defaults to (low_watermark / 4))
  5. Initial number of receive buffers that will be pre-posted
     (optional; defaults to (num_buffers / 4))
  6. Event limit buffer count watermark (optional; defaults to 3/16 of
     the initial number of buffers)

Example: S,1024,256,128,32,32,8
  - 1024 byte buffers
  - 256 buffers to receive incoming MPI messages
  - When the number of available buffers reaches 128, re-post 128 more
    buffers to reach a total of 256
  - A sender will not send to a peer unless it has fewer than 32
    outstanding sends to that peer.
  - 32 receive buffers will be pre-posted.
  - When the number of unused shared receive buffers reaches 8, more
    buffers (32 in this case) will be posted.

  Local host:              %s
  Bad queue specification: %s
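
The shared receive queue defaults can be sketched the same way. This Python example assumes integer (truncating) division for every default, including the 3/16 event limit; that rounding behavior is an assumption, and srq_defaults is a hypothetical name rather than anything in the BTL.

```python
def srq_defaults(num_buffers, low_watermark=None, max_pending_sends=None,
                 initial_buffers=None, event_limit=None):
    """Fill in the optional SRQ fields using the documented defaults.

    Assumption: every division truncates (integer division).
    """
    if low_watermark is None:
        low_watermark = num_buffers // 2
    if max_pending_sends is None:
        max_pending_sends = low_watermark // 4
    if initial_buffers is None:
        initial_buffers = num_buffers // 4
    if event_limit is None:
        event_limit = (3 * initial_buffers) // 16
    return low_watermark, max_pending_sends, initial_buffers, event_limit


# "S,1024,256,128,32,32,8": every field given explicitly.
explicit = srq_defaults(256, 128, 32, 32, 8)

# "S,1024,256": only the two mandatory fields given.
defaulted = srq_defaults(256)
```

With only the mandatory fields, 256 buffers give a low watermark of 128, 32 outstanding sends, 64 initially posted buffers, and an event limit of 12 under the truncation assumption.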
#
[rd_num must be > rd_low]
WARNING: The number of buffers for a queue pair specified via the
btl_openib_receive_queues MCA parameter must be greater than the low
buffer count watermark. The OpenFabrics (openib) BTL will therefore
be deactivated for this run.

  Local host:              %s
  Bad queue specification: %s
#
[rd_num must be >= rd_init]
WARNING: The number of buffers for a queue pair specified via the
btl_openib_receive_queues MCA parameter (parameter #2) must be
greater than or equal to the initial SRQ size (parameter #5).
The OpenFabrics (openib) BTL will therefore be deactivated for this run.

  Local host:              %s
  Bad queue specification: %s
#
[srq_limit must be > rd_num]
WARNING: The number of buffers for a queue pair specified via the
btl_openib_receive_queues MCA parameter (parameter #2) must be greater
than the limit buffer count (parameter #6). The OpenFabrics (openib)
BTL will therefore be deactivated for this run.

  Local host:              %s
  Bad queue specification: %s
#
[biggest qp size is too small]
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is smaller than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
that no queue is large enough to receive the largest possible incoming
message fragment. The OpenFabrics (openib) BTL will therefore be
deactivated for this run.

  Local host:                 %s
  Largest buffer size:        %d
  Maximum send fragment size: %d
#
[biggest qp size is too big]
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is larger than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter). This
means that memory will be wasted because the largest possible incoming
message fragment will not fill a buffer allocated for incoming
fragments.

  Local host:                 %s
  Largest buffer size:        %d
  Maximum send fragment size: %d
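
The consistency rules stated by the preceding messages (buffer count versus low watermark, initial SRQ size, and limit count; largest buffer size versus maximum send size) can be summarized as a sketch. The Python below is only an illustration of those stated rules; the function names are hypothetical, and the returned strings simply reuse the topic names from this file.

```python
def check_queue(rd_num, rd_low, rd_init=None, srq_limit=None):
    """Return the topic names of the rules that one queue description violates."""
    problems = []
    if rd_num <= rd_low:
        problems.append("rd_num must be > rd_low")
    if rd_init is not None and rd_num < rd_init:
        problems.append("rd_num must be >= rd_init")
    # Per the message text: parameter #2 must exceed parameter #6.
    if srq_limit is not None and rd_num <= srq_limit:
        problems.append("srq_limit must be > rd_num")
    return problems


def check_largest_buffer(buffer_sizes, max_send_size):
    """Compare the largest queue buffer size with the maximum send size."""
    largest = max(buffer_sizes)
    if largest < max_send_size:
        return "biggest qp size is too small"   # BTL is deactivated
    if largest > max_send_size:
        return "biggest qp size is too big"     # warning only: memory is wasted
    return None


ok = check_queue(rd_num=256, rd_low=128, rd_init=32, srq_limit=8)
bad = check_queue(rd_num=128, rd_low=128)
fit = check_largest_buffer([128, 1024, 65536], max_send_size=65536)
short = check_largest_buffer([128, 1024], max_send_size=65536)
```

Only the "too small" case is described as deactivating the BTL; the "too big" case is reported as wasted memory.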
#
[freelist too small]
WARNING: The maximum freelist size that was specified was too small
for the requested receive queue sizes. The maximum freelist size must
be at least equal to the sum of the largest number of buffers posted
to a single queue plus the corresponding number of reserved/credit
buffers for that queue. It is suggested that the maximum be quite a
bit larger than this for performance reasons.

  Local host:                     %s
  Specified freelist size:        %d
  Minimum required freelist size: %d
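
One reading of the minimum described above: take the queue that posts the most buffers and add that same queue's reserved/credit buffers. The Python sketch below follows that interpretation, which is an assumption about the wording rather than a statement of how the BTL computes the value; min_freelist_size is a hypothetical name.

```python
def min_freelist_size(queues):
    """queues: iterable of (num_buffers, reserved_buffers) pairs.

    Picks the queue with the largest num_buffers and adds the reserved
    buffers of that same queue (interpretation of the message text).
    """
    num_buffers, reserved = max(queues, key=lambda q: q[0])
    return num_buffers + reserved


# Two hypothetical queues: 16 buffers + 3 reserved, 256 buffers + 31 reserved.
required = min_freelist_size([(16, 3), (256, 31)])
```

As the message notes, a practical maximum would normally be set well above this lower bound.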
#
[XRC with PP or SRQ]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.

Note that XRC ("X") queue pairs cannot be used with per-peer ("P") and
SRQ ("S") queue pairs. This restriction may be removed in future
versions of Open MPI.

  Local host:                %s
  btl_openib_receive_queues: %s
#
[XRC with BTLs per LID]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.

XRC ("X") queue pairs cannot be used when (btls_per_lid > 1). This
restriction may be removed in future versions of Open MPI.

  Local host:                %s
  btl_openib_receive_queues: %s
  btls_per_lid:              %d
#
[XRC on device without XRC support]
WARNING: You configured the OpenFabrics (openib) BTL to run with %d
XRC queues. The device %s does not have XRC capabilities; the
OpenFabrics BTL will ignore this device. If no devices are found with
XRC capabilities, the OpenFabrics BTL will be disabled.

  Local host: %s
#
[No XRC support]
WARNING: The Open MPI build was compiled without XRC support, but XRC
("X") queues were specified in the btl_openib_receive_queues MCA
parameter. The OpenFabrics (openib) BTL will therefore be deactivated
for this run.

  Local host:                %s
  btl_openib_receive_queues: %s
#
[non optimal rd_win]
WARNING: The rd_win specification is not optimal. For maximum
performance, it is advisable to configure rd_win larger than
(rd_num - rd_low), but currently rd_win = %d and
(rd_num - rd_low) = %d.
#
[apm without lmc]
WARNING: APM support cannot be enabled with the LMC bit configured to 0.
APM support will be disabled.
#
[apm with wrong lmc]
Cannot provide %d alternative paths with the LMC bit configured to %d.
#
[apm not enough ports]
WARNING: APM over ports requires at least 2 active ports, but only a
single active port was found. APM over ports will be disabled.
#
[locally conflicting receive_queues]
Open MPI detected two devices on a single server that have different
"receive_queues" parameter values (in the openib BTL). Open MPI
currently only supports one OpenFabrics receive_queues value in an MPI
job, even if you have different types of OpenFabrics adapters on the
same host.

Device 2 (in the details shown below) will be ignored for the duration
of this MPI job.

You can fix this issue by one or more of the following:

  1. Set the MCA parameter btl_openib_receive_queues to a value that
     is usable by all the OpenFabrics devices that you will use.
  2. Use the btl_openib_if_include or btl_openib_if_exclude MCA
     parameters to select exactly which OpenFabrics devices to use in
     your MPI job.

Finally, note that the "receive_queues" values may have been set by
the Open MPI device default settings file. You may want to look in
this file and see if your devices are getting receive_queues values
from this file:

    %s/mca-btl-openib-device-params.ini

Here is more detailed information about the receive_queues value
conflict:

  Local host:     %s
  Device 1:       %s (vendor 0x%x, part ID %d)
  Receive queues: %s
  Device 2:       %s (vendor 0x%x, part ID %d)
  Receive queues: %s
#
[eager RDMA and progress threads]
WARNING: The openib BTL was directed to use "eager RDMA" for short
messages, but the openib BTL was compiled with progress threads
support. Short eager RDMA is not yet supported with progress threads;
its use has been disabled in this job.

This is a warning only; your job will attempt to continue.
#
[ptmalloc2 with no threads]
WARNING: It appears that ptmalloc2 was compiled into this process via
-lopenmpi-malloc, but there is no thread support. This combination is
known to cause memory corruption in the openib BTL. Open MPI is
therefore disabling the use of the openib BTL in this process for this
run.

  Local host: %s
#
[cannot raise btl error]
The OpenFabrics driver in Open MPI tried to raise a fatal error, but
failed. Hopefully there was an error message before this one that
gave some more detailed information.

  Local host:  %s
  Source file: %s
  Source line: %d

Your job is now going to abort, sorry.
#
[no iwarp support]
Open MPI does not support iWARP devices with this version of OFED.
You need to upgrade to a later version of OFED (1.3 or later) for Open
MPI to support iWARP devices.

(This message is being displayed because you told Open MPI to use
iWARP devices via the btl_openib_device_type MCA parameter.)
#
[invalid ipaddr_inexclude]
WARNING: An invalid value was given for btl_openib_ipaddr_%s. This
value will be ignored.

  Local host: %s
  Value:      %s
  Message:    %s
#
[unsupported queues configuration]
The Open MPI receive queue configurations for the OpenFabrics devices
on two nodes are incompatible, meaning that MPI processes on two
specific nodes were unable to communicate with each other. This
generally happens when you are using OpenFabrics devices from
different vendors on the same network. You should be able to use the
btl_openib_receive_queues MCA parameter to set a uniform receive
queue configuration for all the devices in the MPI job, and therefore
be able to run successfully.

  Local host:     %s
  Local adapter:  %s (vendor 0x%x, part ID %d)
  Local queues:   %s

  Remote host:    %s
  Remote adapter: (vendor 0x%x, part ID %d)
  Remote queues:  %s
#
[conflicting transport types]
Open MPI detected two different OpenFabrics transport types in the
same InfiniBand network. Such a mixed network transport configuration
is not supported by Open MPI.

  Local host:            %s
  Local adapter:         %s (vendor 0x%x, part ID %d)
  Local transport type:  %s

  Remote host:           %s
  Remote adapter:        (vendor 0x%x, part ID %d)
  Remote transport type: %s
#
[gid index too large]
Open MPI tried to use a GID index that was too large for an
OpenFabrics device (i.e., the GID index does not exist on this
device).

  Local host:              %s
  Local adapter:           %s
  Local port:              %d

  Requested GID index:     %d (specified by the btl_openib_gid_index MCA param)
  Max allowable GID index: %d

Use "ibv_devinfo -v" on the local host to see the GID table of this
device.
#
[reg mem limit low]
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:          %s
  Registerable memory: %lu MiB
  Total memory:        %lu MiB

%s
#
[CUDA_no_gdr_support]
You requested to run with CUDA GPU Direct RDMA support but the Open MPI
library was not built with that support. The Open MPI library must be
configured with CUDA 6.0 or later.

  Local host: %s
#
[driver_no_gdr_support]
You requested to run with CUDA GPU Direct RDMA support but this OFED
installation does not have that support. Contact Mellanox to figure
out how to get an OFED stack with that support.

  Local host: %s
#
[no_fork_with_gdr]
You cannot have fork support and CUDA GPU Direct RDMA support enabled
at the same time. Please disable one of them. The openib BTL will be
deactivated.

  Local host: %s