1d27ca5d0a
mpi_leave_pinned when multiple OpenIB HCA ports are found. Specifically, if mpi_leave_pinned == 1 and multiple HCA ports are found, the MCA parameter btl_openib_max_btls is set to 1. If the MCA parameter btl_openib_warn_leave_pinned_multi_port is true, a warning is emitted to say that this happened (having an MCA parameter to control the warning allows users/sysadmins to turn it off instead of being nagged on every run). This commit was SVN r10424.
59 lines
2.4 KiB
Plaintext
# -*- text -*-
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
#                         University Research and Technology
#                         Corporation.  All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
#                         of Tennessee Research Foundation.  All rights
#                         reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
#                         University of Stuttgart.  All rights reserved.
# Copyright (c) 2004-2006 The Regents of the University of California.
#                         All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English general help file for Open MPI.
#
[btl_openib:retry-exceeded]
The InfiniBand retry count between two MPI processes has been
exceeded.  "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):

    The total number of times that the sender wishes the receiver to
    retry timeout, packet sequence, etc. errors before posting a
    completion error.

This error typically means that there is something awry within the
InfiniBand fabric itself.  You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.

Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:

* btl_openib_ib_retry_count - The number of times the sender will
  attempt to retry (defaults to 7, the maximum value).

* btl_openib_ib_timeout - The local ACK timeout parameter (defaults
  to 10).  The actual timeout value used is calculated as:

     4.096 microseconds * (2^btl_openib_ib_timeout)

  See the InfiniBand spec 1.2 (section 12.7.34) for more details.
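#
# Worked example for the timeout formula above (an illustrative note,
# not part of the displayed message).  With the default
# btl_openib_ib_timeout of 10:
#
#     4.096 microseconds * 2^10 = 4194.304 microseconds (about 4.2 ms)
#
# and with a value of 14:
#
#     4.096 microseconds * 2^14 = 67108.864 microseconds (about 67 ms)
#
# A sketch of changing the timeout for a single run, assuming mpirun's
# standard "--mca <name> <value>" option; the process count and
# "./my_app" are hypothetical placeholders:
#
#     mpirun --mca btl_openib_ib_timeout 14 -np 4 ./my_app
#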
[btl_openib:leave_pinned_multi_port]
# Until ticket #142 is fixed
This release of Open MPI does not support setting the
"mpi_leave_pinned" parameter to a true value when using multiple HCA
ports.  This warning is emitted when multiple HCA ports are detected
and "mpi_leave_pinned" is set to a true value; it informs you that
Open MPI will automatically disregard all HCA ports beyond the first
one (i.e., the "btl_openib_max_btls" MCA parameter has been overridden
and set to 1).

You may silence this warning by setting the
"btl_openib_warn_leave_pinned_multi_port" MCA parameter to 0.
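#
# Illustrative sketch (not part of the displayed message) of three
# ways to set the parameter named above, assuming Open MPI's usual MCA
# mechanisms; the process count and "./my_app" are hypothetical
# placeholders.
#
# On the mpirun command line:
#
#     mpirun --mca btl_openib_warn_leave_pinned_multi_port 0 -np 4 ./my_app
#
# Through the environment, using the OMPI_MCA_ prefix:
#
#     export OMPI_MCA_btl_openib_warn_leave_pinned_multi_port=0
#
# Or, for a site-wide default, as a line in an MCA parameter file such
# as openmpi-mca-params.conf:
#
#     btl_openib_warn_leave_pinned_multi_port = 0
#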