1
1
openmpi/ompi/mca/btl/openib/help-mpi-btl-openib.txt
Jeff Squyres df45221a3e Until a real fix for #142 is found, this workaround prohibits using
mpi_leave_pinned when multiple OpenIB HCA ports are found.
Specifically, if mpi_leave_pinned == 1 and ultiple HCA ports are
found, the MCA parameter btl_openib_max_btls is set to 1.  If the MCA
parameter btl_openib_warn_leave_pinned_multi_port is true, emit a
warning that this happened (having an MCA parameter to control the
warning allows users/sysadmins to turn it off instead of being nagged
for every run).

This commit was SVN r10521.
2006-06-27 10:43:03 +00:00

47 строки
1.8 KiB
Plaintext

# -*- text -*-
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
# University Research and Technology
# Corporation. All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
# of Tennessee Research Foundation. All rights
# reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
# University of Stuttgart. All rights reserved.
# Copyright (c) 2004-2006 The Regents of the University of California.
# All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English general help file for Open MPI.
#
[btl_openib:retry-exceeded]
The InfiniBand retry count between two MPI processes has been
exceeded. "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):
The total number of times that the sender wishes the receiver to
retry timeout, packet sequence, etc. errors before posting a
completion error.
This error typically means that there is something awry within the
InfiniBand fabric itself. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.
Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:
* btl_openib_ib_retry_count - The number of times the sender will
attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
to 10). The actual timeout value used is calculated as:
4.096 microseconds * (2^btl_openib_ib_timeout)
See the InfiniBand spec 1.2 (section 12.7.34) for more details.