1953e3406f
We commonly see messages on the users list where a peer has hung up because it has crashed. Instead of having just a BTL_ERROR message, make this a real opal_show_help() message that tells the user that the peer unexpectedly hung up, and they should look into *why* that peer hung up. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
94 lines
2.6 KiB
Plaintext
# -*- text -*-
#
# Copyright (c) 2009-2016 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2015-2016 The University of Tennessee and The University
#                         of Tennessee Research Foundation. All rights
#                         reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for Open MPI's TCP support
# (the TCP BTL).
#
[invalid if_inexclude]
WARNING: An invalid value was given for btl_tcp_if_%s. This
value will be ignored.

  Local host: %s
  Value:      %s
  Message:    %s
#
[invalid minimum port]
WARNING: An invalid value was given for btl_tcp_port_min_%s. Legal
values are in the range [1 .. 2^16-1]. This value will be ignored
(reset to the default value of 1024).

  Local host: %s
  Value:      %d
#
[client connect fail]
WARNING: Open MPI failed to TCP connect to a peer MPI process. This
should not happen.

Your Open MPI job may now fail.

  Local host: %s
  PID:        %d
  Message:    %s
  Error:      %s (%d)
#
[client handshake fail]
WARNING: Open MPI failed to handshake with a connecting peer MPI
process over TCP. This should not happen.

Your Open MPI job may now fail.

  Local host: %s
  PID:        %d
  Message:    %s
#
[accept failed]
WARNING: The accept(2) system call failed on a TCP socket. While this
should generally never happen on a well-configured HPC system, the
most common causes when it does occur are:

  * The process ran out of file descriptors
  * The operating system ran out of file descriptors
  * The operating system ran out of memory

Your Open MPI job will likely hang (or crash) until the cause of the
failure is fixed (e.g., more file descriptors and/or memory become
available), and may eventually time out or abort.

  Local host: %s
  PID:        %d
  Errno:      %d (%s)
#
[unsuported progress thread]
WARNING: Support for the TCP progress thread has not been compiled in.
Falling back to normal progress.

  Local host: %s
  Value:      %s
  Message:    %s
#
[peer hung up]
An MPI communication peer process has unexpectedly disconnected. This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).

Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate. For
example, there may be a core file that you can examine. More
generally: such peer hangups are frequently caused by application bugs
or other external events.

  Local host: %s
  Local PID:  %d
  Peer host:  %s
#