openmpi/orte/runtime/help-orte-runtime.txt
Ralph Castain 13665bffe8 Per an off-list discussion, it appears possible for a system to report failure when executing getpwuid. There are several reasons for this error to occur, most notably if the system uses a network-based authentication protocol (e.g., NIS) and that system gets overwhelmed when we launch on a lot of nodes.
There is no good way to recover from this scenario, and from past experience, using the user's name in the session directory (as opposed to the uid) is very helpful when things go wrong. So print a help message when this happens (it is extremely rare, but has happened at least once now) and return an error.
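
The failure mode described above can be sketched in C. This is only an illustration under stated assumptions, not the code from this commit: the function name build_session_dir, the "sessions-<username>" layout, and the messages printed to stderr are all hypothetical. It uses only POSIX getpwuid(), which returns NULL both when no entry exists for the UID and when the lookup itself fails (for example, when a network directory service does not answer).

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not Open MPI's implementation.
 * Writes "<prefix>/sessions-<username>" into buf.
 * Returns 0 on success, -1 if the user name cannot be resolved
 * or if buf is too small to hold the path. */
int build_session_dir(uid_t uid, const char *prefix, char *buf, size_t len)
{
    struct passwd *pw;
    int n;

    /* Clear errno first: a NULL result with errno still 0 means
     * "no such entry", a nonzero errno means the lookup failed. */
    errno = 0;
    pw = getpwuid(uid);
    if (pw == NULL) {
        if (errno != 0) {
            perror("getpwuid");
        } else {
            fprintf(stderr, "no passwd entry for uid %ld\n", (long)uid);
        }
        return -1;  /* no recovery is attempted; report and fail */
    }

    n = snprintf(buf, len, "%s/sessions-%s", prefix, pw->pw_name);
    if (n < 0 || (size_t)n >= len) {
        return -1;  /* encoding error or truncated path */
    }
    return 0;
}
```

A caller would pass getuid() and a scratch buffer, for example build_session_dir(getuid(), "/tmp", path, sizeof(path)), and print a help message and abort startup when -1 is returned, rather than fall back to a UID-based directory name.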

cmr:v1.7.3,reviewer=jsquyres
cmr:v1.6.5,reviewer=jsquyres

This commit was SVN r28658.
2013-06-20 04:30:42 +00:00

60 lines
2.3 KiB
Plaintext

# -*- text -*-
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
# University Research and Technology
# Corporation. All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
# of Tennessee Research Foundation. All rights
# reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
# University of Stuttgart. All rights reserved.
# Copyright (c) 2004-2005 The Regents of the University of California.
# All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English general help file for Open MPI.
#
[orte_init:startup:internal-failure]
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init, some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
%s failed
--> Returned value %s (%d) instead of ORTE_SUCCESS
#
#
[orte:session:dir:prohibited]
The specified location for the temporary directories required by Open MPI
is on the list of prohibited locations:
Location given: %s
Prohibited locations: %s
If you believe this is in error, please contact your system administrator
to have the list of prohibited locations changed. Otherwise, please identify
a different location to be used (use -h to see the cmd line option), or
simply let the system pick a default location.
#
[orte:session:dir:nopwname]
Open MPI was unable to obtain the username in order to
create a path for its required temporary directories. This
is usually caused by either the UID being removed from the
passwd file, or by use of a network-based authentication
service (e.g., NIS) on a large cluster that might suffer
from congestion.
Please consult your system administrator about these
conditions and try again.
#
[orte_nidmap:too_many_nodes]
An error occurred while trying to pack the information about the job. More nodes
have been found than the %d expected. Please check your configuration files such
as the mapping.