openmpi/orte/runtime/help-orte-runtime.txt
Ralph Castain 45e695928f As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then hostnames are included only for allocations smaller than that limit (see the first sketch after the commit notes).

* remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" checks (see the second sketch after the commit notes)

* modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded

* remove a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base

* add an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it another way. Currently, we support PBS-based systems and SLURM; for any other environment, rank 0 will emit a warning and we assume the max number of daemons, so we will always retain hostnames (see the third sketch after the commit notes)

This commit was SVN r29052.
2013-08-20 18:59:36 +00:00
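
The cutoff behavior in the first bullet can be condensed into a minimal C sketch. The names below (hostname_cutoff, include_hostnames) are illustrative stand-ins, not the actual ORTE symbols, and the real code registers the parameter through the MCA framework rather than hard-coding it:

    /* Minimal sketch of the orte_hostname_cutoff logic; the names here
     * are hypothetical, not the actual ORTE symbols. */
    #include <limits.h>
    #include <stdbool.h>

    /* default of INT_MAX => hostnames are always included */
    static int hostname_cutoff = INT_MAX;

    /* Include hostnames only when the allocation is smaller than the
     * configured cutoff. */
    static bool include_hostnames(int num_nodes)
    {
        return num_nodes < hostname_cutoff;
    }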

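The second and third bullets replace ompi_proc_get_hostname with NULL-guarded direct access that falls back to the modex on first use. Here is a hedged sketch of that pattern; the struct is a simplified stand-in for the real ompi_proc_t, and fetch_hostname_via_modex is a hypothetical placeholder for the modex_recv path:

    #include <stddef.h>
    #include <string.h>

    /* simplified stand-in for the real ompi_proc_t */
    typedef struct {
        char *proc_hostname;   /* may be NULL until modex info is fetched */
    } ompi_proc_t;

    /* Hypothetical placeholder: under direct launch with PMI, the first
     * modex_recv call would fetch the hostname and cache it on the proc. */
    static char *fetch_hostname_via_modex(ompi_proc_t *proc)
    {
        (void)proc;
        return strdup("node001");   /* illustrative value only */
    }

    static const char *get_hostname(ompi_proc_t *proc)
    {
        if (NULL == proc->proc_hostname) {
            proc->proc_hostname = fetch_hostname_via_modex(proc);
        }
        return proc->proc_hostname;
    }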

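The envar lookup in the last bullet might look like the sketch below. SLURM_NNODES is a real SLURM variable; the PBS variable name is an assumption here, since PBS-based systems differ in what they export:

    #include <limits.h>
    #include <stdlib.h>

    /* Return the allocation's node count from the environment, or INT_MAX
     * when the environment is unrecognized (assume the max number of
     * daemons so hostnames are always retained). */
    static int lookup_num_nodes(void)
    {
        const char *val;

        if (NULL != (val = getenv("PBS_NUM_NODES"))) {  /* PBS-based (assumed name) */
            return atoi(val);
        }
        if (NULL != (val = getenv("SLURM_NNODES"))) {   /* SLURM */
            return atoi(val);
        }
        /* unknown environment: rank 0 would emit a warning here */
        return INT_MAX;
    }
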
# -*- text -*-
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
#                         University Research and Technology
#                         Corporation. All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
#                         of Tennessee Research Foundation. All rights
#                         reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
#                         University of Stuttgart. All rights reserved.
# Copyright (c) 2004-2005 The Regents of the University of California.
#                         All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English general help file for Open MPI.
#
[orte_init:startup:internal-failure]
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init, some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  %s failed
  --> Returned value %s (%d) instead of ORTE_SUCCESS
#
#
[orte:session:dir:prohibited]
The specified location for the temporary directories required by Open MPI
is on the list of prohibited locations:

  Location given: %s
  Prohibited locations: %s

If you believe this is in error, please contact your system administrator
to have the list of prohibited locations changed. Otherwise, please identify
a different location to be used (use -h to see the command line option), or
simply let the system pick a default location.
#
[orte:session:dir:nopwname]
Open MPI was unable to obtain the username in order to create a path
for its required temporary directories. This type of error is usually
caused by a transient failure of network-based authentication services
(e.g., LDAP or NIS failure due to network congestion), but can also be
an indication of system misconfiguration.
Please consult your system administrator about these issues and try
again.
#
[orte_nidmap:too_many_nodes]
An error occurred while trying to pack the information about the job: more
nodes were found than the %d expected. Please check your configuration
files, such as the node mapping.
#
[orte_init:startup:num_daemons]
Open MPI was unable to determine the number of nodes in your allocation. We
are therefore assuming a very large number to ensure you receive proper error
messages.