![Ralph Castain](/assets/img/avatar_default.png)
* add hnp and orted modules to the errmgr framework. The HNP module contains much of the code that was in the errmgr base since that code could only be executed by the HNP anyway. * update the odls to report process states directly into the active errmgr module, thus removing the need to send messages looped back into the odls cmd processor. Let the active errmgr module decide what to do at various states. * remove the code to track application state progress from the plm_base_launch_support.c code. Update the plm modules to call the errmgr directly when a launch fails. * update the plm_base_receive.c code to call the errmgr with state updates from remote daemons * update the routed modules to reflect that process state is updated in the errmgr * ensure that the orted's open the errmgr and select their appropriate module * add new pretty-print utilities to print process and job state. Move the pretty-print of time info to a globally-accessible place * define a global orte_comm function to send messages from orted's to the HNP so that others can overlay the standard RML methods, if desired. * update the orterun help output to reflect that the "term w/o sync" error message can result from three, not two, scenarios This commit was SVN r23023.
467 строки
16 KiB
Plaintext
467 строки
16 KiB
Plaintext
# -*- text -*-
|
|
#
|
|
# Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana
|
|
# University Research and Technology
|
|
# Corporation. All rights reserved.
|
|
# Copyright (c) 2004-2005 The University of Tennessee and The University
|
|
# of Tennessee Research Foundation. All rights
|
|
# reserved.
|
|
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
|
# University of Stuttgart. All rights reserved.
|
|
# Copyright (c) 2004-2005 The Regents of the University of California.
|
|
# All rights reserved.
|
|
# Copyright (c) 2007-2010 Cisco Systems, Inc. All rights reserved.
|
|
# $COPYRIGHT$
|
|
#
|
|
# Additional copyrights may follow
|
|
#
|
|
# $HEADER$
|
|
#
|
|
# This is the US/English general help file for Open RTE's orterun.
|
|
#
|
|
[orterun:init-failure]
|
|
Open RTE was unable to initialize properly. The error occured while
|
|
attempting to %s. Returned value %d instead of ORTE_SUCCESS.
|
|
[orterun:usage]
|
|
%s (%s) %s
|
|
|
|
Usage: %s [OPTION]... [PROGRAM]...
|
|
Start the given program using Open RTE
|
|
|
|
%s
|
|
|
|
Report bugs to %s
|
|
[orterun:version]
|
|
%s (%s) %s
|
|
|
|
Report bugs to %s
|
|
[orterun:allocate-resources]
|
|
%s was unable to allocate enough resources to start your application.
|
|
This might be a transient error (too many nodes in the cluster were
|
|
unavailable at the time of the request) or a permenant error (you
|
|
requsted more nodes than exist in your cluster).
|
|
|
|
While probably only useful to Open RTE developers, the error returned
|
|
was %d.
|
|
[orterun:error-spawning]
|
|
%s was unable to start the specified application. An attempt has been
|
|
made to clean up all processes that did start. The error returned was
|
|
%d.
|
|
[orterun:appfile-not-found]
|
|
Unable to open the appfile:
|
|
|
|
%s
|
|
|
|
Double check that this file exists and is readable.
|
|
[orterun:executable-not-specified]
|
|
No executable was specified on the %s command line.
|
|
|
|
Aborting.
|
|
[orterun:multi-apps-and-zero-np]
|
|
%s found multiple applications specified on the command line, with
|
|
at least one that failed to specify the number of processes to execute.
|
|
When specifying multiple applications, you must specify how many processes
|
|
of each to launch via the -np argument.
|
|
[orterun:nothing-to-do]
|
|
%s could not find anything to do.
|
|
|
|
It is possible that you forgot to specify how many processes to run
|
|
via the "-np" argument.
|
|
[orterun:call-failed]
|
|
%s encountered a %s call failure. This should not happen, and
|
|
usually indicates an error within the operating system itself.
|
|
Specifically, the following error occurred:
|
|
|
|
%s
|
|
|
|
The only other available information that may be helpful is the errno
|
|
that was returned: %d.
|
|
[orterun:environ]
|
|
%s was unable to set
|
|
%s = %s
|
|
in the environment. Returned value %d instead of ORTE_SUCCESS.
|
|
[orterun:precondition]
|
|
%s was unable to precondition transports
|
|
Returned value %d instead of ORTE_SUCCESS.
|
|
[orterun:attr-failed]
|
|
%s was unable to define an attribute
|
|
Returned value %d instead of ORTE_SUCCESS.
|
|
#
|
|
[orterun:proc-ordered-abort]
|
|
%s has exited due to process rank %lu with PID %lu on
|
|
node %s calling "abort". This may have caused other processes
|
|
in the application to be terminated by signals sent by %s
|
|
(as reported here).
|
|
#
|
|
[orterun:proc-exit-no-sync]
|
|
%s has exited due to process rank %lu with PID %lu on
|
|
node %s exiting improperly. There are three reasons this could occur:
|
|
|
|
1. this process did not call "init" before exiting, but others in
|
|
the job did. This can cause a job to hang indefinitely while it waits
|
|
for all processes to call "init". By rule, if one process calls "init",
|
|
then ALL processes must call "init" prior to termination.
|
|
|
|
2. this process called "init", but exited without calling "finalize".
|
|
By rule, all processes that call "init" MUST call "finalize" prior to
|
|
exiting or it will be considered an "abnormal termination"
|
|
|
|
3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
|
|
orte_create_session_dirs is set to false. In this case, the run-time cannot
|
|
detect that the abort call was an abnormal termination. Hence, the only
|
|
error message you will receive is this one.
|
|
|
|
This may have caused other processes in the application to be
|
|
terminated by signals sent by %s (as reported here).
|
|
|
|
You can avoid this message by specifying -quiet on the %s command line.
|
|
|
|
#
|
|
[orterun:proc-exit-no-sync-unknown]
|
|
%s has exited due to a process exiting without calling "finalize",
|
|
but has no info as to the process that caused that situation. This
|
|
may have caused other processes in the application to be
|
|
terminated by signals sent by %s (as reported here).
|
|
#
|
|
[orterun:proc-aborted]
|
|
%s noticed that process rank %lu with PID %lu on node %s exited on signal %d.
|
|
#
|
|
[orterun:proc-aborted-unknown]
|
|
%s noticed that the job aborted, but has no info as to the process
|
|
that caused that situation.
|
|
#
|
|
[orterun:proc-aborted-signal-unknown]
|
|
%s noticed that the job aborted by signal, but has no info as
|
|
to the process that caused that situation.
|
|
#
|
|
[orterun:proc-aborted-strsignal]
|
|
%s noticed that process rank %lu with PID %lu on node %s exited on signal %d (%s).
|
|
#
|
|
[orterun:abnormal-exit]
|
|
WARNING: %s has exited before it received notification that all
|
|
started processes had terminated. You should double check and ensure
|
|
that there are no runaway processes still executing.
|
|
#
|
|
[orterun:sigint-while-processing]
|
|
WARNING: %s is in the process of killing a job, but has detected an
|
|
interruption (probably control-C).
|
|
|
|
It is dangerous to interrupt %s while it is killing a job (proper
|
|
termination may not be guaranteed). Hit control-C again within 1
|
|
second if you really want to kill %s immediately.
|
|
#
|
|
[orterun:empty-prefix]
|
|
A prefix was supplied to %s that only contained slashes.
|
|
|
|
This is a fatal error; %s will now abort. No processes were launched.
|
|
#
|
|
[debugger-mca-param-not-found]
|
|
Internal error -- the orte_base_user_debugger MCA parameter was not able to
|
|
be found. Please contact the Open RTE developers; this should not
|
|
happen.
|
|
#
|
|
[debugger-orte_base_user_debugger-empty]
|
|
The MCA parameter "orte_base_user_debugger" was empty, indicating that
|
|
no user-level debuggers have been defined. Please set this MCA
|
|
parameter to a value and try again.
|
|
#
|
|
[debugger-not-found]
|
|
A suitable debugger could not be found in your PATH. Check the values
|
|
specified in the orte_base_user_debugger MCA parameter for the list of
|
|
debuggers that was searched.
|
|
#
|
|
[debugger-exec-failed]
|
|
%s was unable to launch the specified debugger. This is what was
|
|
launched:
|
|
|
|
%s
|
|
|
|
Things to check:
|
|
|
|
- Ensure that the debugger is installed properly
|
|
- Ensure that the "%s" executable is in your path
|
|
- Ensure that any required licenses are available to run the debugger
|
|
#
|
|
[orterun:sys-limit-pipe]
|
|
%s was unable to launch the specified application as it encountered an error:
|
|
|
|
Error: system limit exceeded on number of pipes that can be open
|
|
Node: %s
|
|
|
|
when attempting to start process rank %lu.
|
|
|
|
This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
|
|
increasing your limit descriptor setting (using limit or ulimit commands),
|
|
asking the system administrator for that node to increase the system limit, or
|
|
by rearranging your processes to place fewer of them on that node.
|
|
#
|
|
[orterun:sys-limit-sockets]
|
|
Error: system limit exceeded on number of network connections that can be open
|
|
|
|
This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
|
|
increasing your limit descriptor setting (using limit or ulimit commands),
|
|
or asking the system administrator to increase the system limit.
|
|
#
|
|
[orterun:pipe-setup-failure]
|
|
%s was unable to launch the specified application as it encountered an error:
|
|
|
|
Error: pipe function call failed when setting up I/O forwarding subsystem
|
|
Node: %s
|
|
|
|
while attempting to start process rank %lu.
|
|
#
|
|
[orterun:sys-limit-children]
|
|
%s was unable to launch the specified application as it encountered an error:
|
|
|
|
Error: system limit exceeded on number of processes that can be started
|
|
Node: %s
|
|
|
|
when attempting to start process rank %lu.
|
|
|
|
This can be resolved by either asking the system administrator for that node to
|
|
increase the system limit, or by rearranging your processes to place fewer of them
|
|
on that node.
|
|
#
|
|
[orterun:failed-term-attrs]
|
|
%s was unable to launch the specified application as it encountered an error:
|
|
|
|
Error: reading tty attributes function call failed while setting up I/O forwarding system
|
|
Node: %s
|
|
|
|
while attempting to start process rank %lu.
|
|
#
|
|
[orterun:wdir-not-found]
|
|
%s was unable to launch the specified application as it could not
|
|
change to the specified working directory:
|
|
|
|
Working directory: %s
|
|
Node: %s
|
|
|
|
while attempting to start process rank %lu.
|
|
#
|
|
[orterun:exe-not-found]
|
|
%s was unable to find the specified executable file, and therefore
|
|
did not launch the job. This error was first reported for process
|
|
rank %lu; it may have occurred for other processes as well.
|
|
|
|
NOTE: A common cause for this error is misspelling a %s command
|
|
line parameter option (remember that %s interprets the first
|
|
unrecognized command line token as the executable).
|
|
|
|
Node: %s
|
|
Executable: %s
|
|
#
|
|
[orterun:exe-not-accessible]
|
|
%s was unable to launch the specified application as it could not access
|
|
or execute an executable:
|
|
|
|
Executable: %s
|
|
Node: %s
|
|
|
|
while attempting to start process rank %lu.
|
|
#
|
|
[orterun:pipe-read-failure]
|
|
%s was unable to launch the specified application as it encountered an error:
|
|
|
|
Error: reading from a pipe function call failed while spawning a local process
|
|
Node: %s
|
|
|
|
while attempting to start process rank %lu.
|
|
#
|
|
[orterun:proc-failed-to-start]
|
|
%s was unable to start the specified application as it encountered an error:
|
|
|
|
Error name: %s
|
|
Node: %s
|
|
|
|
when attempting to start process rank %lu.
|
|
#
|
|
[orterun:proc-socket-not-avail]
|
|
%s was unable to start the specified application as it encountered an error:
|
|
|
|
Error name: %s
|
|
Node: %s
|
|
|
|
when attempting to start process rank %lu.
|
|
#
|
|
[orterun:proc-failed-to-start-no-status]
|
|
%s was unable to start the specified application as it encountered an error
|
|
on node %s. More information may be available above.
|
|
#
|
|
[orterun:proc-failed-to-start-no-status-no-node]
|
|
%s was unable to start the specified application as it encountered an error.
|
|
More information may be available above.
|
|
#
|
|
[debugger requires -np]
|
|
The number of MPI processes to launch was not specified on the command
|
|
line.
|
|
|
|
The %s debugger requires that you specify a number of MPI processes to
|
|
launch on the command line via the "-np" command line parameter. For
|
|
example:
|
|
|
|
%s -np 4 %s
|
|
|
|
Skipping the %s debugger for now.
|
|
#
|
|
[debugger requires executable]
|
|
The %s debugger requires that you specify an executable on the %s
|
|
command line; you cannot specify application context files when
|
|
launching this job in the %s debugger. For example:
|
|
|
|
%s -np 4 my_mpi_executable
|
|
|
|
Skipping the %s debugger for now.
|
|
#
|
|
[debugger only accepts single app]
|
|
The %s debugger only accepts SPMD-style launching; specifying an
|
|
MPMD-style launch (with multiple applications separated via ':') is
|
|
not permitted.
|
|
|
|
Skipping the %s debugger for now.
|
|
#
|
|
[orterun:daemon-died-during-execution]
|
|
%s has detected that a required daemon terminated during execution
|
|
of the application with a non-zero status. This is a fatal error.
|
|
A best-effort attempt has been made to cleanup. However, it is
|
|
-strongly- recommended that you execute the orte-clean utility
|
|
to ensure full cleanup is accomplished.
|
|
#
|
|
[orterun:no-orted-object-exit]
|
|
%s was unable to determine the status of the daemons used to
|
|
launch this application. Additional manual cleanup may be required.
|
|
Please refer to the "orte-clean" tool for assistance.
|
|
#
|
|
[orterun:unclean-exit]
|
|
%s was unable to cleanly terminate the daemons on the nodes shown
|
|
below. Additional manual cleanup may be required - please refer to
|
|
the "orte-clean" tool for assistance.
|
|
#
|
|
[orterun:event-def-failed]
|
|
%s was unable to define an event required for proper operation of
|
|
the system. The reason for this error was:
|
|
|
|
Error: %s
|
|
|
|
Please report this to the Open MPI mailing list users@open-mpi.org.
|
|
#
|
|
[orterun:ompi-server-filename-bad]
|
|
%s was unable to parse the filename where contact info for the
|
|
ompi-server was to be found. The option we were given was:
|
|
|
|
--ompi-server %s
|
|
|
|
This appears to be missing the required ':' following the
|
|
keyword "file". Please remember that the correct format for this
|
|
command line option is:
|
|
|
|
--ompi-server file:path-to-file
|
|
|
|
where path-to-file can be either relative to the cwd or absolute.
|
|
#
|
|
[orterun:ompi-server-filename-missing]
|
|
%s was unable to parse the filename where contact info for the
|
|
ompi-server was to be found. The option we were given was:
|
|
|
|
--ompi-server %s
|
|
|
|
This appears to be missing a filename following the ':'. Please
|
|
remember that the correct format for this command line option is:
|
|
|
|
--ompi-server file:path-to-file
|
|
|
|
where path-to-file can be either relative to the cwd or absolute.
|
|
#
|
|
[orterun:ompi-server-filename-access]
|
|
%s was unable to access the filename where contact info for the
|
|
ompi-server was to be found. The option we were given was:
|
|
|
|
--ompi-server %s
|
|
|
|
Please remember that the correct format for this command line option is:
|
|
|
|
--ompi-server file:path-to-file
|
|
|
|
where path-to-file can be either relative to the cwd or absolute, and that
|
|
you must have read access permissions to that file.
|
|
#
|
|
[orterun:ompi-server-file-bad]
|
|
%s was unable to read the ompi-server's contact info from the
|
|
given filename. The filename we were given was:
|
|
|
|
FILE: %s
|
|
|
|
Please remember that the correct format for this command line option is:
|
|
|
|
--ompi-server file:path-to-file
|
|
|
|
where path-to-file can be either relative to the cwd or absolute, and that
|
|
the file must have a single line in it that contains the Open MPI
|
|
uri for the ompi-server. Note that this is *not* a standard uri, but
|
|
a special format used internally by Open MPI for communications. It can
|
|
best be generated by simply directing the ompi-server to put its
|
|
uri in a file, and then giving %s that filename.
|
|
[orterun:multiple-hostfiles]
|
|
Error: More than one hostfile was passed for a single application context, which
|
|
is not supported at this time.
|
|
#
|
|
[orterun:conflicting-params]
|
|
%s has detected multiple instances of an MCA param being specified on
|
|
the command line, with conflicting values:
|
|
|
|
MCA param: %s
|
|
Value 1: %s
|
|
Value 2: %s
|
|
|
|
This MCA param does not support multiple values, and the system is unable
|
|
to identify which value was intended. If this was done in error, please
|
|
re-issue the command with only one value. You may wish to review the
|
|
output from ompi_info for guidance on accepted values for this param.
|
|
|
|
[orterun:server-not-found]
|
|
%s was instructed to wait for the requested ompi-server, but was unable to
|
|
establish contact with the server during the specified wait time:
|
|
|
|
Server uri: %s
|
|
Timeout time: %ld
|
|
|
|
Error received: %s
|
|
|
|
Please check to ensure that the requested server matches the actual server
|
|
information, and that the server is in operation.
|
|
#
|
|
[orterun:ompi-server-pid-bad]
|
|
%s was unable to parse the PID of the %s to be used as the ompi-server.
|
|
The option we were given was:
|
|
|
|
--ompi-server %s
|
|
|
|
Please remember that the correct format for this command line option is:
|
|
|
|
--ompi-server PID:pid-of-%s
|
|
|
|
where PID can be either "PID" or "pid".
|
|
#
|
|
[orterun:ompi-server-could-not-get-hnp-list]
|
|
%s was unable to search the list of local %s contact files to find the specified pid.
|
|
You might check to see if your local session directory is available and
|
|
that you have read permissions on the top of that directory tree.
|
|
#
|
|
[orterun:ompi-server-pid-not-found]
|
|
%s was unable to find an %s with the specified pid of %d that was to be used as the ompi-server.
|
|
The option we were given was:
|
|
|
|
--ompi-server %s
|
|
|
|
Please remember that the correct format for this command line option is:
|
|
|
|
--ompi-server PID:pid-of-%s
|
|
|
|
where PID can be either "PID" or "pid".
|
|
#
|
|
[orterun:write_file]
|
|
%s was unable to open a file to printout %s as requested. The file
|
|
name given was:
|
|
|
|
File: %s
|