1
1
openmpi/orte/mca/errmgr/hnp/help-orte-errmgr-hnp.txt
Josh Hursey fabd5cc153 Simplification of the ErrMgr framework by removing the 'stack'/composite functionality.
The composite functionality was becoming difficult to maintain, so we removed it for now which simplifies the framework design considerably.

Since the 'crmig' and 'autor' components were -very- similar to the 'hnp' component, this commit also merges them together. By moving the 'crmig' and 'autor' to a separate file under the 'hnp' component we are able to isolate the C/R logic to a large extent, thus being only minimally hooked into the previous 'hnp' component.

So other than some name changes, the functionality is all still in place. I will update the C/R documentation later this morning.

This commit was SVN r23628.
2010-08-19 13:09:20 +00:00

72 строки
2.0 KiB
Plaintext

-*- text -*-
#
# Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
#
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English general help file for ORTE Errmgr HNP module.
#
[errmgr-hnp:unknown-job-error]
An error has occurred in an unknown job. This generally should not happen
except due to an internal ORTE error.
Job state: %s
This information should probably be reported to the OMPI developers.
#
[errmgr-hnp:daemon-died]
The system has lost communication with the following daemon:
Daemon: %s
Node: %s
The reason for the lost communication channel is unknown. Possible
reasons include failure of the daemon itself, failure of the
connecting fabric/switch, and loss of the host node. Please
check with your system administrator to try and determine the
source of the problem.
Your job is being terminated as a result.
#
[errmgr-hnp:cannot-relocate]
The system is unable to relocate the specified process:
Process: %s
because the application for that process could not be found. This
appears to be a system error. Please report it to the ORTE
developers.
[autor_recovering_job]
Notice: The processes listed below failed unexpectedly.
Using the last checkpoint to recover the job.
Please standby.
%s
[autor_recovery_complete]
Notice: The job has been successfully recovered from the
last checkpoint.
[autor_failed_to_recover_proc]
Error: The process below has failed. There is no checkpoint available for
this job, so we are terminating the application since automatic
recovery cannot occur.
Internal Name: %s
MCW Rank: %d
[crmig_migrating_job]
Notice: A migration of this job has been requested.
The processes below will be migrated.
Please standby.
%s
[crmig_migrated_job]
Notice: The processes have been successfully migrated to/from the specified
machines.
[crmig_no_migrating_procs]
Warning: Could not find any processes to migrate on the nodes specified.
You provided the following:
Nodes: %s
Procs: %s