
Proper show_help error messages for LAMA.

This commit was SVN r28470.
Jeff Squyres 2013-05-10 15:06:25 +00:00
parent 353c77e659
commit 2ff95a7739
3 changed files with 290 additions and 63 deletions

View file

@ -1,6 +1,7 @@
# -*- text -*-
#
# Copyright (c) 2011 Oak Ridge National Labs. All rights reserved.
# Copyright (c) 2013 Cisco Systems, Inc. All rights reserved.
#
# $COPYRIGHT$
#
@ -15,13 +16,13 @@ RMAPS found multiple applications to be launched, with at least one that failed
to specify the number of processes to execute. When specifying multiple
applications, you must specify how many processes of each to launch via the
-np argument.
#
[orte-rmaps-lama:oversubscribe]
RMaps LAMA detected oversubscription after mapping %d of %d processes.
Since you have asked not to oversubscribe the resources the job will not
be launched. If you would instead like to oversubscribe the resources
try using the --oversubscribe option to mpirun.
#
[orte-rmaps-lama:no-resources-available]
RMaps LAMA detected that there are not enough resources to map the
remainder of the job. Check the command line options, and the number of
@ -33,7 +34,7 @@ nodes allocated to this job.
Binding : %s
MPPR : %s
Ordering : %s
#
[orte-rmaps-lama:merge-conflict-bad-prune-src]
RMaps LAMA detected that it needed to prune a level of the hierarchy that
was necessary for one of the command line parameters. Check your allocation
@ -43,3 +44,130 @@ and the options below to make sure they are correct.
Binding : %s
MPPR : %s
Ordering : %s
#
[invalid mapping option]
The specified mapping option is not supported with the LAMA rmaps
mapper:
Specified mapping option: %s
Reason it is invalid: %s
LAMA supports the following options to the mpirun --map-by option:
node, numa, socket, l1cache, l2cache, l3cache, core, hwthread, slot
Alternatively, LAMA supports specifying a sequence of letters in the
rmaps_lama_map MCA parameter; each letter indicates a "direction" for
mapping. The rmaps_lama_map MCA parameter is richer/more flexible
than the --map-by CLI option. If rmaps_lama_map is specified, the
following letters must be specified:
h: hardware thread
c: processor core
s: processor socket
n: node (server)
The following may also optionally be included in the mapping string:
N: NUMA node
L1: L1 cache
L2: L2 cache
L3: L3 cache
For example, the two commands below are equivalent:
mpirun --mca rmaps lama --mca rmaps_lama_map csNh ...
mpirun --mca rmaps lama --map-by core ...
#
[invalid binding option]
The specified binding option is not supported with the LAMA rmaps
mapper:
Specified binding option: %s
Reason it is invalid: %s
LAMA binding options can be specified via the mpirun --bind-to command
line option or rmaps_lama_bind MCA param:
                    --bind-to   rmaps_lama_bind
  Locality          option      option
  ----------------  ---------   ---------------
  Hardware thread   hwthread    h
  Processor core    core        c
  Processor socket  socket      s
  NUMA node         numa        N
  L1 cache          l1cache     L1
  L2 cache          l2cache     L2
  L3 cache          l3cache     L3
  Node (server)     node        n
The --bind-to option assumes a single locality (e.g., bind each MPI
process to a single core, socket, etc.). The rmaps_lama_bind MCA
param requires an integer specifying how many localities to which to
bind. For example, the following two command lines are equivalent,
and bind each MPI process to a single core:
mpirun --mca rmaps lama --mca rmaps_lama_bind 1c ...
mpirun --mca rmaps lama --bind-to core ...
The rmaps_lama_bind MCA parameter is more flexible than the --bind-to
CLI option, because it allows binding to multiple resources. For
example, specifying an rmaps_lama_bind value of "2c" binds each MPI
process to two cores.
#
[invalid ordering option]
The specified ordering option is not supported.
Specified ordering option: %s
The LAMA ordering can be specified via the rmaps_lama_ordering MCA
parameter.
Two options are supported for ordering ranks in MPI_COMM_WORLD (MCW):
s: Sequential. MCW rank ordering is sequential by hardware thread
across all nodes. E.g., MCW rank 0 is the first process on node
0; MCW rank 1 is the second process on node 0, and so on.
n: Natural. MCW rank ordering follows the "natural" mapping layout.
For example, in a by-socket layout, MCW rank 0 is the first
process on the 1st socket on node 0. MCW rank 1 is then the
first process on the 2nd socket on node 0. And so on.
#
[invalid mppr option]
The specified Max Processes Per Resource (MPPR) value is invalid (in
the rmaps_lama_mppr MCA parameter):
Specified MPPR: %s
Reason it is invalid: %s
The MPPR is a comma-delimited list of specifications indicating how
many processes are allowed on a given type of resource before an MPI
job is considered to have oversubscribed that resource. Each
specification is a token in the format of "NUMBER:RESOURCE". For
example, the default MPPR of "1:c" means that Open MPI will map one
process per processor core before considering cores to be
oversubscribed.
Multiple specifications may be useful; for example "1:c,2:s" maintains
the default one-process-per-core limitation, but places an additional
limitation of only two processes per processor socket (assuming that
there are more than two cores per socket).
The LAMA MPPR specifications are set via the rmaps_lama_mppr MCA
parameter. The following resources can be specified:
  Hardware thread   h
  Processor core    c
  Processor socket  s
  NUMA node         N
  L1 cache          L1
  L2 cache          L2
  L3 cache          L3
  Node (server)     n
#
[internal error]
An unexpected internal error occurred in the LAMA mapper; your job
will now fail. Sorry.
File: %s
Message: %s
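
The "NUMBER:RESOURCE" format described in the [invalid mppr option] text can be illustrated with a small standalone parser. This is only a sketch under stated assumptions, not the code in rmaps_lama_parse_mppr(): the type mppr_spec_t and the functions mppr_resource_ok() and mppr_parse() are hypothetical names, NUMBER is assumed to be a positive integer, and the duplicate-resource check that the real parser performs is left out for brevity.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one "NUMBER:RESOURCE" token; not LAMA's own type. */
typedef struct {
    int  max_procs;     /* NUMBER part */
    char resource[3];   /* RESOURCE part: h, c, s, N, n, L1, L2, or L3 */
} mppr_spec_t;

/* Return 1 if name is one of the resource tokens listed in the help text. */
static int mppr_resource_ok(const char *name)
{
    static const char *known[] = { "h", "c", "s", "N", "n", "L1", "L2", "L3" };
    size_t i;

    for (i = 0; i < sizeof(known) / sizeof(known[0]); ++i) {
        if (0 == strcmp(name, known[i])) {
            return 1;
        }
    }
    return 0;
}

/*
 * Parse a comma-delimited MPPR string such as "1:c,2:s" into out[].
 * Returns the number of specifications parsed, or -1 on any error.
 */
static int mppr_parse(const char *spec, mppr_spec_t *out, int max_out)
{
    const char *p = spec;
    int count = 0;

    assert(NULL != spec && NULL != out);

    while ('\0' != *p) {
        char *end = NULL;
        long num;
        size_t len;

        /* NUMBER: digits must come first (assumed positive here) */
        if (!isdigit((unsigned char) *p)) {
            return -1;
        }
        num = strtol(p, &end, 10);
        if (num <= 0 || num > 1000000 || ':' != *end) {
            return -1;
        }
        p = end + 1;

        /* RESOURCE: everything up to the next comma or the end */
        len = strcspn(p, ",");
        if (0 == len || len > 2 || count >= max_out) {
            return -1;
        }
        memcpy(out[count].resource, p, len);
        out[count].resource[len] = '\0';
        if (!mppr_resource_ok(out[count].resource)) {
            return -1;
        }
        out[count].max_procs = (int) num;
        ++count;

        /* Step over the separator; a trailing comma is an error */
        p += len;
        if (',' == *p) {
            ++p;
            if ('\0' == *p) {
                return -1;
            }
        }
    }
    return (count > 0) ? count : -1;
}
```

With this sketch, "1:c,2:s" yields two entries (one process per core, two per socket), while inputs such as "c:1" or "1:x" are rejected, which corresponds to the "missing digit(s) before resource specification" and "unknown resource type" reasons in the help text.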

View file

@ -1,7 +1,7 @@
/*
* Copyright (c) 2011 Oak Ridge National Labs. All rights reserved.
*
* Copyright (c) 2012 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2012-2013 Cisco Systems, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@ -377,10 +377,11 @@ static int orte_rmaps_lama_process_params(orte_job_t *jdata)
char *type_str = NULL;
/*
* Process map/bind/order/mppr aliases
* Process map/bind/order/mppr aliases. It will print its own
* error message if something went wrong.
*/
if( ORTE_SUCCESS != (ret = rmaps_lama_process_alias_params(jdata) ) ) {
opal_output(0, "mca:rmaps:lama: ERROR: Failed while processing aliases");
ORTE_ERROR_LOG(ret);
return ret;
}
@ -395,8 +396,7 @@ static int orte_rmaps_lama_process_params(orte_job_t *jdata)
if( ORTE_SUCCESS != (ret = rmaps_lama_parse_binding(rmaps_lama_cmd_bind,
&lama_binding_level,
&lama_binding_num_levels)) ) {
opal_output(0, "mca:rmaps:lama: ERROR: Invalid Binding String: %s",
rmaps_lama_cmd_bind);
ORTE_ERROR_LOG(ret);
return ret;
}
@ -423,6 +423,7 @@ static int orte_rmaps_lama_process_params(orte_job_t *jdata)
&lama_mapping_layout,
&lama_mapping_layout_sort,
&lama_mapping_num_layouts)) ) {
/* JMS Check -- I think ^^ will show_help, so this should be redundant */
opal_output(0, "mca:rmaps:lama: ERROR: Invalid Mapping Process Layout: %s",
rmaps_lama_cmd_map);
return ret;
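
For reference, the binding string handed to rmaps_lama_parse_binding() has the shape described in the help text: an integer count followed by exactly one level token, for example "1c" or "2c". The following is a minimal standalone sketch of that check, assuming a positive count and the eight level tokens listed in the help file; bind_parse() is a hypothetical name and is not the function changed in this commit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a binding string such as "2c" into a count and a level token.
 * level must have room for 3 bytes. Returns 0 on success, -1 on error.
 */
static int bind_parse(const char *spec, int *count, char *level)
{
    static const char *known[] = { "h", "c", "s", "N", "n", "L1", "L2", "L3" };
    char *end = NULL;
    long num;
    size_t i;

    assert(NULL != spec && NULL != count && NULL != level);

    /* Digits must come before the level token */
    if (!isdigit((unsigned char) spec[0])) {
        return -1;
    }
    num = strtol(spec, &end, 10);
    if (num <= 0 || num > 1000000) {
        return -1;
    }

    /* Exactly one known level token must follow the digits */
    for (i = 0; i < sizeof(known) / sizeof(known[0]); ++i) {
        if (0 == strcmp(end, known[i])) {
            *count = (int) num;
            strcpy(level, known[i]);
            return 0;
        }
    }
    return -1;
}
```

Under these assumptions "2c" parses to a count of 2 and level "c", whereas "c" (no digits), "2" (no level), and "1cs" (more than one level) are all rejected.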

View file

@ -1,6 +1,6 @@
/*
* Copyright (c) 2011 Oak Ridge National Labs. All rights reserved.
*
* Copyright (c) 2013 Cisco Systems, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@ -17,6 +17,7 @@
#include "orte/mca/rmaps/base/rmaps_private.h"
#include "orte/mca/rmaps/base/base.h"
#include "orte/util/show_help.h"
#include <ctype.h>
@ -54,7 +55,10 @@ int rmaps_lama_process_alias_params(orte_job_t *jdata)
break;
case ORTE_MAPPING_BYBOARD:
/* rmaps_lama_cmd_map = strdup("bnNsL3L2L1ch"); */
opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Mapping Option!");
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
"by board", "mapping by board not supported by LAMA");
exit_status = ORTE_ERR_NOT_SUPPORTED;
goto cleanup;
break;
@ -85,9 +89,24 @@ int rmaps_lama_process_alias_params(orte_job_t *jdata)
rmaps_lama_cmd_map = strdup("hcsbn");
break;
case ORTE_MAPPING_RR:
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
"round robin", "mapping by round robin not supported by LAMA");
exit_status = ORTE_ERR_NOT_SUPPORTED;
goto cleanup;
case ORTE_MAPPING_SEQ:
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
"sequential", "mapping by sequential not supported by LAMA");
exit_status = ORTE_ERR_NOT_SUPPORTED;
goto cleanup;
case ORTE_MAPPING_BYUSER:
opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Mapping Option!");
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
"by user", "mapping by user not supported by LAMA");
exit_status = ORTE_ERR_NOT_SUPPORTED;
goto cleanup;
default:
@ -115,7 +134,10 @@ int rmaps_lama_process_alias_params(orte_job_t *jdata)
switch( OPAL_GET_BINDING_POLICY(jdata->map->binding) ) {
case OPAL_BIND_TO_BOARD:
/* rmaps_lama_cmd_bind = strdup("1b"); */
opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Binding Option!");
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
"by board", "binding to board not supported by LAMA");
exit_status = ORTE_ERR_NOT_SUPPORTED;
goto cleanup;
break;
@ -141,7 +163,10 @@ int rmaps_lama_process_alias_params(orte_job_t *jdata)
rmaps_lama_cmd_bind = strdup("1h");
break;
case OPAL_BIND_TO_CPUSET:
opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Binding Option!");
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
"by CPU set", "binding to CPU set not supported by LAMA");
exit_status = ORTE_ERR_NOT_SUPPORTED;
goto cleanup;
break;
@ -172,7 +197,10 @@ int rmaps_lama_process_alias_params(orte_job_t *jdata)
break;
case ORTE_RANK_BY_BOARD:
/* rmaps_lama_cmd_ordering = strdup("n"); */
opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Ordering/Ranking Option!");
orte_show_help("help-orte-rmaps-lama.txt",
"invalid ordering option",
true,
"by board", "ordering by board not supported by LAMA");
exit_status = ORTE_ERR_NOT_SUPPORTED;
goto cleanup;
break;
@ -255,7 +283,6 @@ int rmaps_lama_process_alias_params(orte_job_t *jdata)
"mca:rmaps:lama: Order : %s",
rmaps_lama_cmd_ordering);
cleanup:
return exit_status;
}
@ -283,6 +310,11 @@ int rmaps_lama_parse_mapping(char *layout,
* then this is an error.
*/
if( NULL == layout ) {
orte_show_help("help-orte-rmaps-lama.txt",
"internal error",
true,
"rmaps_lama_parse_mapping",
"internal error 1");
return ORTE_ERROR;
}
@ -305,8 +337,10 @@ int rmaps_lama_parse_mapping(char *layout,
* Check for 2 characters
*/
if( i >= len ) {
opal_output(0, "mca:rmaps:lama: Error: Cache Level must be followed by a number [%s]!",
layout);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
layout, "cache level missing number");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -349,8 +383,13 @@ int rmaps_lama_parse_mapping(char *layout,
* Look for unknown and unsupported options
*/
if( LAMA_LEVEL_UNKNOWN <= (*layout_types)[i] ) {
opal_output(0, "mca:rmaps:lama: Error: Unknown or Unsupported option in layout [%s] position %d!",
layout, i+1);
char *msg;
asprintf(&msg, "unknown mapping level at position %d", i + 1);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
layout, msg);
free(msg);
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -372,8 +411,14 @@ int rmaps_lama_parse_mapping(char *layout,
*/
for( j = i+1; j < *num_types; ++j ) {
if( (*layout_types)[i] == (*layout_types)[j] ) {
opal_output(0, "mca:rmaps:lama: Error: Duplicate key detected in layout [%s] position %d and %d!",
layout, i+1, j+1);
char *msg;
asprintf(&msg, "duplicate mapping levels at position %d and %d",
i + 1, j + 1);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
layout, msg);
free(msg);
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -385,22 +430,37 @@ int rmaps_lama_parse_mapping(char *layout,
* - machine
* - hardware thread (needed for lower bound binding) JJH: We should be able to lift this...
* - binding layer (need it to stride the mapping)
* Only print the error message once, for brevity.
*/
if( !found_req_param_n ) {
opal_output(0, "mca:rmaps:lama: Error: Required level not specified 'n' in layout [%s]!",
layout);
char *msg;
asprintf(&msg, "missing required 'n' mapping token");
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
layout, msg);
free(msg);
exit_status = ORTE_ERROR;
goto cleanup;
}
if( !found_req_param_h ) {
opal_output(0, "mca:rmaps:lama: Error: Required level not specified 'h' in layout [%s]!",
layout);
else if(!found_req_param_h) {
char *msg;
asprintf(&msg, "missing required 'h' mapping token");
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
layout, msg);
free(msg);
exit_status = ORTE_ERROR;
goto cleanup;
}
if( !found_req_param_bind ) {
opal_output(0, "mca:rmaps:lama: Error: Required binding level [%s] not specified in mapping layout [%s]!",
rmaps_lama_cmd_bind, layout);
} else if (!found_req_param_bind) {
char *msg;
asprintf(&msg, "missing required mapping token for the current binding level");
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mapping option",
true,
layout, msg);
free(msg);
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -410,7 +470,6 @@ int rmaps_lama_parse_mapping(char *layout,
*/
qsort((*layout_types_sorted ), (*num_types), sizeof(int), lama_parse_int_sort);
cleanup:
return exit_status;
}
@ -449,8 +508,10 @@ int rmaps_lama_parse_binding(char *layout, rmaps_lama_level_type_t *binding_leve
* Check: Digits must come first
*/
if( p != 0 ) {
opal_output(0, "mca:rmaps:lama: Error: Binding: Digits must only come before Level string [%s]!",
layout);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
layout, "missing digit(s) before binding level token");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -461,8 +522,10 @@ int rmaps_lama_parse_binding(char *layout, rmaps_lama_level_type_t *binding_leve
* Check: Exceed bound of number of digits
*/
if( n >= MAX_BIND_DIGIT_LEN ) {
opal_output(0, "mca:rmaps:lama: Error: Binding: Too many digits in [%s]! Limit %d",
layout, MAX_BIND_DIGIT_LEN-1);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
layout, "too many digits");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -475,8 +538,10 @@ int rmaps_lama_parse_binding(char *layout, rmaps_lama_level_type_t *binding_leve
* Check: Digits must come first
*/
if( n == 0 ) {
opal_output(0, "mca:rmaps:lama: Error: Binding: Digits must come before Level string [%s] [%c]!",
layout, layout[i]);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
layout, "missing digit(s) before binding level token");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -484,8 +549,10 @@ int rmaps_lama_parse_binding(char *layout, rmaps_lama_level_type_t *binding_leve
* Check: Only one level allowed
*/
if( p != 0 ) {
opal_output(0, "mca:rmaps:lama: Error: Binding: Only one level may be specified [%s]!",
layout);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
layout, "only one binding level may be specified");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -502,8 +569,10 @@ int rmaps_lama_parse_binding(char *layout, rmaps_lama_level_type_t *binding_leve
* Check for 2 characters
*/
if( i >= len ) {
opal_output(0, "mca:rmaps:lama: Error: Cache Level must be followed by a number [%s]!",
layout);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
layout, "cache level missing number");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -529,8 +598,10 @@ int rmaps_lama_parse_binding(char *layout, rmaps_lama_level_type_t *binding_leve
* Check that the level was specified
*/
if( p == 0 ) {
opal_output(0, "mca:rmaps:lama: Error: Binding: Level not specified [%s]!",
layout);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
layout, "binding specification is empty");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -543,8 +614,10 @@ int rmaps_lama_parse_binding(char *layout, rmaps_lama_level_type_t *binding_leve
* Check for unknown level
*/
if( LAMA_LEVEL_UNKNOWN <= *binding_level ) {
opal_output(0, "mca:rmaps:lama: Error: Unknown or Unsupported option in layout [%s]!",
layout);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid binding option",
true,
layout, "unknown binding level");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -603,8 +676,10 @@ int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, i
* Check: Digits must come first
*/
if( p != 0 ) {
opal_output(0, "mca:rmaps:lama: Error: MPPR: Digits must only come before Level string [%s] at [%s]!",
layout, argv[j]);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mppr option",
true,
layout, "missing digit(s) before resource specification");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -615,8 +690,10 @@ int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, i
* Check: Exceed bound of number of digits
*/
if( n >= MAX_BIND_DIGIT_LEN ) {
opal_output(0, "mca:rmaps:lama: Error: MPPR: Too many digits in [%s]! Limit %d",
argv[j], MAX_BIND_DIGIT_LEN-1);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mppr option",
true,
layout, "too many digits");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -629,8 +706,10 @@ int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, i
* Check: Digits must come first
*/
if( n == 0 ) {
opal_output(0, "mca:rmaps:lama: Error: MPPR: Digits must come before Level string [%s]!",
argv[j]);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mppr option",
true,
layout, "missing digit(s) before resource specification");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -638,8 +717,10 @@ int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, i
* Check: Only one level allowed
*/
if( p != 0 ) {
opal_output(0, "mca:rmaps:lama: Error: MPPR: Only one level may be specified [%s]!",
argv[j]);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mppr option",
true,
layout, "only one resource type may be listed per specification");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -656,8 +737,10 @@ int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, i
* Check for 2 characters
*/
if( i >= len ) {
opal_output(0, "mca:rmaps:lama: Error: MPPR: Cache Level must be followed by a number [%s]!",
argv[j]);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mppr option",
true,
layout, "cache level missing number");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -691,8 +774,10 @@ int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, i
* Check that the level was specified
*/
if( p == 0 ) {
opal_output(0, "mca:rmaps:lama: Error: MPPR: Level not specified [%s]!",
layout);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mppr option",
true,
layout, "resource type not specified");
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -716,8 +801,13 @@ int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, i
* Look for unknown and unsupported options
*/
if( LAMA_LEVEL_UNKNOWN <= (*mppr_levels)[i].type ) {
opal_output(0, "mca:rmaps:lama: Error: Unknown or Unsupported option in layout [%s] position %d!",
layout, i+1);
char *msg;
asprintf(&msg, "unknown resource type at position %d", i + 1);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mppr option",
true,
layout, msg);
free(msg);
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -727,8 +817,14 @@ int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, i
*/
for( j = i+1; j < *num_types; ++j ) {
if( (*mppr_levels)[i].type == (*mppr_levels)[j].type ) {
opal_output(0, "mca:rmaps:lama: Error: Duplicate key detected in layout [%s] position %d and %d!",
layout, i+1, j+1);
char *msg;
asprintf(&msg, "duplicate resource types at position %d and %d",
i + 1, j + 1);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid mppr option",
true,
layout, msg);
free(msg);
exit_status = ORTE_ERROR;
goto cleanup;
}
@ -773,8 +869,10 @@ int rmaps_lama_parse_ordering(char *layout,
* Check for unknown options
*/
else {
opal_output(0, "mca:rmaps:lama: Error: Unknown or Unsupported option in ordering [%s]!",
layout);
orte_show_help("help-orte-rmaps-lama.txt",
"invalid ordering option",
true,
layout);
return ORTE_ERROR;
}
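
Several hunks above build a "reason" string with asprintf() and pass it to orte_show_help() without checking the asprintf() return value. One possible alternative, sketched here, formats the reason into a fixed-size stack buffer with snprintf(), so there is nothing to free and no allocation that can fail. Both report_invalid() and report_duplicate() are hypothetical helpers; report_invalid() is a stand-in that only formats into a caller-supplied buffer so the example can run without Open MPI, and its output layout is an assumption, not the real help-file rendering.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for orte_show_help(): writes the topic, the
 * offending value, and the reason into buf instead of printing a help
 * message. Returns the snprintf() result.
 */
static int report_invalid(char *buf, size_t buflen, const char *topic,
                          const char *value, const char *reason)
{
    return snprintf(buf, buflen, "[%s] Specified: %s; Reason: %s",
                    topic, value, reason);
}

/*
 * Report duplicate mapping levels found at 0-based indexes i and j.
 * The reason string is built on the stack and reports 1-based positions,
 * matching the "i + 1, j + 1" convention used in the diff.
 */
static int report_duplicate(char *buf, size_t buflen,
                            const char *layout, int i, int j)
{
    char reason[96];

    assert(NULL != buf && NULL != layout);

    snprintf(reason, sizeof(reason),
             "duplicate mapping levels at position %d and %d",
             i + 1, j + 1);
    return report_invalid(buf, buflen, "invalid mapping option",
                          layout, reason);
}
```

In the real code the call to report_invalid() would be replaced by orte_show_help("help-orte-rmaps-lama.txt", "invalid mapping option", true, layout, reason); whether a fixed buffer is preferable to asprintf() there is a design choice for the maintainers.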