openmpi/orte/mca/rmaps/lama/rmaps_lama_params.c
Jeff Squyres fcc1c7e33c

= Overview =
First revision of the Location Aware Mapping Algorithm (LAMA) RMAPS
component.  This component is used to effect many different types of
regular process/processor affinity patterns.  Although quite
flexible in the patterns that it provides, it is ''not'' a
fully-arbitrary, rankfile-like solution for process/processor
affinity.

Inspired by !BlueGene-like network specifications, LAMA has a core
algorithm that is quite good at specifying regular patterns in
multiple "dimensions" (where "dimensions" are expressed in terms of
different hardware elements: processor hardware threads, cores,
sockets, etc.).  The LAMA core algorithm is described here:

  http://www.open-mpi.org/papers/cluster-2011-lama/

= LAMA Usage Levels =

LAMA allows specifying affinity in several different ways:

 1. None: Specifying no affinity options to mpirun results in exactly
    the same behavior as today: no affinity is used.
 1. Simple: Using the mpirun options "--bind-to <WIDTH>" and "--map-by
    <LEVEL>" to indicate how "wide" each process should be bound
    (i.e., bind to a processor core, or to a processor socket, etc.)
    and how to lay out the processes (i.e., round robin by cores,
    sockets, etc.).
 1. Expert: Using four new MCA parameters to effect process mapping
    and binding to processors.  These options are a bit complex and
    are not for the faint of heart, but they offer a high degree of
    (regular pattern) flexibility (each is described more fully
    below):
    * rmaps_lama_map: a sequence of characters describing how to lay
      out processes
    * rmaps_lama_bind: a sequence of characters describing the
      resources to bind to each process
    * rmaps_lama_mppr: a sequence of characters describing the maximum
      number of processes to allow per resource (i.e., a specific
      definition of "oversubscription")
    * rmaps_lama_ordering: once all processes are in place, how to
      order the ranks in MPI_COMM_WORLD

We anticipate that most users will utilize the "None" and "Simple"
levels of affinity, and they continue to work just as they do with the
v1.6 series and SVN trunk.  

The Expert level was designed for two purposes:

 1. To provide a precise definition for the "Simple" level (i.e.,
    every --bind-to/--map-by option in the "Simple" level has a
    corresponding precise specification in the "Expert" level)
 1. As modern computing platforms become more complex, we simply
    cannot predict what application developers will need in terms of
    processor affinity.  LAMA is an attempt to provide a highly
    flexible mechanism that allows applications to utilize a variety
    of complex, unique affinity patterns beyond the common "bind to
    core" and "bind to socket" patterns.

= LAMA Simple Level =

The "Simple" level is pretty much the same as what Open MPI has
offered for years.  It supports the same --bind-to and --map-by
options that Open MPI has supported for a while, but expands their
scope a bit.

Specifically, the following options are available for both --bind-to
and --map-by:

 * slot
 * hwthread
 * core
 * l1cache
 * l2cache
 * l3cache
 * socket
 * numa
 * board
 * node
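
For reference, the switch statement in `rmaps_lama_process_alias_params()` (in the source below) translates each Simple `--map-by` level into an Expert mapping string. The lookup table in this sketch copies those strings. Two parts are illustrative assumptions and not part of the component: the pairing of keywords with ORTE constants (for example, `core` with `ORTE_MAPPING_BYCORE`), and the helper `lama_map_for()`. In this revision the strings omit most cache levels (e.g., `csbnh` rather than `csL1L2L3Nbnh`), and `board` is rejected as unsupported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: --map-by keyword -> LAMA mapping string, with the
 * strings copied from rmaps_lama_process_alias_params().  NULL marks a
 * level that this revision rejects. */
struct map_alias {
    const char *map_by;
    const char *lama_map;
};

static const struct map_alias map_aliases[] = {
    { "hwthread", "hcsbn"        },
    { "core",     "csbnh"        },
    { "slot",     "csbnh"        },
    { "l1cache",  "L1sNbnch"     },
    { "l2cache",  "L2sNbnL1ch"   },
    { "l3cache",  "L3sNbnL2L1ch" },
    { "socket",   "sbnch"        },
    { "numa",     "Nbnsch"       },
    { "board",    NULL           },  /* ORTE_MAPPING_BYBOARD: unsupported */
    { "node",     "nbsch"        },
};

/* Return the LAMA mapping string for a --map-by keyword, or NULL if the
 * keyword is unknown or unsupported. */
static const char *lama_map_for(const char *map_by)
{
    assert(map_by != NULL);
    for (size_t i = 0; i < sizeof(map_aliases) / sizeof(map_aliases[0]); ++i) {
        if (strcmp(map_aliases[i].map_by, map_by) == 0) {
            return map_aliases[i].lama_map;
        }
    }
    return NULL;
}
```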

= LAMA Expert Level =

The "Expert" level requires some explanation.  I'll repeat my
disclaimer here: the LAMA Expert level is not for the meek.  It is
flexible, but complex.  '''Most users won't need the Expert level.'''

LAMA works in three phases: mapping, binding, and ordering.  Each is
described below.

== Expert: Mapping ==

Processes are paired with sets of resources.  For example, each
process may be paired with a single processor core.  Or each process
may be paired with an entire processor socket.  LAMA performs this
mapping, obeying the Max Processes Per Resource ("MPPR", pronounced
"mipper") limits.  More on MPPR, below.

Mapping can be performed across multiple hardware levels:

 * h: Hardware thread
 * c: Processor core
 * s: Processor socket
 * L1: L1 cache
 * L2: L2 cache
 * L3: L3 cache
 * N: NUMA node
 * b: Processor board
 * n: Server node

If the act of mapping is that of pairing MPI processes to the
resources that have been allocated to a job, one can easily imagine
looping through all the resources and assigning processes to them.

But to effect different process layout patterns across those
resources, one may want to loop over those resources ''in a different
order.''  That is, if the above-mentioned nine hardware resources
(hardware thread, processor core, etc.) can be thought of as a
nine-dimensional space, you can imagine nine nested loops to traverse
all of them.  And you can imagine that changing the order of nesting
would change the traversal pattern.

LAMA accepts a sequence of tokens representing the above-mentioned
nine hardware resources to specify the order of looping when mapping
resources to processes.
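
A minimal sketch of the nested-loop idea follows. It assumes a hypothetical node with 2 sockets, 2 cores per socket, and 2 hardware threads per core. The loop order is treated as an odometer whose first listed level changes fastest. `traverse()` and the toy topology are illustrative only and do not reproduce LAMA's actual data structures.

```c
#include <assert.h>

/* Hypothetical toy node: 2 sockets x 2 cores/socket x 2 hwthreads/core. */
enum { DIM_S, DIM_C, DIM_H, NDIMS };
static const int dim_size[NDIMS] = { 2, 2, 2 };

/* One mapped location: socket, core within socket, hwthread within core. */
struct loc { int s, c, h; };

/* Visit (s, c, h) slots in the order given by `order`, a permutation of
 * DIM_S, DIM_C, DIM_H (only the range of each entry is checked here).
 * order[0] is the innermost (fastest-varying) loop.  Writes up to `n`
 * locations into `out` and returns how many were written. */
static int traverse(const int order[NDIMS], struct loc *out, int n)
{
    int idx[NDIMS] = { 0, 0, 0 };
    int total = 1;
    for (int d = 0; d < NDIMS; ++d) {
        assert(order[d] >= 0 && order[d] < NDIMS);
        total *= dim_size[d];
    }
    int count = 0;
    for (int step = 0; step < total && count < n; ++step) {
        out[count].s = idx[DIM_S];
        out[count].c = idx[DIM_C];
        out[count].h = idx[DIM_H];
        ++count;
        /* Odometer increment: bump the innermost level, carry outward. */
        for (int k = 0; k < NDIMS; ++k) {
            int d = order[k];
            if (++idx[d] < dim_size[d]) {
                break;
            }
            idx[d] = 0;
        }
    }
    return count;
}
```

With the order `{DIM_C, DIM_S, DIM_H}`, the first four processes land on the four cores and the fifth lands on the second hardware thread of the first core. Swapping to `{DIM_S, DIM_C, DIM_H}` alternates between sockets instead.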

For example, consider a "simple" traversal: csL1L2L3Nbnh.  Reading
that sequence of letters from left-to-right, it specifies mapping by
processor core, processor socket, L1 cache, L2 cache, L3 cache, NUMA
node, processor board, server node, and finally hardware thread.
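
Splitting such a string into tokens can be sketched as below. The loop mirrors the extraction loop in `rmaps_lama_parse_mapping()`, where an `L` consumes the following character. The name `split_layout()` is hypothetical. Unlike the real parser, it does not convert tokens to level enums or reject duplicates and unknown letters.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 16

/* Split a LAMA mapping string such as "csL1L2L3Nbnh" into level tokens.
 * "L" consumes the next character (the cache level); every other
 * character is a one-letter token.  Each token is stored NUL-terminated.
 * Returns the token count, or -1 on a dangling "L" or too many tokens. */
static int split_layout(const char *layout, char tokens[][3], int max_tokens)
{
    int n = 0;
    size_t len;

    assert(layout != NULL && max_tokens > 0);
    len = strlen(layout);
    for (size_t i = 0; i < len; ++i) {
        if (n >= max_tokens) {
            return -1;
        }
        if (layout[i] == 'L') {
            if (i + 1 >= len) {
                return -1;            /* "L" must be followed by a level */
            }
            tokens[n][0] = 'L';
            tokens[n][1] = layout[++i];
            tokens[n][2] = '\0';
        } else {
            tokens[n][0] = layout[i];
            tokens[n][1] = '\0';
        }
        ++n;
    }
    return n;
}
```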

Wait... what?  That string specifies resources from "smallest" to
"largest" -- with the exception of hardware threads.  Why are they
tacked on to the end?

In short, this string of letters means "map round robin by core"
(indeed, it corresponds exactly to the Simple level "--map-by core").
Specifically, LAMA traverses the string from left-to-right and maps
processes to all the resources indicated by that token (e.g., "c" for
processor core).  When there are no more resources indicated by that
token, it goes on to the next token.

Hence, in this case, LAMA will map the first process to the first
core, then it will map the second process to the second core, and so
on.

Once all the cores are exhausted, LAMA effectively ignores all the
other letters until "h" (because all the other resources are made up
of cores; when cores are exhausted, those resources are exhausted,
too).  

If there are still more processes to be mapped, LAMA will then
traverse all the hyperthreads -- meaning that the next process will be
mapped to the second hyperthread on the first core.  And the next
process will be mapped to the second hyperthread on the second core.
And so on.

Keep in mind that the cores involved may span many server nodes; we're
not just talking about the cores (etc.) in a single machine.

As another example, the sequence "sL1L2L3Nbnch" is exactly equivalent
to "--map-by socket" (i.e., LAMA maps the first process to the first
socket, the second process to the second socket, and so on).

The sequence of letters can be combined in many, many different ways to
produce many different regular mapping patterns.

=== Max Processes Per Resource (MPPR) ===

The MPPR is an expression that precisely defines the maximum number of
processes that can be mapped to any single resource.  In effect, it
defines the concept of "oversubscription."  Specifically, traditional
HPC wisdom is that "oversubscription" is when there is more than one
MPI process per processor core.

This conventional definition is expressed in an MPPR string of "1:c"
(one process per core).

But what if your MPI processes are multi-threaded, and each one needs
multiple cores?  You'd need a different description of
"oversubscription" in this case.  Perhaps you want to have one MPI
process per socket.  This would be expressed in an MPPR string of
"1:s".

The general form of an individual MPPR specification is an integer,
followed by a colon, followed by any of the mapping tokens.  For
example, "1:c" is pronounced "one process per core."

Multiple MPPR specifications can be strung together into a
comma-delimited list, too.  All of these MPPR values are then taken
into account when mapping.  Here are some examples:

 * 1:c -- allow, at most, one process per processor core (i.e., don't
   schedule by hyperthread)
 * 1:s -- allow, at most, one process per processor socket (e.g., 
   that process may be multithreaded, or wants exclusive use of the
   socket's caches)
 * 1:s,2:n -- only allow one process per processor socket, but, at
   most, two processes per server node (e.g., if the two MPI processes
   will consume all the RAM on the server node, even if there are more
   processor cores available)
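
Below is a rough sketch of parsing a comma-delimited MPPR list into (count, level) pairs. The function `parse_mppr()` and the struct `mppr_limit` are hypothetical. The component's own `rmaps_lama_parse_mppr()` is more lenient: it skips whitespace and colons wherever they appear, and it converts each level to an enum.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* One MPPR limit, e.g. "2:n" -> { 2, "n" } (hypothetical type). */
struct mppr_limit {
    int  max_procs;   /* the integer before the ':' */
    char level[3];    /* "c", "s", "n", "L2", ... */
};

/* Parse a string such as "1:s,2:n".  Each entry must be <count>:<level>,
 * where <level> is one letter, or "L" plus one character.  Returns the
 * number of limits parsed, or -1 on malformed input. */
static int parse_mppr(const char *mppr, struct mppr_limit *out, int max_out)
{
    int n = 0;
    const char *p = mppr;

    assert(mppr != NULL && out != NULL);
    while (*p != '\0') {
        char *end;
        long count = strtol(p, &end, 10);
        if (end == p || *end != ':' || count <= 0 || count > INT_MAX ||
            n >= max_out) {
            return -1;
        }
        p = end + 1;                          /* skip the ':' */
        size_t lvl_len = strcspn(p, ",");     /* level runs to the next ',' */
        if (lvl_len == 0 || lvl_len != (p[0] == 'L' ? 2u : 1u)) {
            return -1;                        /* exactly one level token */
        }
        out[n].max_procs = (int)count;
        memcpy(out[n].level, p, lvl_len);
        out[n].level[lvl_len] = '\0';
        ++n;
        p += lvl_len;
        if (*p == ',') {
            ++p;
            if (*p == '\0') {
                return -1;                    /* trailing comma */
            }
        }
    }
    return n;
}
```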

If mapping all processes to resources would exceed an MPPR limit, the
job is ruled to be oversubscribed.  If --oversubscribe was specified
on the mpirun command line, the job continues.  Otherwise, LAMA will
abort the job.

Additionally, if --oversubscribe is specified, LAMA will repeatedly
cycle through the mapping token string until all processes have been
mapped.

== Expert: Binding == 

Once processes have been paired with resources during the Mapping
stage, they are optionally bound to a (potentially different) set of
resources.  For example, processes may be mapped round robin by
processor socket, but bound to an individual processor core.

To be clear: if binding is not used, then mapping is effectively
reduced to "counting how many processes end up on each server node."
Without binding, there's no enforcement that a process will stay where
LAMA thinks it was placed.  

With binding, however, processes are bound to a set of hardware
threads.  The number of threads to which the process is bound is
sometimes referred to as the "binding width".  For example, if a
process is bound to all the hardware threads in a processor socket,
its "width" is the processor socket.

(Note that we specifically do not say that the hardware threads are
sequential, even if they are all within a single resource such as a
processor core or socket.  BIOS ordering of hardware threads can be
wonky, so we only refer to "sets of hardware threads".)
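
This example shows why bindings are described as sets rather than ranges. Consider a hypothetical 8-thread node whose BIOS numbers hardware threads round robin across sockets. A `1s` binding to socket 0 then covers OS indices 0, 2, 4, and 6. This sketch uses a plain 64-bit mask. Open MPI itself works with hwloc cpusets, which this does not attempt to reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-PU node whose BIOS interleaves sockets: OS index of each
 * hardware thread, listed per socket in topology order. */
#define NUM_SOCKETS    2
#define PUS_PER_SOCKET 4
static const int socket_pus[NUM_SOCKETS][PUS_PER_SOCKET] = {
    { 0, 2, 4, 6 },   /* socket 0 */
    { 1, 3, 5, 7 },   /* socket 1 */
};

/* Set of OS hardware-thread indices for a "1s" binding to `socket`,
 * as a bitmask (bit i set = PU i is in the set). */
static uint64_t socket_binding_mask(int socket)
{
    assert(socket >= 0 && socket < NUM_SOCKETS);
    uint64_t mask = 0;
    for (int i = 0; i < PUS_PER_SOCKET; ++i) {
        mask |= UINT64_C(1) << socket_pus[socket][i];
    }
    return mask;
}
```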

Bindings are expressed as an integer and a token from the mapping
string.  For example "1s" means "bind each process to one processor
socket" (there is no ":" in the binding string because the ":" is
pronounced as "per" when reading the MPPR string).
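
Below is a sketch of validating a binding string of the form `<digits><level>`. It loosely follows the checks in `rmaps_lama_parse_binding()`: digits come first, and exactly one level follows. The name `parse_binding()` and the 6-digit cap are assumptions. The real parser's limit is `MAX_BIND_DIGIT_LEN`, whose value is not shown here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Parse a binding string such as "1s" or "2L2": the leading digits give
 * the count, and the remainder names exactly one level (one letter, or
 * "L" plus one character).  Returns 0 on success, -1 on bad input. */
static int parse_binding(const char *bind, int *count, char level[3])
{
    size_t ndigits = 0;

    assert(bind != NULL && count != NULL && level != NULL);
    while (isdigit((unsigned char)bind[ndigits])) {
        ++ndigits;
    }
    if (ndigits == 0 || ndigits > 6) {   /* arbitrary cap for this sketch */
        return -1;                       /* digits must come first */
    }
    const char *lvl = bind + ndigits;
    size_t lvl_len = strlen(lvl);
    if (lvl_len == 0 || lvl_len != (lvl[0] == 'L' ? 2u : 1u)) {
        return -1;                       /* exactly one level token */
    }
    int n = 0;
    for (size_t i = 0; i < ndigits; ++i) {
        n = n * 10 + (bind[i] - '0');
    }
    *count = n;
    memcpy(level, lvl, lvl_len + 1);     /* includes the terminating NUL */
    return 0;
}
```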

Note that it only makes sense to bind processes to a single resource
specification (unlike the MPPR specification, where multiple limits
can be specified).

== Expert: Ordering ==

Finally, processes are assigned a rank in MPI_COMM_WORLD.  LAMA
currently offers two ordering modes: sequential or natural:

 * Sequential: if you laid out all the hardware resources in a single
   line, and then overlaid all the MPI processes on top of them, they
   are ordered from 0 to (N-1) from left-to-right.
 * Natural: the ordering of ranks follows the mapping ordering.  For
   example, consider a server node with two processor sockets, each
   containing four cores.  The command line "mpirun -np 8 --bind-to
   core --map-by socket --order n a.out" would result in MPI_COMM_WORLD
   ranks that look like this: [0 2 4 6] [1 3 5 7].
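
The difference between the two orderings for this example can be sketched as follows. The sketch assumes a single node with 2 sockets of 4 cores each and `--map-by socket`. Natural ordering assigns ranks in the order in which processes were mapped. For sequential ordering, the sketch reads "left to right" as socket-major core order; that is an interpretation, not something the text spells out.

```c
#include <assert.h>

#define NSOCKETS         2
#define CORES_PER_SOCKET 4
#define NPROCS           (NSOCKETS * CORES_PER_SOCKET)

/* Fill rank[s][c] for "-np 8 --bind-to core --map-by socket" on the toy
 * node.  Mapping visits (socket 0, core 0), (socket 1, core 0),
 * (socket 0, core 1), ...  With natural ordering, a process's rank is its
 * position in that mapping sequence; with sequential ordering, the rank
 * is its slot in socket-major (left-to-right) hardware order. */
static void order_ranks(int natural, int rank[NSOCKETS][CORES_PER_SOCKET])
{
    int next = 0;
    for (int c = 0; c < CORES_PER_SOCKET; ++c) {   /* outer loop: cores   */
        for (int s = 0; s < NSOCKETS; ++s) {       /* inner loop: sockets */
            int seq = s * CORES_PER_SOCKET + c;    /* left-to-right slot  */
            rank[s][c] = natural ? next : seq;
            ++next;
        }
    }
    assert(next == NPROCS);
}
```

With `natural` set, socket 0 holds ranks 0, 2, 4, 6 and socket 1 holds 1, 3, 5, 7, matching the example above.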

= Execution =

At this point, the job is fully mapped, optionally bound, and its
ranks in MPI_COMM_WORLD are ordered.  It now starts its execution.

= Final Notes =

Note that at this point, LAMA is not the default mapper.  It must be
activated with "--mca rmaps lama".  We'll continue to do further
testing and comparative analysis with the current set of ORTE mappers.

Also, note that the LAMA algorithm can handle heterogeneity between
hardware resources (e.g., an MPI job spanning server nodes with
differing numbers of processor sockets).  Without going into a longer
explanation (this commit message is already long enough!), LAMA
considers each server node individually during mapping and binding.

See the LAMA paper for more details:
http://www.open-mpi.org/papers/cluster-2011-lama/

This commit was SVN r27206.
2012-08-31 19:57:53 +00:00


/*
 * Copyright (c) 2011 Oak Ridge National Labs. All rights reserved.
 *
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */
/**
 * Processing for command line interface options
 *
 */

#include "rmaps_lama.h"

#include "orte/mca/rmaps/base/rmaps_private.h"
#include "orte/mca/rmaps/base/base.h"

#include <ctype.h>

/*********************************
 * Local Functions
 *********************************/
/*
 * QSort: Integer comparison
 */
static int lama_parse_int_sort(const void *a, const void *b);

/*
 * Convert the '-ppr' syntax from the 'ppr' component to the 'lama' '-mppr' syntax.
 */
static char * rmaps_lama_covert_ppr(char * given_ppr);

/*********************************
 * Parsing Functions
 *********************************/
int rmaps_lama_process_alias_params(orte_job_t *jdata)
{
    int exit_status = ORTE_SUCCESS;
    int param_tmp, param_value;

    /*
     * Mapping options
     * Note: L1, L2, L3 are not exposed in orterun to the user, so
     *       there is no need to specify them here.
     */
    if( NULL == rmaps_lama_cmd_map ) {
        /* orte_rmaps_base.mapping */
        switch( ORTE_GET_MAPPING_POLICY(jdata->map->mapping) ) {
        case ORTE_MAPPING_BYNODE:
            /* rmaps_lama_cmd_map = strdup("nbNsL3L2L1ch"); */
            rmaps_lama_cmd_map = strdup("nbsch");
            break;
        case ORTE_MAPPING_BYBOARD:
            /* rmaps_lama_cmd_map = strdup("bnNsL3L2L1ch"); */
            opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Mapping Option!");
            exit_status = ORTE_ERR_NOT_SUPPORTED;
            goto cleanup;
            break;
        case ORTE_MAPPING_BYNUMA:
            /* rmaps_lama_cmd_map = strdup("NbnsL3L2L1ch"); */
            rmaps_lama_cmd_map = strdup("Nbnsch");
            break;
        case ORTE_MAPPING_BYSOCKET:
            /* rmaps_lama_cmd_map = strdup("sNbnL3L2L1ch"); */
            rmaps_lama_cmd_map = strdup("sbnch");
            break;
        case ORTE_MAPPING_BYL3CACHE:
            rmaps_lama_cmd_map = strdup("L3sNbnL2L1ch");
            break;
        case ORTE_MAPPING_BYL2CACHE:
            rmaps_lama_cmd_map = strdup("L2sNbnL1ch");
            break;
        case ORTE_MAPPING_BYL1CACHE:
            rmaps_lama_cmd_map = strdup("L1sNbnch");
            break;
        case ORTE_MAPPING_BYCORE:
        case ORTE_MAPPING_BYSLOT:
            /* rmaps_lama_cmd_map = strdup("cL1L2L3sNbnh"); */
            rmaps_lama_cmd_map = strdup("csbnh");
            break;
        case ORTE_MAPPING_BYHWTHREAD:
            /* rmaps_lama_cmd_map = strdup("hcL1L2L3sNbn"); */
            rmaps_lama_cmd_map = strdup("hcsbn");
            break;
        case ORTE_MAPPING_RR:
        case ORTE_MAPPING_SEQ:
        case ORTE_MAPPING_BYUSER:
            opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Mapping Option!");
            exit_status = ORTE_ERR_NOT_SUPPORTED;
            goto cleanup;
        default:
            /*
             * Default is map-by core
             */
            rmaps_lama_cmd_map = strdup("cL1L2L3sNbnh");
            break;
        }
    }

    /*
     * Binding Options
     */
    if( NULL == rmaps_lama_cmd_bind ) {
        /*
         * No binding specified, use default
         */
        if( !OPAL_BINDING_POLICY_IS_SET(jdata->map->binding) ||
            !OPAL_BINDING_REQUIRED(opal_hwloc_binding_policy) ||
            OPAL_BIND_TO_NONE == OPAL_GET_BINDING_POLICY(jdata->map->binding) ) {
            rmaps_lama_cmd_bind = NULL;
        }

        switch( OPAL_GET_BINDING_POLICY(jdata->map->binding) ) {
        case OPAL_BIND_TO_BOARD:
            /* rmaps_lama_cmd_bind = strdup("1b"); */
            opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Binding Option!");
            exit_status = ORTE_ERR_NOT_SUPPORTED;
            goto cleanup;
            break;
        case OPAL_BIND_TO_NUMA:
            rmaps_lama_cmd_bind = strdup("1N");
            break;
        case OPAL_BIND_TO_SOCKET:
            rmaps_lama_cmd_bind = strdup("1s");
            break;
        case OPAL_BIND_TO_L3CACHE:
            rmaps_lama_cmd_bind = strdup("1L3");
            break;
        case OPAL_BIND_TO_L2CACHE:
            rmaps_lama_cmd_bind = strdup("1L2");
            break;
        case OPAL_BIND_TO_L1CACHE:
            rmaps_lama_cmd_bind = strdup("1L1");
            break;
        case OPAL_BIND_TO_CORE:
            rmaps_lama_cmd_bind = strdup("1c");
            break;
        case OPAL_BIND_TO_HWTHREAD:
            rmaps_lama_cmd_bind = strdup("1h");
            break;
        case OPAL_BIND_TO_CPUSET:
            opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Binding Option!");
            exit_status = ORTE_ERR_NOT_SUPPORTED;
            goto cleanup;
            break;
        default:
            rmaps_lama_cmd_bind = NULL;
            break;
        }
    }

    /*
     * Ordering (a.k.a. Ranking) Options
     */
    if( NULL == rmaps_lama_cmd_ordering ) {
        /* orte_rmaps_base.ranking */
        switch( ORTE_GET_RANKING_POLICY(jdata->map->ranking) ) {
        case ORTE_RANK_BY_SLOT:
            rmaps_lama_cmd_ordering = strdup("s");
            break;
        case ORTE_RANK_BY_NODE:
        case ORTE_RANK_BY_NUMA:
        case ORTE_RANK_BY_SOCKET:
        case ORTE_RANK_BY_L3CACHE:
        case ORTE_RANK_BY_L2CACHE:
        case ORTE_RANK_BY_L1CACHE:
        case ORTE_RANK_BY_CORE:
        case ORTE_RANK_BY_HWTHREAD:
            rmaps_lama_cmd_ordering = strdup("n");
            break;
        case ORTE_RANK_BY_BOARD:
            /* rmaps_lama_cmd_ordering = strdup("n"); */
            opal_output(0, "mca:rmaps:lama: ERROR: Unsupported Ordering/Ranking Option!");
            exit_status = ORTE_ERR_NOT_SUPPORTED;
            goto cleanup;
            break;
        default:
            rmaps_lama_cmd_ordering = strdup("n");
            break;
        }
    }

    /*
     * MPPR
     */
    if( NULL == rmaps_lama_cmd_mppr ) {
        /*
         * Take what the user specified as the -ppr
         */
        if( NULL != jdata->map->ppr) {
            rmaps_lama_cmd_mppr = rmaps_lama_covert_ppr(jdata->map->ppr);
        }
        /*
         * Otherwise look at the parameters registered for the ppr component
         */
        else {
            /*
             * -pernode => -mppr 1:n
             */
            if( NULL == rmaps_lama_cmd_mppr ) {
                param_tmp = mca_base_param_reg_int_name("rmaps", "ppr_pernode",
                                                        "Launch one ppn as directed",
                                                        false, false, (int)false, NULL);
                mca_base_param_reg_syn_name(param_tmp, "rmaps", "base_pernode", false);
                mca_base_param_lookup_int(param_tmp, &param_value);
                if( param_value ) {
                    rmaps_lama_cmd_mppr = strdup("1:n");
                }
            }

            /*
             * -npernode X => -mppr X:n
             */
            if( NULL == rmaps_lama_cmd_mppr ) {
                param_tmp = mca_base_param_reg_int_name("rmaps", "ppr_n_pernode",
                                                        "Launch n procs/node",
                                                        false, false, (int)false, NULL);
                mca_base_param_reg_syn_name(param_tmp, "rmaps", "base_n_pernode", false);
                mca_base_param_lookup_int(param_tmp, &param_value);
                if( param_value ) {
                    asprintf(&rmaps_lama_cmd_mppr, "%d:n", param_value);
                }
            }

            /*
             * -npersocket X => -mppr X:s
             */
            if( NULL == rmaps_lama_cmd_mppr ) {
                param_tmp = mca_base_param_reg_int_name("rmaps", "ppr_n_persocket",
                                                        "Launch n procs/socket",
                                                        false, false, (int)false, NULL);
                mca_base_param_reg_syn_name(param_tmp, "rmaps", "base_n_persocket", false);
                mca_base_param_lookup_int(param_tmp, &param_value);
                if( param_value ) {
                    asprintf(&rmaps_lama_cmd_mppr, "%d:s", param_value);
                }
            }

            /*
             * -ppr => ~ -mppr
             */
            if( NULL == rmaps_lama_cmd_mppr ) {
                mca_base_param_reg_string_name("rmaps", "ppr_pattern",
                                               "Comma-separated list of number of processes on a given resource type [default: none]",
                                               false, false, NULL, &(jdata->map->ppr));
                if( NULL != jdata->map->ppr ) {
                    rmaps_lama_cmd_mppr = rmaps_lama_covert_ppr(jdata->map->ppr);
                }
            }
        }
    }

    /*
     * Oversubscription
     */
    if( ORTE_MAPPING_NO_OVERSUBSCRIBE & ORTE_GET_MAPPING_DIRECTIVE(jdata->map->mapping) ) {
        rmaps_lama_can_oversubscribe = false;
    }
    else {
        rmaps_lama_can_oversubscribe = true;
    }

    /*
     * Display revised values
     */
    opal_output_verbose(5, orte_rmaps_base.rmaps_output,
                        "mca:rmaps:lama: Revised Parameters -----");
    opal_output_verbose(5, orte_rmaps_base.rmaps_output,
                        "mca:rmaps:lama: Map   : %s",
                        rmaps_lama_cmd_map);
    opal_output_verbose(5, orte_rmaps_base.rmaps_output,
                        "mca:rmaps:lama: Bind  : %s",
                        rmaps_lama_cmd_bind);
    opal_output_verbose(5, orte_rmaps_base.rmaps_output,
                        "mca:rmaps:lama: MPPR  : %s",
                        rmaps_lama_cmd_mppr);
    opal_output_verbose(5, orte_rmaps_base.rmaps_output,
                        "mca:rmaps:lama: Order : %s",
                        rmaps_lama_cmd_ordering);

 cleanup:
    return exit_status;
}

static char * rmaps_lama_covert_ppr(char * given_ppr)
{
    return strdup(given_ppr);
}

int rmaps_lama_parse_mapping(char *layout,
                             rmaps_lama_level_type_t **layout_types,
                             rmaps_lama_level_type_t **layout_types_sorted,
                             int *num_types)
{
    int exit_status = ORTE_SUCCESS;
    char param[3];
    int i, j, len;
    bool found_req_param_n    = false;
    bool found_req_param_h    = false;
    bool found_req_param_bind = false;

    /*
     * Sanity Check:
     * There is no default layout, so if we get here and nothing is specified
     * then this is an error.
     */
    if( NULL == layout ) {
        return ORTE_ERROR;
    }

    *num_types = 0;

    /*
     * Extract and convert all the keys
     */
    len = strlen(layout);
    for(i = 0; i < len; ++i) {
        /*
         * L1 : L1 Cache
         * L2 : L2 Cache
         * L3 : L3 Cache
         */
        if( layout[i] == 'L' ) {
            param[0] = layout[i];
            ++i;
            /*
             * Check for 2 characters
             */
            if( i >= len ) {
                opal_output(0, "mca:rmaps:lama: Error: Cache Level must be followed by a number [%s]!",
                            layout);
                exit_status = ORTE_ERROR;
                goto cleanup;
            }
            param[1] = layout[i];
            param[2] = '\0';
        }
        /*
         * n : Machine
         * b : Board
         * s : Socket
         * c : Core
         * h : Hardware Thread
         * N : NUMA Node
         */
        else {
            param[0] = layout[i];
            param[1] = '\0';
        }

        /*
         * Append level
         */
        *num_types += 1;
        *layout_types = (rmaps_lama_level_type_t*)realloc(*layout_types, sizeof(rmaps_lama_level_type_t) * (*num_types));
        (*layout_types)[(*num_types)-1] = lama_type_str_to_enum(param);
    }

    /*
     * Check for duplicates and unknowns
     * Copy to sorted list
     */
    *layout_types_sorted = (rmaps_lama_level_type_t*)malloc(sizeof(rmaps_lama_level_type_t) * (*num_types));
    for( i = 0; i < *num_types; ++i ) {
        /*
         * Copy for later sorting
         */
        (*layout_types_sorted)[i] = (*layout_types)[i];

        /*
         * Look for unknown and unsupported options
         */
        if( LAMA_LEVEL_UNKNOWN <= (*layout_types)[i] ) {
            opal_output(0, "mca:rmaps:lama: Error: Unknown or Unsupported option in layout [%s] position %d!",
                        layout, i+1);
            exit_status = ORTE_ERROR;
            goto cleanup;
        }

        if( LAMA_LEVEL_MACHINE == (*layout_types)[i] ) {
            found_req_param_n = true;
        }

        if( LAMA_LEVEL_PU == (*layout_types)[i] ) {
            found_req_param_h = true;
        }

        if( lama_binding_level == (*layout_types)[i] ) {
            found_req_param_bind = true;
        }

        /*
         * Look for duplicates
         */
        for( j = i+1; j < *num_types; ++j ) {
            if( (*layout_types)[i] == (*layout_types)[j] ) {
                opal_output(0, "mca:rmaps:lama: Error: Duplicate key detected in layout [%s] position %d and %d!",
                            layout, i+1, j+1);
                exit_status = ORTE_ERROR;
                goto cleanup;
            }
        }
    }

    /*
     * The user is required to specify at least the:
     *  - machine
     *  - hardware thread (needed for lower bound binding) JJH: We should be able to lift this...
     *  - binding layer (need it to stride the mapping)
     */
    if( !found_req_param_n ) {
        opal_output(0, "mca:rmaps:lama: Error: Required level not specified 'n' in layout [%s]!",
                    layout);
        exit_status = ORTE_ERROR;
        goto cleanup;
    }
    if( !found_req_param_h ) {
        opal_output(0, "mca:rmaps:lama: Error: Required level not specified 'h' in layout [%s]!",
                    layout);
        exit_status = ORTE_ERROR;
        goto cleanup;
    }
    if( !found_req_param_bind ) {
        opal_output(0, "mca:rmaps:lama: Error: Required binding level [%s] not specified in mapping layout [%s]!",
                    rmaps_lama_cmd_bind, layout);
        exit_status = ORTE_ERROR;
        goto cleanup;
    }

    /*
     * Sort the items
     */
    qsort((*layout_types_sorted), (*num_types), sizeof(int), lama_parse_int_sort);

 cleanup:
    return exit_status;
}

int rmaps_lama_parse_binding(char *layout, rmaps_lama_level_type_t *binding_level, int *num_types)
{
    int exit_status = ORTE_SUCCESS;
    char param[3];
    char num[MAX_BIND_DIGIT_LEN];
    int i, n, p, len;

    /*
     * Default: If nothing specified
     * - Bind to machine
     */
    if( NULL == layout ) {
        *binding_level = LAMA_LEVEL_MACHINE;
        *num_types = 1;
        return ORTE_SUCCESS;
    }

    *num_types = 0;

    /*
     * Extract and convert all the keys
     */
    len = strlen(layout);
    n = 0;
    p = 0;
    for(i = 0; i < len; ++i) {
        /*
         * Must start with a digit
         * (cast: isdigit() requires a value representable as unsigned char)
         */
        if( isdigit((unsigned char)layout[i]) ) {
            /*
             * Check: Digits must come first
             */
            if( p != 0 ) {
                opal_output(0, "mca:rmaps:lama: Error: Binding: Digits must only come before Level string [%s]!",
                            layout);
                exit_status = ORTE_ERROR;
                goto cleanup;
            }
            num[n] = layout[i];
            ++n;
            /*
             * Check: Exceed bound of number of digits
             */
            if( n >= MAX_BIND_DIGIT_LEN ) {
                opal_output(0, "mca:rmaps:lama: Error: Binding: Too many digits in [%s]! Limit %d",
                            layout, MAX_BIND_DIGIT_LEN-1);
                exit_status = ORTE_ERROR;
                goto cleanup;
            }
        }
        /*
         * Extract the level
         */
        else {
            /*
             * Check: Digits must come first
             */
            if( n == 0 ) {
                opal_output(0, "mca:rmaps:lama: Error: Binding: Digits must come before Level string [%s] [%c]!",
                            layout, layout[i]);
                exit_status = ORTE_ERROR;
                goto cleanup;
            }
            /*
             * Check: Only one level allowed
             */
            if( p != 0 ) {
                opal_output(0, "mca:rmaps:lama: Error: Binding: Only one level may be specified [%s]!",
                            layout);
                exit_status = ORTE_ERROR;
                goto cleanup;
            }

            /*
             * L1 : L1 Cache
             * L2 : L2 Cache
             * L3 : L3 Cache
             */
            if( layout[i] == 'L' ) {
                param[0] = layout[i];
                ++i;
                /*
                 * Check for 2 characters
                 */
                if( i >= len ) {
                    opal_output(0, "mca:rmaps:lama: Error: Cache Level must be followed by a number [%s]!",
                                layout);
                    exit_status = ORTE_ERROR;
                    goto cleanup;
                }
                param[1] = layout[i];
                p = 2;
            }
            /*
             * n : Machine
             * b : Board
             * s : Socket
             * c : Core
             * h : Hardware Thread
             * N : NUMA Node
             */
            else {
                param[0] = layout[i];
                p = 1;
            }
            param[p] = '\0';
        }
    }

    /*
     * Check that the level was specified
     */
    if( p == 0 ) {
        opal_output(0, "mca:rmaps:lama: Error: Binding: Level not specified [%s]!",
                    layout);
        exit_status = ORTE_ERROR;
        goto cleanup;
    }

    num[n] = '\0';

    *binding_level = lama_type_str_to_enum(param);
    *num_types = atoi(num);

    /*
     * Check for unknown level
     */
    if( LAMA_LEVEL_UNKNOWN <= *binding_level ) {
        opal_output(0, "mca:rmaps:lama: Error: Unknown or Unsupported option in layout [%s]!",
                    layout);
        exit_status = ORTE_ERROR;
        goto cleanup;
    }

 cleanup:
    return exit_status;
}

int rmaps_lama_parse_mppr(char *layout, rmaps_lama_level_info_t **mppr_levels, int *num_types)
{
    int exit_status = ORTE_SUCCESS;
    char param[3];
    char num[MAX_BIND_DIGIT_LEN];
    char **argv = NULL;
    int argc = 0;
    int i, j, len;
    int p, n;

    /*
     * Default: Unrestricted allocation
     * 'oversubscribe' flag accounted for elsewhere
     */
    if( NULL == layout ) {
        *mppr_levels = NULL;
        *num_types = 0;
        return ORTE_SUCCESS;
    }

    *num_types = 0;

    /*
     * Split by ','
     * <#:level>,<#:level>,...
     */
    argv = opal_argv_split(layout, ',');
    argc = opal_argv_count(argv);
    for(j = 0; j < argc; ++j) {
        /*
         * Parse <#:level>
         */
        len = strlen(argv[j]);
        n = 0;
        p = 0;
        for(i = 0; i < len; ++i) {
            /*
             * Skip the ':' separator and whitespace
             * (casts: the <ctype.h> functions require unsigned char values)
             */
            if( argv[j][i] == ':' || isblank((unsigned char)argv[j][i])) {
                continue;
            }
            /*
             * Must start with a digit
             */
            else if( isdigit((unsigned char)argv[j][i]) ) {
                /*
                 * Check: Digits must come first
                 */
                if( p != 0 ) {
                    opal_output(0, "mca:rmaps:lama: Error: MPPR: Digits must only come before Level string [%s] at [%s]!",
                                layout, argv[j]);
                    exit_status = ORTE_ERROR;
                    goto cleanup;
                }
                num[n] = argv[j][i];
                ++n;
                /*
                 * Check: Exceed bound of number of digits
                 */
                if( n >= MAX_BIND_DIGIT_LEN ) {
                    opal_output(0, "mca:rmaps:lama: Error: MPPR: Too many digits in [%s]! Limit %d",
                                argv[j], MAX_BIND_DIGIT_LEN-1);
                    exit_status = ORTE_ERROR;
                    goto cleanup;
                }
            }
            /*
             * Extract the level
             */
            else {
                /*
                 * Check: Digits must come first
                 */
                if( n == 0 ) {
                    opal_output(0, "mca:rmaps:lama: Error: MPPR: Digits must come before Level string [%s]!",
                                argv[j]);
                    exit_status = ORTE_ERROR;
                    goto cleanup;
                }
                /*
                 * Check: Only one level allowed
                 */
                if( p != 0 ) {
                    opal_output(0, "mca:rmaps:lama: Error: MPPR: Only one level may be specified [%s]!",
                                argv[j]);
                    exit_status = ORTE_ERROR;
                    goto cleanup;
                }

                /*
                 * L1 : L1 Cache
                 * L2 : L2 Cache
                 * L3 : L3 Cache
                 */
                if( argv[j][i] == 'L' ) {
                    param[0] = argv[j][i];
                    ++i;
                    /*
                     * Check for 2 characters
                     */
                    if( i >= len ) {
                        opal_output(0, "mca:rmaps:lama: Error: MPPR: Cache Level must be followed by a number [%s]!",
                                    argv[j]);
                        exit_status = ORTE_ERROR;
                        goto cleanup;
                    }
                    param[1] = argv[j][i];
                    p = 2;
                }
                /*
                 * n : Machine
                 * b : Board
                 * s : Socket
                 * c : Core
                 * h : Hardware Thread
                 * N : NUMA Node
                 */
                else {
                    param[0] = argv[j][i];
                    p = 1;
                }
                param[p] = '\0';
            }
        }

        /*
         * Whitespace, just skip
         */
        if( n == 0 && p == 0 ) {
            continue;
        }

        /*
         * Check that the level was specified
         */
        if( p == 0 ) {
            opal_output(0, "mca:rmaps:lama: Error: MPPR: Level not specified [%s]!",
                        layout);
            exit_status = ORTE_ERROR;
            goto cleanup;
        }

        num[n] = '\0';

        /*
         * Append level
         */
        *num_types += 1;
        *mppr_levels = (rmaps_lama_level_info_t*)realloc(*mppr_levels, sizeof(rmaps_lama_level_info_t) * (*num_types));
        (*mppr_levels)[(*num_types)-1].type = lama_type_str_to_enum(param);
        (*mppr_levels)[(*num_types)-1].max_resources = atoi(num);
    }

    /*
     * Check for duplicates and unknowns
     */
    for( i = 0; i < *num_types; ++i ) {
        /*
         * Look for unknown and unsupported options
         */
        if( LAMA_LEVEL_UNKNOWN <= (*mppr_levels)[i].type ) {
            opal_output(0, "mca:rmaps:lama: Error: Unknown or Unsupported option in layout [%s] position %d!",
                        layout, i+1);
            exit_status = ORTE_ERROR;
            goto cleanup;
        }

        /*
         * Look for duplicates
         */
        for( j = i+1; j < *num_types; ++j ) {
            if( (*mppr_levels)[i].type == (*mppr_levels)[j].type ) {
                opal_output(0, "mca:rmaps:lama: Error: Duplicate key detected in layout [%s] position %d and %d!",
                            layout, i+1, j+1);
                exit_status = ORTE_ERROR;
                goto cleanup;
            }
        }
    }

 cleanup:
    if( NULL != argv ) {
        opal_argv_free(argv);
        argv = NULL;
    }

    return exit_status;
}

int rmaps_lama_parse_ordering(char *layout,
                              rmaps_lama_order_type_t *order)
{
    /*
     * Default: Natural ordering
     */
    if( NULL == layout ) {
        *order = LAMA_ORDER_NATURAL;
        return ORTE_SUCCESS;
    }

    /*
     * Sequential Ordering
     */
    if( 0 == strncmp(layout, "s", strlen("s")) ||
        0 == strncmp(layout, "S", strlen("S")) ) {
        *order = LAMA_ORDER_SEQ;
    }
    /*
     * Natural Ordering
     */
    else if( 0 == strncmp(layout, "n", strlen("n")) ||
             0 == strncmp(layout, "N", strlen("N")) ) {
        *order = LAMA_ORDER_NATURAL;
    }
    /*
     * Check for unknown options
     */
    else {
        opal_output(0, "mca:rmaps:lama: Error: Unknown or Unsupported option in ordering [%s]!",
                    layout);
        return ORTE_ERROR;
    }

    return ORTE_SUCCESS;
}

bool rmaps_lama_ok_to_prune_level(rmaps_lama_order_type_t level)
{
    int i;

    for( i = 0; i < lama_mapping_num_layouts; ++i ) {
        if( level == lama_mapping_layout[i] ) {
            return false;
        }
    }

    return true;
}

/*********************************
 * Support Functions
 *********************************/
static int lama_parse_int_sort(const void *a, const void *b) {
    int left  = *((const int*)a);
    int right = *((const int*)b);

    if( left < right ) {
        return -1;
    }
    else if( left > right ) {
        return 1;
    }
    else {
        return 0;
    }
}