
Move from the use of regex to compression

We've been fighting the battle of trying to create a regex generator and
parser that can handle arbitrary hostname schemes - without long-term
success. The worst of it is that there is no way of checking to see if
the computed regex is correct short of parsing it and doing a
character-by-character comparison with the original string. Ugh...there
has to be a better solution.

One option is to investigate using 3rd-party regex libraries as
those are coming from communities whose sole focus is resolving that
problem. However, someone would need to spend the time to investigate
it, and we'd have to find a license-friendly implementation.

Another option is to quit beating our heads against the wall and just
compress the information. It won't be as much of a reduction, but we
also won't keep hitting scenarios where things break. In this case, it
seems that "perfection" is definitely the enemy of "good enough".

This PR implements the compression option while retaining the
possibility of people adding regex-generating components. The
compression code used in ORTE is consolidated into the opal/compress
framework. That framework previously held bzip and gzip components for
use in compressing checkpoint files - since we no longer support C/R, I
have .opal_ignore'd those components.

However, I have left the original framework APIs alone in case someone
ever decides to redo C/R. The APIs of interest here are added to the
framework - specifically, the "compress_block" and "decompress_block"
functions. I then moved the ORTE zlib compression code into a new
component in this framework.
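
A minimal sketch of how a caller might use the block API, assuming the boolean return means "compression was performed" (which is how the zlib component in this commit uses it). The names "sketch_compress_block", "sketch_payload_t" and "sketch_prepare" are hypothetical, and the stand-in compressor only copies bytes so the example carries no zlib dependency:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_LIMIT 4096   /* mirrors the default threshold named in this commit */

/* Hypothetical stand-in for a component's compress_block(): declines
 * inputs below the threshold, otherwise hands back a copy. A real
 * component would run an actual compressor here. */
static bool sketch_compress_block(const uint8_t *in, size_t inlen,
                                  uint8_t **out, size_t *outlen)
{
    *out = NULL;
    *outlen = 0;
    if (inlen < SKETCH_LIMIT) {
        return false;               /* too small: caller sends raw bytes */
    }
    *out = (uint8_t *)malloc(inlen);
    if (NULL == *out) {
        return false;               /* allocation failed: also fall back */
    }
    memcpy(*out, in, inlen);
    *outlen = inlen;
    return true;
}

typedef struct {
    bool compressed;    /* did compress_block() do the work? */
    size_t raw_len;     /* original length, needed later to decompress */
    size_t len;         /* number of bytes in data */
    uint8_t *data;      /* owned by the payload; release with free() */
} sketch_payload_t;

/* Build the payload to transmit: compressed if the module accepted the
 * input, otherwise an unmodified copy. Returns false only when the
 * fallback copy cannot be allocated. */
static bool sketch_prepare(const uint8_t *in, size_t inlen, sketch_payload_t *p)
{
    assert(NULL != in && NULL != p);
    p->raw_len = inlen;
    if (sketch_compress_block(in, inlen, &p->data, &p->len)) {
        p->compressed = true;
        return true;
    }
    p->compressed = false;
    p->data = (uint8_t *)malloc(inlen > 0 ? inlen : 1);
    if (NULL == p->data) {
        return false;
    }
    memcpy(p->data, in, inlen);
    p->len = inlen;
    return true;
}
```

The point of the pattern is that a "false" return is not an error: the caller simply ships the original bytes and records that they were not compressed.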

Unfortunately, the framework currently is a single-select one - i.e.,
only one active component at a time. Since I .opal_ignore'd the other
two and made the priority of zlib high, this isn't a problem. However,
if someone wants to re-enable bzip/gzip or add another component, they
might need to transition opal/compress to a multi-select framework.

Included changes:

* Consolidate the compression code into the opal/compress framework

* Move the ORTE zlib compression code into a new opal/compress/zlib
  component

* Ignore the bzip and gzip components in opal/compress framework

* Add a "compress_base_limit" MCA param to set the threshold above which
  we compress data - defaults to 4096 bytes

* Delete stale brucks and rcd components from orte/grpcomm framework

* Delete the orte/regx framework

* Update the launch system to use opal/compress instead of string regex

* Provide a default module if no zlib is available

* Fix some misc multi-node issues

* Properly generate the nidmap in response to a "connection warmup"
  message so the remote daemon knows the children it needs to launch.

* Remove stale references to orte_node_regex

* opal_byte_object_t's are not OPAL objects - properly release allocated
  memory.

* Set the topology

* Currently only handling homogeneous case

* Update the compress framework files to conform

* Consolidate open/close into one "frame" file. Ensure we open/close the
  framework

Signed-off-by: Ralph Castain <rhc@pmix.org>
Ralph Castain 2019-01-29 16:02:21 -08:00
parent fcbc7ea298
commit 125d236173
64 changed files: 1609 additions and 4213 deletions


@ -18,6 +18,8 @@
* Copyright (c) 2018 Amazon.com, Inc. or its affiliates. All Rights reserved.
* Copyright (c) 2019 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2018 Triad National Security, LLC. All rights
* reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow


@ -3,6 +3,7 @@
# University Research and Technology
# Corporation. All rights reserved.
# Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2019 Intel, Inc. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
@ -14,7 +15,6 @@ headers += \
base/base.h
libmca_compress_la_SOURCES += \
-base/compress_base_open.c \
-base/compress_base_close.c \
+base/compress_base_frame.c \
base/compress_base_select.c \
base/compress_base_fns.c


@ -3,6 +3,7 @@
* University Research and Technology
* Corporation. All rights reserved.
*
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@ -27,6 +28,12 @@
extern "C" {
#endif
typedef struct {
size_t compress_limit;
} opal_compress_base_t;
OPAL_DECLSPEC extern opal_compress_base_t opal_compress_base;
/**
* Initialize the COMPRESS MCA framework
*


@ -1,36 +0,0 @@
/*
* Copyright (c) 2004-2010 The Trustees of Indiana University.
* All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "opal_config.h"
#include <string.h>
#include "opal/mca/mca.h"
#include "opal/mca/base/base.h"
#include "opal/include/opal/constants.h"
#include "opal/mca/compress/compress.h"
#include "opal/mca/compress/base/base.h"
int opal_compress_base_close(void)
{
/* Compression currently only used with C/R */
if( !opal_cr_is_enabled ) {
opal_output_verbose(10, opal_compress_base_framework.framework_output,
"compress:open: FT is not enabled, skipping!");
return OPAL_SUCCESS;
}
/* Call the component's finalize routine */
if( NULL != opal_compress.finalize ) {
opal_compress.finalize();
}
/* Close all available modules that are open */
return mca_base_framework_components_close (&opal_compress_base_framework, NULL);
}


@ -6,6 +6,7 @@
* All rights reserved.
* Copyright (c) 2015 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@ -23,14 +24,31 @@
/*
* Globals
*/
+static bool compress_block(uint8_t *inbytes,
+size_t inlen,
+uint8_t **outbytes,
+size_t *olen)
+{
+return false;
+}
+static bool decompress_block(uint8_t **outbytes, size_t olen,
+uint8_t *inbytes, size_t len)
+{
+return false;
+}
opal_compress_base_module_t opal_compress = {
NULL, /* init */
NULL, /* finalize */
NULL, /* compress */
NULL, /* compress_nb */
NULL, /* decompress */
-NULL /* decompress_nb */
+NULL, /* decompress_nb */
+compress_block,
+decompress_block
};
+opal_compress_base_t opal_compress_base = {0};
opal_compress_base_component_t opal_compress_base_selected_component = {{0}};
@ -42,6 +60,12 @@ MCA_BASE_FRAMEWORK_DECLARE(opal, compress, "COMPRESS MCA",
static int opal_compress_base_register(mca_base_register_flag_t flags)
{
opal_compress_base.compress_limit = 4096;
(void) mca_base_var_register("opal", "compress", "base", "limit",
"Threshold beyond which data will be compressed",
MCA_BASE_VAR_TYPE_SIZE_T, NULL, 0, 0, OPAL_INFO_LVL_3,
MCA_BASE_VAR_SCOPE_READONLY, &opal_compress_base.compress_limit);
return OPAL_SUCCESS;
}
@ -51,13 +75,17 @@ static int opal_compress_base_register(mca_base_register_flag_t flags)
*/
int opal_compress_base_open(mca_base_open_flag_t flags)
{
-/* Compression currently only used with C/R */
-if(!opal_cr_is_enabled) {
-opal_output_verbose(10, opal_compress_base_framework.framework_output,
-"compress:open: FT is not enabled, skipping!");
-return OPAL_SUCCESS;
-}
/* Open up all available components */
return mca_base_framework_components_open(&opal_compress_base_framework, flags);
}
+int opal_compress_base_close(void)
+{
+/* Call the component's finalize routine */
+if( NULL != opal_compress.finalize ) {
+opal_compress.finalize();
+}
+/* Close all available modules that are open */
+return mca_base_framework_components_close (&opal_compress_base_framework, NULL);
+}
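
The default module in this file gives callers something safe to invoke even when no component is selected. A minimal sketch of that idea, under the assumption that a false return means "nothing was done"; all "sketch_" names are hypothetical and this is not the framework's actual type layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Function-pointer types mirroring the two block entry points. */
typedef bool (*sketch_compress_fn_t)(uint8_t *inbytes, size_t inlen,
                                     uint8_t **outbytes, size_t *olen);
typedef bool (*sketch_decompress_fn_t)(uint8_t **outbytes, size_t olen,
                                       uint8_t *inbytes, size_t len);

typedef struct {
    sketch_compress_fn_t compress_block;
    sketch_decompress_fn_t decompress_block;
} sketch_module_t;

/* Defaults: always report "nothing done". Clearing the outputs is an
 * addition in this sketch; the stubs in the diff only return false. */
static bool sketch_default_compress(uint8_t *inbytes, size_t inlen,
                                    uint8_t **outbytes, size_t *olen)
{
    (void)inbytes;
    (void)inlen;
    *outbytes = NULL;
    *olen = 0;
    return false;
}

static bool sketch_default_decompress(uint8_t **outbytes, size_t olen,
                                      uint8_t *inbytes, size_t len)
{
    (void)olen;
    (void)inbytes;
    (void)len;
    *outbytes = NULL;
    return false;
}

/* The active module starts out as the default ... */
static sketch_module_t sketch_active = {
    .compress_block = sketch_default_compress,
    .decompress_block = sketch_default_decompress
};

/* ... and is overwritten wholesale if a real component is selected. */
static void sketch_install(const sketch_module_t *winner)
{
    assert(NULL != winner);
    sketch_active = *winner;
}
```

Because the table is never left with NULL block entries, call sites can invoke "compress_block" unconditionally and branch only on its return value.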


@ -7,6 +7,7 @@
*
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@ -29,17 +30,10 @@
int opal_compress_base_select(void)
{
-int ret, exit_status = OPAL_SUCCESS;
+int ret = OPAL_SUCCESS;
opal_compress_base_component_t *best_component = NULL;
opal_compress_base_module_t *best_module = NULL;
-/* Compression currently only used with C/R */
-if( !opal_cr_is_enabled ) {
-opal_output_verbose(10, opal_compress_base_framework.framework_output,
-"compress:open: FT is not enabled, skipping!");
-return OPAL_SUCCESS;
-}
/*
* Select the best component
*/
@ -47,8 +41,8 @@ int opal_compress_base_select(void)
&opal_compress_base_framework.framework_components,
(mca_base_module_t **) &best_module,
(mca_base_component_t **) &best_component, NULL) ) {
-/* This will only happen if no component was selected */
-exit_status = OPAL_ERROR;
+/* This will only happen if no component was selected,
+* in which case we use the default one */
goto cleanup;
}
@ -58,12 +52,11 @@ int opal_compress_base_select(void)
/* Initialize the winner */
if (NULL != best_module) {
if (OPAL_SUCCESS != (ret = best_module->init()) ) {
-exit_status = ret;
goto cleanup;
}
opal_compress = *best_module;
}
cleanup:
-return exit_status;
+return ret;
}
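
The selection change above turns "no component selected" from an error into a fallback to the default module. A simplified sketch of that policy follows; it is not a model of mca_base_select itself, the type and function names are hypothetical, and only the zlib priority of 50 comes from this commit (the other priorities in the usage are made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int priority;       /* higher wins */
    bool usable;        /* e.g. false when the library was not found */
} sketch_component_t;

/* Return the usable component with the highest priority, or NULL when
 * none qualifies. NULL is not an error here: the caller simply keeps
 * the built-in default module. */
static const sketch_component_t *sketch_select(const sketch_component_t *comps,
                                               size_t n)
{
    const sketch_component_t *best = NULL;

    assert(NULL != comps || 0 == n);
    for (size_t i = 0; i < n; i++) {
        if (!comps[i].usable) {
            continue;
        }
        if (NULL == best || comps[i].priority > best->priority) {
            best = &comps[i];
        }
    }
    return best;
}
```

With a single-select framework only one winner is installed, which is why the commit message notes that re-enabling bzip/gzip alongside zlib might require a multi-select design.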


@ -4,6 +4,7 @@
* All rights reserved.
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@ -65,22 +66,39 @@ opal_compress_bzip_component_t mca_compress_bzip_component = {
}
};
+static bool nocompress(uint8_t *inbytes,
+size_t inlen,
+uint8_t **outbytes,
+size_t *olen)
+{
+return false;
+}
+static bool nodecompress(uint8_t **outbytes, size_t olen,
+uint8_t *inbytes, size_t len)
+{
+return false;
+}
/*
* Bzip module
*/
static opal_compress_base_module_t loc_module = {
/** Initialization Function */
-opal_compress_bzip_module_init,
+.init = opal_compress_bzip_module_init,
/** Finalization Function */
-opal_compress_bzip_module_finalize,
+.finalize = opal_compress_bzip_module_finalize,
/** Compress Function */
-opal_compress_bzip_compress,
-opal_compress_bzip_compress_nb,
+.compress = opal_compress_bzip_compress,
+.compress_nb = opal_compress_bzip_compress_nb,
/** Decompress Function */
-opal_compress_bzip_decompress,
-opal_compress_bzip_decompress_nb
+.decompress = opal_compress_bzip_decompress,
+.decompress_nb = opal_compress_bzip_decompress_nb,
+.compress_block = nocompress,
+.decompress_block = nodecompress
};
static int compress_bzip_register (void)


@ -6,6 +6,7 @@
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights
* reserved.
*
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@ -82,6 +83,20 @@ typedef int (*opal_compress_base_module_decompress_fn_t)
typedef int (*opal_compress_base_module_decompress_nb_fn_t)
(char * cname, char **fname, pid_t *child_pid);
/**
* Compress a string
*
* Arguments:
*
*/
typedef bool (*opal_compress_base_module_compress_string_fn_t)(uint8_t *inbytes,
size_t inlen,
uint8_t **outbytes,
size_t *olen);
typedef bool (*opal_compress_base_module_decompress_string_fn_t)(uint8_t **outbytes, size_t olen,
uint8_t *inbytes, size_t len);
/**
* Structure for COMPRESS components.
*/
@ -117,6 +132,10 @@ struct opal_compress_base_module_1_0_0_t {
/** Decompress Interface */
opal_compress_base_module_decompress_fn_t decompress;
opal_compress_base_module_decompress_nb_fn_t decompress_nb;
/* COMPRESS STRING */
opal_compress_base_module_compress_string_fn_t compress_block;
opal_compress_base_module_decompress_string_fn_t decompress_block;
};
typedef struct opal_compress_base_module_1_0_0_t opal_compress_base_module_1_0_0_t;
typedef struct opal_compress_base_module_1_0_0_t opal_compress_base_module_t;
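
Note that the decompress typedef takes the decompressed size ("olen") as an input, so whoever receives a compressed block has to learn the original length from the sender. One hypothetical way to carry that information is a small header in front of the payload. The 9-byte layout and the "sketch_frame"/"sketch_unframe" names below are assumptions for illustration only and do not describe how ORTE packs its buffers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header: 1 flag byte + 8-byte big-endian original length. */
#define SKETCH_HDR_LEN 9

/* Prepend the header to the payload. Returns a malloc'd buffer (caller
 * frees) or NULL on allocation failure. */
static uint8_t *sketch_frame(bool compressed, uint64_t raw_len,
                             const uint8_t *payload, size_t len,
                             size_t *framed_len)
{
    uint8_t *buf;

    assert(NULL != framed_len);
    buf = (uint8_t *)malloc(SKETCH_HDR_LEN + len);
    if (NULL == buf) {
        return NULL;
    }
    buf[0] = compressed ? 1 : 0;
    for (int i = 0; i < 8; i++) {
        buf[1 + i] = (uint8_t)(raw_len >> (56 - 8 * i));
    }
    if (len > 0) {
        memcpy(buf + SKETCH_HDR_LEN, payload, len);
    }
    *framed_len = SKETCH_HDR_LEN + len;
    return buf;
}

/* Parse the header. On success *payload points into buf (no copy) and
 * *raw_len is the size to pass as "olen" when decompressing. */
static bool sketch_unframe(const uint8_t *buf, size_t buflen,
                           bool *compressed, uint64_t *raw_len,
                           const uint8_t **payload, size_t *len)
{
    if (NULL == buf || buflen < SKETCH_HDR_LEN) {
        return false;               /* too short to hold a header */
    }
    *compressed = (0 != buf[0]);
    *raw_len = 0;
    for (int i = 0; i < 8; i++) {
        *raw_len = (*raw_len << 8) | buf[1 + i];
    }
    *payload = buf + SKETCH_HDR_LEN;
    *len = buflen - SKETCH_HDR_LEN;
    return true;
}
```

A receiver would call "decompress_block" only when the flag is set, passing the recovered length as "olen"; otherwise the payload is already the original data.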


@ -4,6 +4,7 @@
* All rights reserved.
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@ -65,22 +66,39 @@ opal_compress_gzip_component_t mca_compress_gzip_component = {
}
};
+static bool nocompress(uint8_t *inbytes,
+size_t inlen,
+uint8_t **outbytes,
+size_t *olen)
+{
+return false;
+}
+static bool nodecompress(uint8_t **outbytes, size_t olen,
+uint8_t *inbytes, size_t len)
+{
+return false;
+}
/*
* Gzip module
*/
static opal_compress_base_module_t loc_module = {
/** Initialization Function */
-opal_compress_gzip_module_init,
+.init = opal_compress_gzip_module_init,
/** Finalization Function */
-opal_compress_gzip_module_finalize,
+.finalize = opal_compress_gzip_module_finalize,
/** Compress Function */
-opal_compress_gzip_compress,
-opal_compress_gzip_compress_nb,
+.compress = opal_compress_gzip_compress,
+.compress_nb = opal_compress_gzip_compress_nb,
/** Decompress Function */
-opal_compress_gzip_decompress,
-opal_compress_gzip_decompress_nb
+.decompress = opal_compress_gzip_decompress,
+.decompress_nb = opal_compress_gzip_decompress_nb,
+.compress_block = nocompress,
+.decompress_block = nodecompress
};
static int compress_gzip_register (void)

opal/mca/compress/zlib/Makefile.am

@ -0,0 +1,42 @@
#
# Copyright (c) 2004-2010 The Trustees of Indiana University.
# All rights reserved.
# Copyright (c) 2014-2015 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2017 IBM Corporation. All rights reserved.
# Copyright (c) 2019 Intel, Inc. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
AM_CPPFLAGS = $(compress_zlib_CPPFLAGS)
sources = \
compress_zlib.h \
compress_zlib_component.c \
compress_zlib.c
# Make the output library in this directory, and name it either
# mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la
# (for static builds).
if MCA_BUILD_opal_compress_zlib_DSO
component_noinst =
component_install = mca_compress_zlib.la
else
component_noinst = libmca_compress_zlib.la
component_install =
endif
mcacomponentdir = $(opallibdir)
mcacomponent_LTLIBRARIES = $(component_install)
mca_compress_zlib_la_SOURCES = $(sources)
mca_compress_zlib_la_LDFLAGS = -module -avoid-version $(compress_zlib_LDFLAGS)
mca_compress_zlib_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la $(compress_zlib_LIBS)
noinst_LTLIBRARIES = $(component_noinst)
libmca_compress_zlib_la_SOURCES = $(sources)
libmca_compress_zlib_la_LDFLAGS = -module -avoid-version $(compress_zlib_LDFLAGS)
libmca_compress_zlib_la_LIBADD = $(compress_zlib_LIBS)

opal/mca/compress/zlib/compress_zlib.c

@ -0,0 +1,133 @@
/*
* Copyright (c) 2004-2010 The Trustees of Indiana University.
* All rights reserved.
* Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2015 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2018 Amazon.com, Inc. or its affiliates. All Rights reserved.
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "opal_config.h"
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/stat.h>
#if HAVE_UNISTD_H
#include <unistd.h>
#endif /* HAVE_UNISTD_H */
#include <zlib.h>
#include "opal/util/opal_environ.h"
#include "opal/util/output.h"
#include "opal/util/argv.h"
#include "opal/util/opal_environ.h"
#include "opal/util/printf.h"
#include "opal/constants.h"
#include "opal/util/basename.h"
#include "opal/mca/compress/compress.h"
#include "opal/mca/compress/base/base.h"
#include "compress_zlib.h"
int opal_compress_zlib_module_init(void)
{
return OPAL_SUCCESS;
}
int opal_compress_zlib_module_finalize(void)
{
return OPAL_SUCCESS;
}
bool opal_compress_zlib_compress_block(uint8_t *inbytes,
size_t inlen,
uint8_t **outbytes,
size_t *olen)
{
z_stream strm;
size_t len;
uint8_t *tmp;
if (inlen < opal_compress_base.compress_limit) {
return false;
}
opal_output_verbose(2, opal_compress_base_framework.framework_output,
"COMPRESSING");
/* set default output */
*outbytes = NULL;
*olen = 0;
/* setup the stream */
memset (&strm, 0, sizeof (strm));
deflateInit (&strm, 9);
/* get an upper bound on the required output storage */
len = deflateBound(&strm, inlen);
if (NULL == (tmp = (uint8_t*)malloc(len))) {
return false;
}
strm.next_in = inbytes;
strm.avail_in = inlen;
/* allocating the upper bound guarantees zlib will
* always successfully compress into the available space */
strm.avail_out = len;
strm.next_out = tmp;
deflate (&strm, Z_FINISH);
deflateEnd (&strm);
*outbytes = tmp;
*olen = len - strm.avail_out;
opal_output_verbose(2, opal_compress_base_framework.framework_output,
"\tINSIZE %d OUTSIZE %d", (int)inlen, (int)*olen);
return true; // we did the compression
}
bool opal_compress_zlib_uncompress_block(uint8_t **outbytes, size_t olen,
uint8_t *inbytes, size_t len)
{
uint8_t *dest;
z_stream strm;
/* set the default error answer */
*outbytes = NULL;
opal_output_verbose(2, opal_compress_base_framework.framework_output, "DECOMPRESS");
/* setting destination to the fully decompressed size */
dest = (uint8_t*)malloc(olen);
if (NULL == dest) {
return false;
}
memset (&strm, 0, sizeof (strm));
if (Z_OK != inflateInit(&strm)) {
free(dest);
return false;
}
strm.avail_in = len;
strm.next_in = inbytes;
strm.avail_out = olen;
strm.next_out = dest;
if (Z_STREAM_END != inflate (&strm, Z_FINISH)) {
opal_output(0, "\tDECOMPRESS FAILED: %s", strm.msg);
}
inflateEnd (&strm);
*outbytes = dest;
opal_output_verbose(2, opal_compress_base_framework.framework_output,
"\tINSIZE: %d OUTSIZE %d", (int)len, (int)olen);
return true;
}

opal/mca/compress/zlib/compress_zlib.h

@ -0,0 +1,66 @@
/*
* Copyright (c) 2004-2010 The Trustees of Indiana University.
* All rights reserved.
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
/**
* @file
*
* ZLIB COMPRESS component
*
* Uses the zlib library
*/
#ifndef MCA_COMPRESS_ZLIB_EXPORT_H
#define MCA_COMPRESS_ZLIB_EXPORT_H
#include "opal_config.h"
#include "opal/util/output.h"
#include "opal/mca/mca.h"
#include "opal/mca/compress/compress.h"
#if defined(c_plusplus) || defined(__cplusplus)
extern "C" {
#endif
/*
* Local Component structures
*/
struct opal_compress_zlib_component_t {
opal_compress_base_component_t super; /** Base COMPRESS component */
};
typedef struct opal_compress_zlib_component_t opal_compress_zlib_component_t;
extern opal_compress_zlib_component_t mca_compress_zlib_component;
int opal_compress_zlib_component_query(mca_base_module_t **module, int *priority);
/*
* Module functions
*/
int opal_compress_zlib_module_init(void);
int opal_compress_zlib_module_finalize(void);
/*
* Actual functionality
*/
bool opal_compress_zlib_compress_block(uint8_t *inbytes,
size_t inlen,
uint8_t **outbytes,
size_t *olen);
bool opal_compress_zlib_uncompress_block(uint8_t **outbytes, size_t olen,
uint8_t *inbytes, size_t len);
#if defined(c_plusplus) || defined(__cplusplus)
}
#endif
#endif /* MCA_COMPRESS_ZLIB_EXPORT_H */


@ -0,0 +1,149 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2004-2010 The Trustees of Indiana University.
* All rights reserved.
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "opal_config.h"
#include "opal/constants.h"
#include "opal/mca/compress/compress.h"
#include "opal/mca/compress/base/base.h"
#include "compress_zlib.h"
/*
* Public string for version number
*/
const char *opal_compress_zlib_component_version_string =
"OPAL COMPRESS zlib MCA component version " OPAL_VERSION;
/*
* Local functionality
*/
static int compress_zlib_register (void);
static int compress_zlib_open(void);
static int compress_zlib_close(void);
/*
* Instantiate the public struct with all of our public information
* and pointer to our public functions in it
*/
opal_compress_zlib_component_t mca_compress_zlib_component = {
/* First do the base component stuff */
{
/* Handle the general mca_component_t struct containing
* meta information about the component itself
*/
.base_version = {
OPAL_COMPRESS_BASE_VERSION_2_0_0,
/* Component name and version */
.mca_component_name = "zlib",
MCA_BASE_MAKE_VERSION(component, OPAL_MAJOR_VERSION, OPAL_MINOR_VERSION,
OPAL_RELEASE_VERSION),
/* Component open and close functions */
.mca_open_component = compress_zlib_open,
.mca_close_component = compress_zlib_close,
.mca_query_component = opal_compress_zlib_component_query,
.mca_register_component_params = compress_zlib_register
},
.base_data = {
/* The component is checkpoint ready */
MCA_BASE_METADATA_PARAM_CHECKPOINT
},
.verbose = 0,
.output_handle = -1,
}
};
/*
* Zlib module
*/
static opal_compress_base_module_t loc_module = {
/** Initialization Function */
.init = opal_compress_zlib_module_init,
/** Finalization Function */
.finalize = opal_compress_zlib_module_finalize,
/** Compress Function */
.compress_block = opal_compress_zlib_compress_block,
/** Decompress Function */
.decompress_block = opal_compress_zlib_uncompress_block,
};
static int compress_zlib_register (void)
{
int ret;
mca_compress_zlib_component.super.priority = 50;
ret = mca_base_component_var_register (&mca_compress_zlib_component.super.base_version,
"priority", "Priority of the COMPRESS zlib component "
"(default: 50)", MCA_BASE_VAR_TYPE_INT, NULL, 0,
MCA_BASE_VAR_FLAG_SETTABLE,
OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_ALL_EQ,
&mca_compress_zlib_component.super.priority);
if (0 > ret) {
return ret;
}
mca_compress_zlib_component.super.verbose = 0;
ret = mca_base_component_var_register (&mca_compress_zlib_component.super.base_version,
"verbose",
"Verbose level for the COMPRESS zlib component",
MCA_BASE_VAR_TYPE_INT, NULL, 0, MCA_BASE_VAR_FLAG_SETTABLE,
OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_LOCAL,
&mca_compress_zlib_component.super.verbose);
return (0 > ret) ? ret : OPAL_SUCCESS;
}
static int compress_zlib_open(void)
{
/* If there is a custom verbose level for this component then use it,
* otherwise take our parents level and output channel
*/
if ( 0 != mca_compress_zlib_component.super.verbose) {
mca_compress_zlib_component.super.output_handle = opal_output_open(NULL);
opal_output_set_verbosity(mca_compress_zlib_component.super.output_handle,
mca_compress_zlib_component.super.verbose);
} else {
mca_compress_zlib_component.super.output_handle = opal_compress_base_framework.framework_output;
}
/*
* Debug output
*/
opal_output_verbose(10, mca_compress_zlib_component.super.output_handle,
"compress:zlib: open()");
opal_output_verbose(20, mca_compress_zlib_component.super.output_handle,
"compress:zlib: open: priority = %d",
mca_compress_zlib_component.super.priority);
opal_output_verbose(20, mca_compress_zlib_component.super.output_handle,
"compress:zlib: open: verbosity = %d",
mca_compress_zlib_component.super.verbose);
return OPAL_SUCCESS;
}
static int compress_zlib_close(void)
{
return OPAL_SUCCESS;
}
int opal_compress_zlib_component_query(mca_base_module_t **module, int *priority)
{
*module = (mca_base_module_t *)&loc_module;
*priority = mca_compress_zlib_component.super.priority;
return OPAL_SUCCESS;
}

opal/mca/compress/zlib/configure.m4

@ -0,0 +1,102 @@
# -*- shell-script -*-
#
# Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2013 Los Alamos National Security, LLC. All rights reserved.
# Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# MCA_compress_zlib_CONFIG([action-if-can-compile],
# [action-if-cant-compile])
# ------------------------------------------------
AC_DEFUN([MCA_opal_compress_zlib_CONFIG],[
AC_CONFIG_FILES([opal/mca/compress/zlib/Makefile])
OPAL_VAR_SCOPE_PUSH([opal_zlib_dir opal_zlib_libdir opal_zlib_standard_lib_location opal_zlib_standard_header_location opal_check_zlib_save_CPPFLAGS opal_check_zlib_save_LDFLAGS opal_check_zlib_save_LIBS])
AC_ARG_WITH([zlib],
[AC_HELP_STRING([--with-zlib=DIR],
[Search for zlib headers and libraries in DIR ])])
AC_ARG_WITH([zlib-libdir],
[AC_HELP_STRING([--with-zlib-libdir=DIR],
[Search for zlib libraries in DIR ])])
opal_check_zlib_save_CPPFLAGS="$CPPFLAGS"
opal_check_zlib_save_LDFLAGS="$LDFLAGS"
opal_check_zlib_save_LIBS="$LIBS"
opal_zlib_support=0
if test "$with_zlib" != "no"; then
AC_MSG_CHECKING([for zlib in])
if test ! -z "$with_zlib" && test "$with_zlib" != "yes"; then
opal_zlib_dir=$with_zlib
opal_zlib_source=$with_zlib
opal_zlib_standard_header_location=no
opal_zlib_standard_lib_location=no
AS_IF([test -z "$with_zlib_libdir" || test "$with_zlib_libdir" = "yes"],
[if test -d $with_zlib/lib; then
opal_zlib_libdir=$with_zlib/lib
elif test -d $with_zlib/lib64; then
opal_zlib_libdir=$with_zlib/lib64
else
AC_MSG_RESULT([Could not find $with_zlib/lib or $with_zlib/lib64])
AC_MSG_ERROR([Can not continue])
fi
AC_MSG_RESULT([$opal_zlib_dir and $opal_zlib_libdir])],
[AC_MSG_RESULT([$with_zlib_libdir])])
else
AC_MSG_RESULT([(default search paths)])
opal_zlib_source=standard
opal_zlib_standard_header_location=yes
opal_zlib_standard_lib_location=yes
fi
AS_IF([test ! -z "$with_zlib_libdir" && test "$with_zlib_libdir" != "yes"],
[opal_zlib_libdir="$with_zlib_libdir"
opal_zlib_standard_lib_location=no])
OPAL_CHECK_PACKAGE([opal_zlib],
[zlib.h],
[z],
[deflate],
[-lz],
[$opal_zlib_dir],
[$opal_zlib_libdir],
[opal_zlib_support=1],
[opal_zlib_support=0])
fi
if test ! -z "$with_zlib" && test "$with_zlib" != "no" && test "$opal_zlib_support" != "1"; then
AC_MSG_WARN([ZLIB SUPPORT REQUESTED AND NOT FOUND])
AC_MSG_ERROR([CANNOT CONTINUE])
fi
AC_MSG_CHECKING([will zlib support be built])
if test "$opal_zlib_support" != "1"; then
AC_MSG_RESULT([no])
else
AC_MSG_RESULT([yes])
fi
CPPFLAGS="$opal_check_zlib_save_CPPFLAGS"
LDFLAGS="$opal_check_zlib_save_LDFLAGS"
LIBS="$opal_check_zlib_save_LIBS"
AS_IF([test "$opal_zlib_support" = "1"],
[$1
OPAL_SUMMARY_ADD([[External Packages]],[[ZLIB]], [opal_zlib], [yes ($opal_zlib_source)])],
[$2])
# substitute in the things needed to build zlib
AC_SUBST([compress_zlib_CFLAGS])
AC_SUBST([compress_zlib_CPPFLAGS])
AC_SUBST([compress_zlib_LDFLAGS])
AC_SUBST([compress_zlib_LIBS])
OPAL_VAR_SCOPE_POP
])dnl


@ -3,5 +3,5 @@
# owner: institution that is responsible for this package
# status: e.g. active, maintenance, unmaintained
#
-owner: INTEL
-status: maintenance
+owner:project
+status:maintenance


@ -15,7 +15,7 @@
* Copyright (c) 2009 Oak Ridge National Labs. All rights reserved.
* Copyright (c) 2010-2015 Los Alamos National Security, LLC.
* All rights reserved.
-* Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
+* Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2015-2017 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2017 Amazon.com, Inc. or its affiliates.
@ -61,9 +61,7 @@
#include "opal/mca/if/base/base.h"
#include "opal/dss/dss.h"
#include "opal/mca/shmem/base/base.h"
-#if OPAL_ENABLE_FT_CR == 1
#include "opal/mca/compress/base/base.h"
-#endif
#include "opal/threads/threads.h"
#include "opal/threads/tsd.h"
@ -524,7 +522,8 @@ opal_init_util(int* pargc, char*** pargv)
static mca_base_framework_t *opal_init_frameworks[] = {
&opal_hwloc_base_framework, &opal_memcpy_base_framework, &opal_memchecker_base_framework,
&opal_backtrace_base_framework, &opal_timer_base_framework, &opal_event_base_framework,
-&opal_shmem_base_framework, &opal_reachable_base_framework, NULL,
+&opal_shmem_base_framework, &opal_reachable_base_framework, &opal_compress_base_framework,
+NULL,
};
int
@ -585,5 +584,12 @@ opal_init(int* pargc, char*** pargv)
return opal_init_error ("opal_reachable_base_select", ret);
}
/* Initialize compress framework */
if (OPAL_SUCCESS != (ret = opal_compress_base_select())) {
return opal_init_error ("opal_compress_base_select", ret);
}
opal_finalize_pop_domain ();
return OPAL_SUCCESS;
}


@ -8,7 +8,7 @@
* reserved.
* Copyright (c) 2011-2013 Los Alamos National Security, LLC.
* All rights reserved.
-* Copyright (c) 2014-2018 Intel, Inc. All rights reserved.
+* Copyright (c) 2014-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2017 IBM Corporation. All rights reserved.
* $COPYRIGHT$
*
@ -427,56 +427,6 @@ static void proc_errors(int fd, short args, void *cbdata)
"%s errmgr:default:orted daemon %s exited",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
ORTE_NAME_PRINT(proc)));
/* if we are using static ports, then it is possible that the HNP
* will not see this termination. So if the HNP didn't order us
* to terminate, then we should ensure it knows */
if (orte_static_ports && !orte_orteds_term_ordered) {
/* send an alert to the HNP */
alert = OBJ_NEW(opal_buffer_t);
/* pack update state command */
cmd = ORTE_PLM_UPDATE_PROC_STATE;
if (ORTE_SUCCESS != (rc = opal_dss.pack(alert, &cmd, 1, ORTE_PLM_CMD))) {
ORTE_ERROR_LOG(rc);
return;
}
/* get the proc_t */
if (NULL == (child = (orte_proc_t*)opal_pointer_array_get_item(jdata->procs, proc->vpid))) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
ORTE_FORCED_TERMINATE(ORTE_ERROR_DEFAULT_EXIT_CODE);
goto cleanup;
}
/* set the exit code to reflect the problem */
child->exit_code = ORTE_ERR_COMM_FAILURE;
/* pack only the data for this daemon - have to start with the jobid
* so the receiver can unpack it correctly
*/
if (ORTE_SUCCESS != (rc = opal_dss.pack(alert, &proc->jobid, 1, ORTE_JOBID))) {
ORTE_ERROR_LOG(rc);
return;
}
/* now pack the daemon's info */
if (ORTE_SUCCESS != (rc = pack_state_for_proc(alert, child))) {
ORTE_ERROR_LOG(rc);
return;
}
/* send it */
OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base_framework.framework_output,
"%s errmgr:default_orted reporting lost connection to daemon %s",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
ORTE_NAME_PRINT(proc)));
if (0 > (rc = orte_rml.send_buffer_nb(orte_mgmt_conduit,
ORTE_PROC_MY_HNP, alert,
ORTE_RML_TAG_PLM,
orte_rml_send_callback, NULL))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(alert);
}
/* mark that we notified the HNP for this job so we don't do it again */
orte_set_attribute(&jdata->attributes, ORTE_JOB_FAIL_NOTIFIED, ORTE_ATTR_LOCAL, NULL, OPAL_BOOL);
/* continue on */
goto cleanup;
}
if (orte_orteds_term_ordered) {
/* are any of my children still alive */


@ -2,7 +2,7 @@
* Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana
* University Research and Technology
* Corporation. All rights reserved.
-* Copyright (c) 2004-2018 The University of Tennessee and The University
+* Copyright (c) 2004-2011 The University of Tennessee and The University
* of Tennessee Research Foundation. All rights
* reserved.
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
@ -12,9 +12,9 @@
* Copyright (c) 2009 Institut National de Recherche en Informatique
* et Automatique. All rights reserved.
* Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
-* Copyright (c) 2011-2019 Los Alamos National Security, LLC. All rights
+* Copyright (c) 2011-2013 Los Alamos National Security, LLC. All rights
* reserved.
-* Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
+* Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2017 IBM Corporation. All rights reserved.
* $COPYRIGHT$
*
@ -57,7 +57,6 @@
#include "orte/mca/iof/base/base.h"
#include "orte/mca/plm/base/base.h"
#include "orte/mca/odls/base/base.h"
#include "orte/mca/regx/base/base.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/rmaps/base/base.h"
#include "orte/mca/filem/base/base.h"
@ -515,17 +514,6 @@ int orte_ess_base_orted_setup(void)
error = "orte_rmaps_base_select";
goto error;
}
if (ORTE_SUCCESS != (ret = mca_base_framework_open(&orte_regx_base_framework, 0))) {
ORTE_ERROR_LOG(ret);
error = "orte_regx_base_open";
goto error;
}
if (ORTE_SUCCESS != (ret = orte_regx_base_select())) {
ORTE_ERROR_LOG(ret);
error = "orte_regx_base_select";
goto error;
}
/* if a topology file was given, then the rmaps framework open
* will have reset our topology. Ensure we always get the right
@ -542,46 +530,6 @@ int orte_ess_base_orted_setup(void)
opal_dss.dump(0, opal_hwloc_topology, OPAL_HWLOC_TOPO);
}
/* if we were given the host list, then we need to setup
* the daemon info so the RML can function properly
* without requiring a wireup stage. This must be done
* after we enable_comm as that function determines our
* own port, which we need in order to construct the nidmap
*/
if (NULL != orte_node_regex) {
if (ORTE_SUCCESS != (ret = orte_regx.nidmap_parse(orte_node_regex))) {
ORTE_ERROR_LOG(ret);
error = "construct nidmap";
goto error;
}
/* be sure to update the routing tree so any tree spawn operation
* properly gets the number of children underneath us */
orte_routed.update_routing_plan(NULL);
}
if (orte_static_ports || orte_fwd_mpirun_port) {
if (NULL == orte_node_regex) {
/* we didn't get the node info */
error = "cannot construct daemon map for static ports - no node map info";
goto error;
}
/* extract the node info from the environment and
* build a nidmap from it - this will update the
* routing plan as well
*/
if (ORTE_SUCCESS != (ret = orte_regx.build_daemon_nidmap())) {
ORTE_ERROR_LOG(ret);
error = "construct daemon map from static ports";
goto error;
}
/* be sure to update the routing tree so the initial "phone home"
* to mpirun goes through the tree if static ports were enabled
*/
orte_routed.update_routing_plan(NULL);
/* routing can be enabled */
orte_routed_base.routing_enabled = true;
}
/* Now provide a chance for the PLM
* to perform any module-specific init functions. This
* needs to occur AFTER the communications are setup
@ -669,20 +617,15 @@ int orte_ess_base_orted_finalize(void)
(void) mca_base_framework_close(&orte_filem_base_framework);
(void) mca_base_framework_close(&orte_grpcomm_base_framework);
(void) mca_base_framework_close(&orte_iof_base_framework);
/* first stage shutdown of the errmgr, deregister the handler but keep
* the required facilities until the rml and oob are offline */
orte_errmgr.finalize();
(void) mca_base_framework_close(&orte_errmgr_base_framework);
(void) mca_base_framework_close(&orte_plm_base_framework);
/* make sure our local procs are dead */
orte_odls.kill_local_procs(NULL);
(void) mca_base_framework_close(&orte_regx_base_framework);
(void) mca_base_framework_close(&orte_rmaps_base_framework);
(void) mca_base_framework_close(&orte_rtc_base_framework);
(void) mca_base_framework_close(&orte_odls_base_framework);
(void) mca_base_framework_close(&orte_routed_base_framework);
(void) mca_base_framework_close(&orte_rml_base_framework);
(void) mca_base_framework_close(&orte_oob_base_framework);
(void) mca_base_framework_close(&orte_errmgr_base_framework);
(void) mca_base_framework_close(&orte_state_base_framework);
/* remove our use of the session directory tree */
orte_session_dir_finalize(ORTE_PROC_MY_NAME);

@ -14,7 +14,7 @@
* Copyright (c) 2011-2014 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2017-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
@ -66,7 +66,6 @@
#include "orte/mca/grpcomm/base/base.h"
#include "orte/mca/iof/base/base.h"
#include "orte/mca/ras/base/base.h"
#include "orte/mca/regx/base/base.h"
#include "orte/mca/plm/base/base.h"
#include "orte/mca/plm/plm.h"
#include "orte/mca/odls/base/base.h"
@ -555,16 +554,6 @@ static int rte_init(void)
error = "orte_rmaps_base_find_available";
goto error;
}
if (ORTE_SUCCESS != (ret = mca_base_framework_open(&orte_regx_base_framework, 0))) {
ORTE_ERROR_LOG(ret);
error = "orte_regx_base_open";
goto error;
}
if (ORTE_SUCCESS != (ret = orte_regx_base_select())) {
ORTE_ERROR_LOG(ret);
error = "orte_regx_base_select";
goto error;
}
/* if a topology file was given, then the rmaps framework open
* will have reset our topology. Ensure we always get the right

@ -12,7 +12,7 @@
* All rights reserved.
* Copyright (c) 2011-2016 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2016-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2017 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
@ -33,7 +33,7 @@
#include "opal/dss/dss.h"
#include "orte/util/compress.h"
#include "opal/mca/compress/compress.h"
#include "orte/util/proc_info.h"
#include "orte/util/error_strings.h"
#include "orte/mca/errmgr/errmgr.h"
@ -506,8 +506,8 @@ static int pack_xcast(orte_grpcomm_signature_t *sig,
}
/* see if we want to compress this message */
if (orte_util_compress_block((uint8_t*)data.base_ptr, data.bytes_used,
&cmpdata, &cmplen)) {
if (opal_compress.compress_block((uint8_t*)data.base_ptr, data.bytes_used,
&cmpdata, &cmplen)) {
/* the data was compressed - mark that we compressed it */
flag = 1;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &flag, 1, OPAL_INT8))) {
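The hunk above packs a one-byte flag recording whether the payload was compressed, so the receiver knows whether to call the decompress routine. As a minimal sketch of that pattern only, the block below uses a toy run-length encoder in place of the real component. `toy_compress_block`, `toy_decompress_block`, `toy_msg_t` and `toy_pack` are hypothetical names invented for illustration; they are not the `opal_compress` API, and the argument order merely mirrors the calls visible in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compress_block-style call: run-length encoding
 * as (count, value) pairs. Returns true with a malloc'd buffer only when the
 * encoded form is strictly smaller than the input. */
static bool toy_compress_block(const uint8_t *in, size_t inlen,
                               uint8_t **out, size_t *outlen)
{
    uint8_t *buf;
    size_t i = 0, n = 0;

    *out = NULL;
    *outlen = 0;
    if (inlen < 2 || NULL == (buf = malloc(inlen))) {
        return false;
    }
    while (i < inlen) {
        size_t run = 1;
        while (i + run < inlen && in[i + run] == in[i] && run < 255) {
            ++run;
        }
        if (n + 2 >= inlen) {      /* result would not shrink: give up */
            free(buf);
            return false;
        }
        buf[n++] = (uint8_t)run;
        buf[n++] = in[i];
        i += run;
    }
    *out = buf;
    *outlen = n;
    return true;
}

/* Inverse of the toy encoder. The caller supplies the original length,
 * as the receiver in the diff does after unpacking it from the message. */
static bool toy_decompress_block(uint8_t **out, size_t outlen,
                                 const uint8_t *in, size_t inlen)
{
    uint8_t *buf;
    size_t i, n = 0;

    *out = NULL;
    if (0 == outlen || 0 != inlen % 2 || NULL == (buf = malloc(outlen))) {
        return false;
    }
    for (i = 0; i < inlen; i += 2) {
        size_t run = in[i];
        if (0 == run || run > outlen - n) {   /* corrupt or oversized input */
            free(buf);
            return false;
        }
        memset(buf + n, in[i + 1], run);
        n += run;
    }
    if (n != outlen) {
        free(buf);
        return false;
    }
    *out = buf;
    return true;
}

/* Hypothetical wire message: the flag tells the receiver how to read payload. */
typedef struct {
    int8_t flag;         /* 1 = payload is compressed, 0 = payload is raw */
    size_t orig_len;     /* length of the data before compression */
    uint8_t *payload;    /* malloc'd bytes as they would be sent */
    size_t payload_len;
} toy_msg_t;

/* Sender side: try to compress, fall back to a raw copy, record the choice. */
static bool toy_pack(toy_msg_t *msg, const uint8_t *data, size_t len)
{
    msg->orig_len = len;
    if (toy_compress_block(data, len, &msg->payload, &msg->payload_len)) {
        msg->flag = 1;
        return true;
    }
    msg->flag = 0;
    msg->payload_len = len;
    if (NULL == (msg->payload = malloc(len > 0 ? len : 1))) {
        return false;
    }
    if (len > 0) {
        memcpy(msg->payload, data, len);
    }
    return true;
}
```

A receiver would check `flag` first and call `toy_decompress_block(&out, msg.orig_len, msg.payload, msg.payload_len)` only when it is 1. Returning false for "not worth compressing" rather than treating it as an error keeps the sender's fallback path trivial.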

@ -1,41 +0,0 @@
#
# Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2013 Los Alamos National Security, LLC. All rights
# reserved.
# Copyright (c) 2014-2018 Intel, Inc. All rights reserved.
# Copyright (c) 2017 IBM Corporation. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
AM_CPPFLAGS = $(grpcomm_brucks_CPPFLAGS)
sources = \
grpcomm_brucks.h \
grpcomm_brucks_module.c \
grpcomm_brucks_component.c
# Make the output library in this directory, and name it either
# mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la
# (for static builds).
if MCA_BUILD_orte_grpcomm_brucks_DSO
component_noinst =
component_install = mca_grpcomm_brucks.la
else
component_noinst = libmca_grpcomm_brucks.la
component_install =
endif
mcacomponentdir = $(ortelibdir)
mcacomponent_LTLIBRARIES = $(component_install)
mca_grpcomm_brucks_la_SOURCES = $(sources)
mca_grpcomm_brucks_la_LDFLAGS = -module -avoid-version
mca_grpcomm_brucks_la_LIBADD = $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
noinst_LTLIBRARIES = $(component_noinst)
libmca_grpcomm_brucks_la_SOURCES =$(sources)
libmca_grpcomm_brucks_la_LDFLAGS = -module -avoid-version

@ -1,31 +0,0 @@
/* -*- C -*-
*
* Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2014 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*
*/
#ifndef GRPCOMM_BRUCKS_H
#define GRPCOMM_BRUCKS_H
#include "orte_config.h"
#include "orte/mca/grpcomm/grpcomm.h"
BEGIN_C_DECLS
/*
* Grpcomm interfaces
*/
ORTE_MODULE_DECLSPEC extern orte_grpcomm_base_component_t mca_grpcomm_brucks_component;
extern orte_grpcomm_base_module_t orte_grpcomm_brucks_module;
END_C_DECLS
#endif

@ -1,84 +0,0 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2011-2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2014 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "orte_config.h"
#include "orte/constants.h"
#include "orte/mca/mca.h"
#include "opal/runtime/opal_params.h"
#include "orte/util/proc_info.h"
#include "grpcomm_brucks.h"
static int my_priority=5;
static int brucks_open(void);
static int brucks_close(void);
static int brucks_query(mca_base_module_t **module, int *priority);
static int brucks_register(void);
/*
* Struct of function pointers that need to be initialized
*/
orte_grpcomm_base_component_t mca_grpcomm_brucks_component = {
.base_version = {
ORTE_GRPCOMM_BASE_VERSION_3_0_0,
.mca_component_name = "brucks",
MCA_BASE_MAKE_VERSION(component, ORTE_MAJOR_VERSION, ORTE_MINOR_VERSION,
ORTE_RELEASE_VERSION),
.mca_open_component = brucks_open,
.mca_close_component = brucks_close,
.mca_query_component = brucks_query,
.mca_register_component_params = brucks_register,
},
.base_data = {
/* The component is checkpoint ready */
MCA_BASE_METADATA_PARAM_CHECKPOINT
},
};
static int brucks_register(void)
{
mca_base_component_t *c = &mca_grpcomm_brucks_component.base_version;
/* make the priority adjustable so users can select
* brucks for use by apps without affecting daemons
*/
my_priority = 50;
(void) mca_base_component_var_register(c, "priority",
"Priority of the grpcomm brucks component",
MCA_BASE_VAR_TYPE_INT, NULL, 0, 0,
OPAL_INFO_LVL_9,
MCA_BASE_VAR_SCOPE_READONLY,
&my_priority);
return ORTE_SUCCESS;
}
/* Open the component */
static int brucks_open(void)
{
return ORTE_SUCCESS;
}
static int brucks_close(void)
{
return ORTE_SUCCESS;
}
static int brucks_query(mca_base_module_t **module, int *priority)
{
*priority = my_priority;
*module = (mca_base_module_t *)&orte_grpcomm_brucks_module;
return ORTE_SUCCESS;
}

@ -1,388 +0,0 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2007 The Trustees of Indiana University.
* All rights reserved.
* Copyright (c) 2011-2015 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2011-2016 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2014-2015 Intel, Inc. All rights reserved.
* Copyright (c) 2014 Mellanox Technologies, Inc.
* All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "orte_config.h"
#include "orte/constants.h"
#include "orte/types.h"
#include "orte/runtime/orte_wait.h"
#include <math.h>
#include <string.h>
#include "opal/dss/dss.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/rml/rml.h"
#include "orte/util/name_fns.h"
#include "orte/util/proc_info.h"
#include "orte/mca/grpcomm/base/base.h"
#include "grpcomm_brucks.h"
/* Static API's */
static int init(void);
static void finalize(void);
static int allgather(orte_grpcomm_coll_t *coll,
opal_buffer_t *buf);
static void brucks_allgather_process_data(orte_grpcomm_coll_t *coll, uint32_t distance);
static int brucks_allgather_send_dist(orte_grpcomm_coll_t *coll, orte_process_name_t *peer, uint32_t distance);
static void brucks_allgather_recv_dist(int status, orte_process_name_t* sender,
opal_buffer_t* buffer, orte_rml_tag_t tag,
void* cbdata);
static int brucks_finalize_coll(orte_grpcomm_coll_t *coll, int ret);
/* Module def */
orte_grpcomm_base_module_t orte_grpcomm_brucks_module = {
init,
finalize,
NULL,
allgather
};
/**
* Initialize the module
*/
static int init(void)
{
/* setup recv for distance data */
orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
ORTE_RML_TAG_ALLGATHER_BRUCKS,
ORTE_RML_PERSISTENT,
brucks_allgather_recv_dist, NULL);
return OPAL_SUCCESS;
}
/**
* Finalize the module
*/
static void finalize(void)
{
/* cancel the recv */
orte_rml.recv_cancel(ORTE_NAME_WILDCARD, ORTE_RML_TAG_ALLGATHER_BRUCKS);
}
static int allgather(orte_grpcomm_coll_t *coll,
opal_buffer_t *sendbuf)
{
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks algo employed for %d processes",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), (int)coll->ndmns));
/* get my own rank */
coll->my_rank = ORTE_VPID_INVALID;
for (orte_vpid_t nv = 0; nv < coll->ndmns; nv++) {
if (coll->dmns[nv] == ORTE_PROC_MY_NAME->vpid) {
coll->my_rank = nv;
break;
}
}
/* check for bozo case */
if (ORTE_VPID_INVALID == coll->my_rank) {
OPAL_OUTPUT((orte_grpcomm_base_framework.framework_output,
"Peer not found"));
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
brucks_finalize_coll(coll, ORTE_ERR_NOT_FOUND);
return ORTE_ERR_NOT_FOUND;
}
/* record that we contributed */
coll->nreported = 1;
/* mark local data received */
if (coll->ndmns > 1) {
opal_bitmap_init (&coll->distance_mask_recv, (uint32_t) log2 (coll->ndmns) + 1);
}
/* start by seeding the collection with our own data */
opal_dss.copy_payload(&coll->bucket, sendbuf);
/* process data */
brucks_allgather_process_data (coll, 0);
return ORTE_SUCCESS;
}
static int brucks_allgather_send_dist(orte_grpcomm_coll_t *coll, orte_process_name_t *peer, uint32_t distance) {
opal_buffer_t *send_buf;
int rc;
send_buf = OBJ_NEW(opal_buffer_t);
/* pack the signature */
if (OPAL_SUCCESS != (rc = opal_dss.pack(send_buf, &coll->sig, 1, ORTE_SIGNATURE))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
}
/* pack the current distance */
if (OPAL_SUCCESS != (rc = opal_dss.pack(send_buf, &distance, 1, OPAL_INT32))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
}
/* pack the number of daemons included in the payload */
if (OPAL_SUCCESS != (rc = opal_dss.pack(send_buf, &coll->nreported, 1, OPAL_SIZE))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
}
/* pack the data */
if (OPAL_SUCCESS != (rc = opal_dss.copy_payload(send_buf, &coll->bucket))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
}
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks SENDING TO %s",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
ORTE_NAME_PRINT(peer)));
if (0 > (rc = orte_rml.send_buffer_nb(peer, send_buf,
ORTE_RML_TAG_ALLGATHER_BRUCKS,
orte_rml_send_callback, NULL))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
};
return ORTE_SUCCESS;
}
static int brucks_allgather_process_buffered (orte_grpcomm_coll_t *coll, uint32_t distance) {
opal_buffer_t *buffer;
size_t nreceived;
int32_t cnt = 1;
int rc;
/* check whether data for next distance is available*/
if (NULL == coll->buffers || NULL == coll->buffers[distance]) {
return 0;
}
buffer = coll->buffers[distance];
coll->buffers[distance] = NULL;
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks %u distance data found",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), distance));
rc = opal_dss.unpack (buffer, &nreceived, &cnt, OPAL_SIZE);
if (OPAL_SUCCESS != rc) {
ORTE_ERROR_LOG(rc);
brucks_finalize_coll(coll, rc);
return rc;
}
if (OPAL_SUCCESS != (rc = opal_dss.copy_payload(&coll->bucket, buffer))) {
ORTE_ERROR_LOG(rc);
brucks_finalize_coll(coll, rc);
return rc;
}
coll->nreported += nreceived;
orte_grpcomm_base_mark_distance_recv (coll, distance);
OBJ_RELEASE(buffer);
return 1;
}
static void brucks_allgather_process_data(orte_grpcomm_coll_t *coll, uint32_t distance) {
/* Communication step:
At every step i, rank r:
- doubles the distance
- sends message containing all data collected so far to rank r - distance
- receives message containing all data collected so far from rank (r + distance)
*/
uint32_t log2ndmns = (uint32_t) log2 (coll->ndmns);
uint32_t last_round;
orte_process_name_t peer;
orte_vpid_t nv;
int rc;
/* NTH: calculate in which round we should send the final data. this is the first
* round in which we have data from at least (coll->ndmns - (1 << log2ndmns))
* daemons. alternatively we could just send when distance reaches log2ndmns but
* that could end up sending more data than needed */
last_round = (uint32_t) ceil (log2 ((double) (coll->ndmns - (1 << log2ndmns))));
peer.jobid = ORTE_PROC_MY_NAME->jobid;
while (distance < log2ndmns) {
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks process distance %u)",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), distance));
/* first send my current contents */
nv = (coll->ndmns + coll->my_rank - (1 << distance)) % coll->ndmns;
peer.vpid = coll->dmns[nv];
brucks_allgather_send_dist(coll, &peer, distance);
if (distance == last_round) {
/* have enough data to send the final round now */
nv = (coll->ndmns + coll->my_rank - (1 << log2ndmns)) % coll->ndmns;
peer.vpid = coll->dmns[nv];
brucks_allgather_send_dist(coll, &peer, log2ndmns);
}
rc = brucks_allgather_process_buffered (coll, distance);
if (!rc) {
break;
} else if (rc < 0) {
return;
}
++distance;
}
if (distance == log2ndmns) {
if (distance == last_round) {
/* need to send the final round now */
nv = (coll->ndmns + coll->my_rank - (1 << log2ndmns)) % coll->ndmns;
peer.vpid = coll->dmns[nv];
brucks_allgather_send_dist(coll, &peer, log2ndmns);
}
/* check if the final message is already queued */
rc = brucks_allgather_process_buffered (coll, distance);
if (rc < 0) {
return;
}
}
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks reported %lu process from %lu",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), (unsigned long)coll->nreported,
(unsigned long)coll->ndmns));
/* if we are done, then complete things. we may get data from more daemons than expected */
if (coll->nreported >= coll->ndmns){
brucks_finalize_coll(coll, ORTE_SUCCESS);
}
}
static void brucks_allgather_recv_dist(int status, orte_process_name_t* sender,
opal_buffer_t* buffer, orte_rml_tag_t tag,
void* cbdata)
{
int32_t cnt;
int rc;
orte_grpcomm_signature_t *sig;
orte_grpcomm_coll_t *coll;
uint32_t distance;
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks RECEIVING FROM %s",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
ORTE_NAME_PRINT(sender)));
/* unpack the signature */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &sig, &cnt, ORTE_SIGNATURE))) {
ORTE_ERROR_LOG(rc);
return;
}
/* check for the tracker and create it if not found */
if (NULL == (coll = orte_grpcomm_base_get_tracker(sig, true))) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
OBJ_RELEASE(sig);
return;
}
/* unpack the distance */
distance = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &distance, &cnt, OPAL_INT32))) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
brucks_finalize_coll(coll, rc);
return;
}
assert(0 == orte_grpcomm_base_check_distance_recv(coll, distance));
/* Check whether we can process next distance */
if (coll->nreported && (!distance || orte_grpcomm_base_check_distance_recv(coll, distance - 1))) {
size_t nreceived;
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks data from %d distance received, "
"Process the next distance.",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), distance));
/* capture any provided content */
rc = opal_dss.unpack (buffer, &nreceived, &cnt, OPAL_SIZE);
if (OPAL_SUCCESS != rc) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
brucks_finalize_coll(coll, rc);
return;
}
if (OPAL_SUCCESS != (rc = opal_dss.copy_payload(&coll->bucket, buffer))) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
brucks_finalize_coll(coll, rc);
return;
}
coll->nreported += nreceived;
orte_grpcomm_base_mark_distance_recv(coll, distance);
brucks_allgather_process_data(coll, distance + 1);
} else {
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks data from %d distance received, "
"still waiting for data.",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), distance));
if (NULL == coll->buffers) {
if (NULL == (coll->buffers = (opal_buffer_t **) calloc ((uint32_t) log2 (coll->ndmns) + 1, sizeof(opal_buffer_t *)))) {
rc = OPAL_ERR_OUT_OF_RESOURCE;
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
brucks_finalize_coll(coll, rc);
return;
}
}
if (NULL == (coll->buffers[distance] = OBJ_NEW(opal_buffer_t))) {
rc = OPAL_ERR_OUT_OF_RESOURCE;
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
brucks_finalize_coll(coll, rc);
return;
}
if (OPAL_SUCCESS != (rc = opal_dss.copy_payload(coll->buffers[distance], buffer))) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
brucks_finalize_coll(coll, rc);
return;
}
}
OBJ_RELEASE(sig);
}
static int brucks_finalize_coll(orte_grpcomm_coll_t *coll, int ret)
{
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:brucks declared collective complete",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
/* execute the callback */
if (NULL != coll->cbfunc) {
coll->cbfunc(ret, &coll->bucket, coll->cbdata);
}
opal_list_remove_item(&orte_grpcomm_base.ongoing, &coll->super);
return ORTE_SUCCESS;
}
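The communication-step comment in the removed brucks module describes the peer schedule only in words: at round i, a rank sends everything collected so far to rank (r - 2^i) mod n and receives from rank (r + 2^i) mod n. The sketch below is a rough illustration of that schedule, not the removed implementation. The helper names `floor_log2`, `bruck_send_peer`, `bruck_recv_peer` and `bruck_num_rounds` are hypothetical, and the integer logarithm is an assumption swapped in for the floating-point `log2()` calls in the removed code.

```c
#include <assert.h>
#include <stdint.h>

/* Integer floor(log2(n)) for n >= 1, avoiding floating-point log2(). */
static uint32_t floor_log2(uint32_t n)
{
    uint32_t k = 0;

    assert(n >= 1);
    while (n >>= 1) {
        ++k;
    }
    return k;
}

/* Index of the rank we send to in the given round: (rank - 2^round) mod n. */
static uint32_t bruck_send_peer(uint32_t rank, uint32_t n, uint32_t round)
{
    uint32_t step;

    assert(n >= 1 && rank < n && round < 32);
    step = (UINT32_C(1) << round) % n;
    /* written without "n + rank" so it cannot overflow for large n */
    return (rank >= step) ? rank - step : n - step + rank;
}

/* Index of the rank we receive from in the given round: (rank + 2^round) mod n. */
static uint32_t bruck_recv_peer(uint32_t rank, uint32_t n, uint32_t round)
{
    uint32_t step;

    assert(n >= 1 && rank < n && round < 32);
    step = (UINT32_C(1) << round) % n;
    return (uint32_t)(((uint64_t)rank + step) % n);
}

/* Textbook round count for a Bruck allgather over n ranks: ceil(log2(n)). */
static uint32_t bruck_num_rounds(uint32_t n)
{
    uint32_t k = floor_log2(n);

    return ((UINT32_C(1) << k) == n) ? k : k + 1;
}
```

For five daemons, rank 0 would send to ranks 4, 3 and 1 and receive from ranks 1, 2 and 4 over three rounds. In the module these indices are positions in `coll->dmns`, not vpids, so a lookup through that array would still be needed.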

@ -5,7 +5,7 @@
* Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2011-2013 Los Alamos National Security, LLC. All
* rights reserved.
* Copyright (c) 2014-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2014-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2014-2017 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
@ -24,15 +24,15 @@
#include "opal/dss/dss.h"
#include "opal/class/opal_list.h"
#include "opal/mca/pmix/pmix.h"
#include "opal/mca/compress/compress.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/regx/regx.h"
#include "orte/mca/rml/base/base.h"
#include "orte/mca/rml/base/rml_contact.h"
#include "orte/mca/routed/base/base.h"
#include "orte/mca/state/state.h"
#include "orte/util/compress.h"
#include "orte/util/name_fns.h"
#include "orte/util/nidmap.h"
#include "orte/util/proc_info.h"
#include "orte/mca/grpcomm/base/base.h"
@ -271,7 +271,7 @@ static void xcast_recv(int status, orte_process_name_t* sender,
opal_list_t coll;
orte_grpcomm_signature_t *sig;
orte_rml_tag_t tag;
char *rtmod, *nidmap;
char *rtmod;
size_t inlen, cmplen;
uint8_t *packed_data, *cmpdata;
int32_t nvals, i;
@ -336,7 +336,7 @@ static void xcast_recv(int status, orte_process_name_t* sender,
return;
}
/* decompress the data */
if (orte_util_uncompress_block(&cmpdata, cmplen,
if (opal_compress.decompress_block(&cmpdata, cmplen,
packed_data, inlen)) {
/* the data has been uncompressed */
opal_dss.load(&datbuf, cmpdata, cmplen);
@ -409,38 +409,17 @@ static void xcast_recv(int status, orte_process_name_t* sender,
ORTE_ERROR_LOG(ret);
goto relay;
}
/* unpack the nidmap string - may be NULL */
cnt = 1;
if (OPAL_SUCCESS != (ret = opal_dss.unpack(data, &nidmap, &cnt, OPAL_STRING))) {
ORTE_ERROR_LOG(ret);
goto relay;
}
if (NULL != nidmap) {
if (ORTE_SUCCESS != (ret = orte_regx.nidmap_parse(nidmap))) {
ORTE_ERROR_LOG(ret);
goto relay;
}
free(nidmap);
}
/* see if they included info on node capabilities */
/* unpack flag indicating if nidmap included */
cnt = 1;
if (OPAL_SUCCESS != (ret = opal_dss.unpack(data, &flag, &cnt, OPAL_INT8))) {
ORTE_ERROR_LOG(ret);
goto relay;
}
if (0 != flag) {
/* update our local nidmap, if required - the decode function
* knows what to do
*/
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:direct:xcast updating daemon nidmap",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
if (ORTE_SUCCESS != (ret = orte_regx.decode_daemon_nodemap(data))) {
if (1 == flag) {
if (ORTE_SUCCESS != (ret = orte_util_decode_nidmap(data))) {
ORTE_ERROR_LOG(ret);
goto relay;
}
if (!ORTE_PROC_IS_HNP) {
/* update the routing plan - the HNP already did
* it when it computed the VM, so don't waste time
@ -450,7 +429,7 @@ static void xcast_recv(int status, orte_process_name_t* sender,
/* routing is now possible */
orte_routed_base.routing_enabled = true;
/* unpack the byte object */
/* unpack the wireup byte object */
cnt=1;
if (ORTE_SUCCESS != (ret = opal_dss.unpack(data, &bo, &cnt, OPAL_BYTE_OBJECT))) {
ORTE_ERROR_LOG(ret);

@ -1,41 +0,0 @@
#
# Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2013 Los Alamos National Security, LLC. All rights
# reserved.
# Copyright (c) 2014-2018 Intel, Inc. All rights reserved.
# Copyright (c) 2017 IBM Corporation. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
AM_CPPFLAGS = $(grpcomm_rcd_CPPFLAGS)
sources = \
grpcomm_rcd.h \
grpcomm_rcd.c \
grpcomm_rcd_component.c
# Make the output library in this directory, and name it either
# mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la
# (for static builds).
if MCA_BUILD_orte_grpcomm_rcd_DSO
component_noinst =
component_install = mca_grpcomm_rcd.la
else
component_noinst = libmca_grpcomm_rcd.la
component_install =
endif
mcacomponentdir = $(ortelibdir)
mcacomponent_LTLIBRARIES = $(component_install)
mca_grpcomm_rcd_la_SOURCES = $(sources)
mca_grpcomm_rcd_la_LDFLAGS = -module -avoid-version
mca_grpcomm_rcd_la_LIBADD = $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
noinst_LTLIBRARIES = $(component_noinst)
libmca_grpcomm_rcd_la_SOURCES =$(sources)
libmca_grpcomm_rcd_la_LDFLAGS = -module -avoid-version

@ -1,329 +0,0 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2007 The Trustees of Indiana University.
* All rights reserved.
* Copyright (c) 2011-2015 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2011-2016 Los Alamos National Security, LLC. All
* rights reserved.
* Copyright (c) 2014-2016 Intel, Inc. All rights reserved.
* Copyright (c) 2014 Mellanox Technologies, Inc.
* All rights reserved.
* Copyright (c) 2014 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "orte_config.h"
#include "orte/constants.h"
#include "orte/types.h"
#include "orte/runtime/orte_wait.h"
#include <math.h>
#include <string.h>
#include "opal/dss/dss.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/rml/rml.h"
#include "orte/util/name_fns.h"
#include "orte/util/proc_info.h"
#include "orte/mca/grpcomm/base/base.h"
#include "grpcomm_rcd.h"
/* Static API's */
static int init(void);
static void finalize(void);
static int allgather(orte_grpcomm_coll_t *coll,
opal_buffer_t *buf);
static void rcd_allgather_process_data(orte_grpcomm_coll_t *coll, uint32_t distance);
static int rcd_allgather_send_dist(orte_grpcomm_coll_t *coll, orte_process_name_t *peer, uint32_t distance);
static void rcd_allgather_recv_dist(int status, orte_process_name_t* sender,
opal_buffer_t* buffer, orte_rml_tag_t tag,
void* cbdata);
static int rcd_finalize_coll(orte_grpcomm_coll_t *coll, int ret);
/* Module def */
orte_grpcomm_base_module_t orte_grpcomm_rcd_module = {
init,
finalize,
NULL,
allgather
};
/**
* Initialize the module
*/
static int init(void)
{
/* setup recv for distance data */
orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
ORTE_RML_TAG_ALLGATHER_RCD,
ORTE_RML_PERSISTENT,
rcd_allgather_recv_dist, NULL);
return OPAL_SUCCESS;
}
/**
* Finalize the module
*/
static void finalize(void)
{
/* cancel the recv */
orte_rml.recv_cancel(ORTE_NAME_WILDCARD, ORTE_RML_TAG_ALLGATHER_RCD);
}
static int allgather(orte_grpcomm_coll_t *coll,
opal_buffer_t *sendbuf)
{
uint32_t log2ndmns;
/* check the number of involved daemons - if it is not a power of two,
* then we cannot do it */
if (0 == ((coll->ndmns != 0) && !(coll->ndmns & (coll->ndmns - 1)))) {
return ORTE_ERR_TAKE_NEXT_OPTION;
}
log2ndmns = log2 (coll->ndmns);
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub algo employed for %d daemons",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), (int)coll->ndmns));
/* mark local data received */
if (log2ndmns) {
opal_bitmap_init (&coll->distance_mask_recv, log2ndmns);
}
/* get my own rank */
coll->my_rank = ORTE_VPID_INVALID;
for (orte_vpid_t nv = 0 ; nv < coll->ndmns ; ++nv) {
if (coll->dmns[nv] == ORTE_PROC_MY_NAME->vpid) {
coll->my_rank = nv;
break;
}
}
/* check for bozo case */
if (ORTE_VPID_INVALID == coll->my_rank) {
OPAL_OUTPUT((orte_grpcomm_base_framework.framework_output,
"My peer not found in daemons array"));
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
rcd_finalize_coll(coll, ORTE_ERR_NOT_FOUND);
return ORTE_ERR_NOT_FOUND;
}
/* start by seeding the collection with our own data */
opal_dss.copy_payload(&coll->bucket, sendbuf);
coll->nreported = 1;
/* process data */
rcd_allgather_process_data (coll, 0);
return ORTE_SUCCESS;
}
static int rcd_allgather_send_dist(orte_grpcomm_coll_t *coll, orte_process_name_t *peer, uint32_t distance) {
opal_buffer_t *send_buf;
int rc;
send_buf = OBJ_NEW(opal_buffer_t);
/* pack the signature */
if (OPAL_SUCCESS != (rc = opal_dss.pack(send_buf, &coll->sig, 1, ORTE_SIGNATURE))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
}
/* pack the distance */
if (OPAL_SUCCESS != (rc = opal_dss.pack(send_buf, &distance, 1, OPAL_UINT32))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
}
/* pack the data */
if (OPAL_SUCCESS != (rc = opal_dss.copy_payload(send_buf, &coll->bucket))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
}
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub SENDING TO %s",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
ORTE_NAME_PRINT(peer)));
if (0 > (rc = orte_rml.send_buffer_nb(orte_coll_conduit,
peer, send_buf,
ORTE_RML_TAG_ALLGATHER_RCD,
orte_rml_send_callback, NULL))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(send_buf);
return rc;
};
return ORTE_SUCCESS;
}
static void rcd_allgather_process_data(orte_grpcomm_coll_t *coll, uint32_t distance) {
/* Communication step:
At every step i, rank r:
- exchanges message containing all data collected so far with rank peer = (r ^ 2^i).
*/
uint32_t log2ndmns = log2(coll->ndmns);
orte_process_name_t peer;
orte_vpid_t nv;
int rc;
peer.jobid = ORTE_PROC_MY_NAME->jobid;
while (distance < log2ndmns) {
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub process distance %u",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), distance));
/* first send my current contents */
nv = coll->my_rank ^ (1 << distance);
assert (nv < coll->ndmns);
peer.vpid = coll->dmns[nv];
rcd_allgather_send_dist(coll, &peer, distance);
/* check whether data for next distance is available */
if (NULL == coll->buffers || NULL == coll->buffers[distance]) {
break;
}
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub %u distance data found",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), distance));
if (OPAL_SUCCESS != (rc = opal_dss.copy_payload(&coll->bucket, coll->buffers[distance]))) {
ORTE_ERROR_LOG(rc);
rcd_finalize_coll(coll, rc);
return;
}
coll->nreported += 1 << distance;
orte_grpcomm_base_mark_distance_recv(coll, distance);
OBJ_RELEASE(coll->buffers[distance]);
coll->buffers[distance] = NULL;
++distance;
}
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub reported %lu process from %lu",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), (unsigned long)coll->nreported,
(unsigned long)coll->ndmns));
/* if we are done, then complete things */
if (coll->nreported == coll->ndmns) {
rcd_finalize_coll(coll, ORTE_SUCCESS);
}
}
static void rcd_allgather_recv_dist(int status, orte_process_name_t* sender,
opal_buffer_t* buffer, orte_rml_tag_t tag,
void* cbdata)
{
int32_t cnt;
uint32_t distance;
int rc;
orte_grpcomm_signature_t *sig;
orte_grpcomm_coll_t *coll;
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub RECEIVING FROM %s",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
ORTE_NAME_PRINT(sender)));
/* unpack the signature */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &sig, &cnt, ORTE_SIGNATURE))) {
ORTE_ERROR_LOG(rc);
return;
}
/* check for the tracker and create it if not found */
if (NULL == (coll = orte_grpcomm_base_get_tracker(sig, true))) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
OBJ_RELEASE(sig);
return;
}
/* unpack the distance */
distance = -1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &distance, &cnt, OPAL_UINT32))) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
rcd_finalize_coll(coll, rc);
return;
}
assert(distance >= 0 && 0 == orte_grpcomm_base_check_distance_recv(coll, distance));
/* Check whether we can process next distance */
if (coll->nreported && (!distance || orte_grpcomm_base_check_distance_recv(coll, (distance - 1)))) {
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub data from %d distance received, "
"Process the next distance.",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), distance));
/* capture any provided content */
if (OPAL_SUCCESS != (rc = opal_dss.copy_payload(&coll->bucket, buffer))) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
rcd_finalize_coll(coll, rc);
return;
}
coll->nreported += (1 << distance);
orte_grpcomm_base_mark_distance_recv (coll, distance);
rcd_allgather_process_data (coll, distance + 1);
} else {
OPAL_OUTPUT_VERBOSE((80, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub data from %d distance received, "
"still waiting for data.",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), distance));
if (NULL == coll->buffers) {
coll->buffers = (opal_buffer_t **) calloc (log2 (coll->ndmns), sizeof (coll->buffers[0]));
if (NULL == coll->buffers) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE);
rcd_finalize_coll(coll, OPAL_ERR_OUT_OF_RESOURCE);
return;
}
}
if (NULL == (coll->buffers[distance] = OBJ_NEW(opal_buffer_t))) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE);
rcd_finalize_coll(coll, OPAL_ERR_OUT_OF_RESOURCE);
return;
}
if (OPAL_SUCCESS != (rc = opal_dss.copy_payload(coll->buffers[distance], buffer))) {
OBJ_RELEASE(sig);
ORTE_ERROR_LOG(rc);
rcd_finalize_coll(coll, rc);
return;
}
}
OBJ_RELEASE(sig);
}
static int rcd_finalize_coll(orte_grpcomm_coll_t *coll, int ret)
{
OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
"%s grpcomm:coll:recdub declared collective complete",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
/* execute the callback */
if (NULL != coll->cbfunc) {
coll->cbfunc(ret, &coll->bucket, coll->cbdata);
}
opal_list_remove_item(&orte_grpcomm_base.ongoing, &coll->super);
OBJ_RELEASE(coll);
return ORTE_SUCCESS;
}
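
For readers unfamiliar with the removed component: the `coll->nreported += (1 << distance)` bookkeeping above follows the usual recursive-doubling pattern, where the number of contributions held doubles after each exchange round. Below is a minimal sketch of that arithmetic, not the ORTE implementation. The helper names `rcd_peer` and `rcd_reported_after` are hypothetical, and both the XOR partner rule and the starting count of one (a participant's own contribution) are textbook assumptions, not something visible in this hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: partner of `rank` in exchange round `distance`
 * under textbook recursive doubling (flip bit `distance` of the rank). */
static uint32_t rcd_peer(uint32_t rank, uint32_t distance)
{
    assert(distance < 32);
    return rank ^ (UINT32_C(1) << distance);
}

/* Hypothetical helper: number of contributions a participant holds after
 * completing rounds 0..rounds-1, mirroring the bookkeeping
 * "nreported += (1 << distance)". Assumes the count starts at 1
 * (the participant's own contribution). */
static uint32_t rcd_reported_after(uint32_t rounds)
{
    uint32_t nreported = 1;
    uint32_t distance;

    assert(rounds < 32);
    for (distance = 0; distance < rounds; distance++) {
        nreported += UINT32_C(1) << distance;
    }
    return nreported;
}
```

With eight participants, three rounds suffice: the count grows 1, 2, 4, 8, which is why the source sizes `coll->buffers` by `log2(coll->ndmns)`.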


@ -1,31 +0,0 @@
/* -*- C -*-
*
* Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2014 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*
*/
#ifndef GRPCOMM_RCD_H
#define GRPCOMM_RCD_H
#include "orte_config.h"
#include "orte/mca/grpcomm/grpcomm.h"
BEGIN_C_DECLS
/*
* Grpcomm interfaces
*/
ORTE_MODULE_DECLSPEC extern orte_grpcomm_base_component_t mca_grpcomm_rcd_component;
extern orte_grpcomm_base_module_t orte_grpcomm_rcd_module;
END_C_DECLS
#endif


@ -1,84 +0,0 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2011 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2011-2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2014 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "orte_config.h"
#include "orte/constants.h"
#include "orte/mca/mca.h"
#include "opal/runtime/opal_params.h"
#include "orte/util/proc_info.h"
#include "grpcomm_rcd.h"
static int my_priority=5;
static int rcd_open(void);
static int rcd_close(void);
static int rcd_query(mca_base_module_t **module, int *priority);
static int rcd_register(void);
/*
* Struct of function pointers that need to be initialized
*/
orte_grpcomm_base_component_t mca_grpcomm_rcd_component = {
.base_version = {
ORTE_GRPCOMM_BASE_VERSION_3_0_0,
.mca_component_name = "rcd",
MCA_BASE_MAKE_VERSION(component, ORTE_MAJOR_VERSION, ORTE_MINOR_VERSION,
ORTE_RELEASE_VERSION),
.mca_open_component = rcd_open,
.mca_close_component = rcd_close,
.mca_query_component = rcd_query,
.mca_register_component_params = rcd_register,
},
.base_data = {
/* The component is checkpoint ready */
MCA_BASE_METADATA_PARAM_CHECKPOINT
},
};
static int rcd_register(void)
{
mca_base_component_t *c = &mca_grpcomm_rcd_component.base_version;
/* make the priority adjustable so users can select
* rcd for use by apps without affecting daemons
*/
my_priority = 80;
(void) mca_base_component_var_register(c, "priority",
"Priority of the grpcomm rcd component",
MCA_BASE_VAR_TYPE_INT, NULL, 0, 0,
OPAL_INFO_LVL_9,
MCA_BASE_VAR_SCOPE_READONLY,
&my_priority);
return ORTE_SUCCESS;
}
/* Open the component */
static int rcd_open(void)
{
return ORTE_SUCCESS;
}
static int rcd_close(void)
{
return ORTE_SUCCESS;
}
static int rcd_query(mca_base_module_t **module, int *priority)
{
*priority = my_priority;
*module = (mca_base_module_t *)&orte_grpcomm_rcd_module;
return ORTE_SUCCESS;
}


@ -1,7 +0,0 @@
#
# owner/status file
# owner: institution that is responsible for this package
# status: e.g. active, maintenance, unmaintained
#
owner: INTEL
status: maintenance


@ -14,7 +14,7 @@
* Copyright (c) 2011-2013 Los Alamos National Security, LLC.
* All rights reserved.
* Copyright (c) 2011-2018 Cisco Systems, Inc. All rights reserved
* Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2014-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2017 Mellanox Technologies Ltd. All rights reserved.
@ -67,7 +67,6 @@
#include "orte/mca/ess/base/base.h"
#include "orte/mca/grpcomm/base/base.h"
#include "orte/mca/plm/base/base.h"
#include "orte/mca/regx/regx.h"
#include "orte/mca/rml/base/rml_contact.h"
#include "orte/mca/rmaps/rmaps_types.h"
#include "orte/mca/rmaps/base/base.h"
@ -79,6 +78,7 @@
#include "orte/util/context_fns.h"
#include "orte/util/name_fns.h"
#include "orte/util/nidmap.h"
#include "orte/util/session_dir.h"
#include "orte/util/proc_info.h"
#include "orte/util/show_help.h"
@ -148,7 +148,6 @@ int orte_odls_base_default_get_add_procs_data(opal_buffer_t *buffer,
int8_t flag;
void *nptr;
uint32_t key;
char *nidmap;
orte_proc_t *dmn, *proc;
opal_value_t *val = NULL, *kv;
opal_list_t *modex, ilist;
@ -167,33 +166,21 @@ int orte_odls_base_default_get_add_procs_data(opal_buffer_t *buffer,
return ORTE_SUCCESS;
}
/* if we couldn't provide the allocation regex on the orted
* cmd line, then we need to provide all the info here */
if (!orte_nidmap_communicated) {
if (ORTE_SUCCESS != (rc = orte_regx.nidmap_create(orte_node_pool, &nidmap))) {
ORTE_ERROR_LOG(rc);
return rc;
}
orte_nidmap_communicated = true;
} else {
nidmap = NULL;
}
opal_dss.pack(buffer, &nidmap, 1, OPAL_STRING);
if (NULL != nidmap) {
free(nidmap);
}
/* if we haven't already done so, provide the info on the
* capabilities of each node */
/* provide the nidmap - i.e., the map of hostnames
* and the vpid of the daemon running on each node.
* In a DVM, we should only have to do this once */
if (1 < orte_process_info.num_procs &&
(!orte_node_info_communicated ||
orte_get_attribute(&jdata->attributes, ORTE_JOB_LAUNCHED_DAEMONS, NULL, OPAL_BOOL))) {
/* mark that we did include this info */
flag = 1;
opal_dss.pack(buffer, &flag, 1, OPAL_INT8);
if (ORTE_SUCCESS != (rc = orte_regx.encode_nodemap(buffer))) {
/* load the nidmap */
if (ORTE_SUCCESS != (rc = orte_util_nidmap_create(orte_node_pool, buffer))) {
ORTE_ERROR_LOG(rc);
return rc;
}
/* get wireup info for daemons */
if (NULL == (jptr = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid))) {
ORTE_ERROR_LOG(ORTE_ERR_BAD_PARAM);
@ -227,104 +214,100 @@ int orte_odls_base_default_get_add_procs_data(opal_buffer_t *buffer,
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(wireup);
return rc;
} else {
/* the data is returned as a list of key-value pairs in the opal_value_t */
if (OPAL_PTR != val->type) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
OBJ_RELEASE(wireup);
return ORTE_ERR_NOT_FOUND;
}
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, ORTE_PROC_MY_NAME, 1, ORTE_NAME))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(wireup);
return rc;
}
modex = (opal_list_t*)val->data.ptr;
numbytes = (int32_t)opal_list_get_size(modex);
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &numbytes, 1, OPAL_INT32))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(wireup);
return rc;
}
OPAL_LIST_FOREACH(kv, modex, opal_value_t) {
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &kv, 1, OPAL_VALUE))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(wireup);
return rc;
}
}
OPAL_LIST_RELEASE(modex);
OBJ_RELEASE(val);
}
}
/* if we didn't rollup the connection info, then we have
* to provide a complete map of connection info */
if (!orte_static_ports && !orte_fwd_mpirun_port) {
for (v=1; v < jptr->procs->size; v++) {
if (NULL == (dmn = (orte_proc_t*)opal_pointer_array_get_item(jptr->procs, v))) {
continue;
/* the data is returned as a list of key-value pairs in the opal_value_t */
if (OPAL_PTR != val->type) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
OBJ_RELEASE(wireup);
return ORTE_ERR_NOT_FOUND;
}
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, ORTE_PROC_MY_NAME, 1, ORTE_NAME))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(wireup);
return rc;
}
modex = (opal_list_t*)val->data.ptr;
numbytes = (int32_t)opal_list_get_size(modex);
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &numbytes, 1, OPAL_INT32))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(wireup);
return rc;
}
OPAL_LIST_FOREACH(kv, modex, opal_value_t) {
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &kv, 1, OPAL_VALUE))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(wireup);
return rc;
}
val = NULL;
if (opal_pmix.legacy_get()) {
if (OPAL_SUCCESS != (rc = opal_pmix.get(&dmn->name, OPAL_PMIX_PROC_URI, NULL, &val)) || NULL == val) {
}
OPAL_LIST_RELEASE(modex);
OBJ_RELEASE(val);
}
/* provide a complete map of connection info */
for (v=1; v < jptr->procs->size; v++) {
if (NULL == (dmn = (orte_proc_t*)opal_pointer_array_get_item(jptr->procs, v))) {
continue;
}
val = NULL;
if (opal_pmix.legacy_get()) {
if (OPAL_SUCCESS != (rc = opal_pmix.get(&dmn->name, OPAL_PMIX_PROC_URI, NULL, &val)) || NULL == val) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
} else {
/* pack the name of the daemon */
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &dmn->name, 1, ORTE_NAME))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
} else {
/* pack the name of the daemon */
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &dmn->name, 1, ORTE_NAME))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
}
/* pack the URI */
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &val->data.string, 1, OPAL_STRING))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
}
OBJ_RELEASE(val);
}
} else {
if (OPAL_SUCCESS != (rc = opal_pmix.get(&dmn->name, NULL, NULL, &val)) || NULL == val) {
/* pack the URI */
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &val->data.string, 1, OPAL_STRING))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
} else {
/* the data is returned as a list of key-value pairs in the opal_value_t */
if (OPAL_PTR != val->type) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
OBJ_RELEASE(buffer);
return ORTE_ERR_NOT_FOUND;
}
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &dmn->name, 1, ORTE_NAME))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
}
modex = (opal_list_t*)val->data.ptr;
numbytes = (int32_t)opal_list_get_size(modex);
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &numbytes, 1, OPAL_INT32))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
}
OPAL_LIST_FOREACH(kv, modex, opal_value_t) {
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &kv, 1, OPAL_VALUE))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
}
}
OPAL_LIST_RELEASE(modex);
OBJ_RELEASE(val);
}
OBJ_RELEASE(val);
}
} else {
if (OPAL_SUCCESS != (rc = opal_pmix.get(&dmn->name, NULL, NULL, &val)) || NULL == val) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
return rc;
} else {
/* the data is returned as a list of key-value pairs in the opal_value_t */
if (OPAL_PTR != val->type) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
OBJ_RELEASE(buffer);
return ORTE_ERR_NOT_FOUND;
}
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &dmn->name, 1, ORTE_NAME))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
}
modex = (opal_list_t*)val->data.ptr;
numbytes = (int32_t)opal_list_get_size(modex);
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &numbytes, 1, OPAL_INT32))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
}
OPAL_LIST_FOREACH(kv, modex, opal_value_t) {
if (ORTE_SUCCESS != (rc = opal_dss.pack(wireup, &kv, 1, OPAL_VALUE))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
OBJ_RELEASE(wireup);
return rc;
}
}
OPAL_LIST_RELEASE(modex);
OBJ_RELEASE(val);
}
}
}
@ -417,17 +400,11 @@ int orte_odls_base_default_get_add_procs_data(opal_buffer_t *buffer,
}
if (!orte_get_attribute(&jdata->attributes, ORTE_JOB_FULLY_DESCRIBED, NULL, OPAL_BOOL)) {
/* compute and pack the ppn regex */
if (ORTE_SUCCESS != (rc = orte_regx.generate_ppn(jdata, &nidmap))) {
/* compute and pack the ppn */
if (ORTE_SUCCESS != (rc = orte_util_generate_ppn(jdata, buffer))) {
ORTE_ERROR_LOG(rc);
return rc;
}
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &nidmap, 1, OPAL_STRING))) {
ORTE_ERROR_LOG(rc);
free(nidmap);
return rc;
}
free(nidmap);
}
/* get any application prep info */
@ -485,7 +462,6 @@ int orte_odls_base_default_construct_child_list(opal_buffer_t *buffer,
orte_proc_t *pptr, *dmn;
orte_app_context_t *app;
int8_t flag;
char *ppn;
opal_value_t *kv;
opal_list_t local_support, cache;
opal_pmix_lock_t lock;
@ -623,29 +599,21 @@ int orte_odls_base_default_construct_child_list(opal_buffer_t *buffer,
* and sent us the complete array of procs in the orte_job_t, so we
* don't need to do anything more here */
if (!orte_get_attribute(&jdata->attributes, ORTE_JOB_FULLY_DESCRIBED, NULL, OPAL_BOOL)) {
/* extract the ppn regex */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, &ppn, &cnt, OPAL_STRING))) {
/* load the ppn info into the job and node arrays - the
* function will ignore the data on the HNP as it already
* has the info */
if (ORTE_SUCCESS != (rc = orte_util_decode_ppn(jdata, buffer))) {
ORTE_ERROR_LOG(rc);
goto REPORT_ERROR;
}
if (!ORTE_PROC_IS_HNP) {
/* populate the node array of the job map and the proc array of
* the job object so we know how many procs are on each node */
if (ORTE_SUCCESS != (rc = orte_regx.parse_ppn(jdata, ppn))) {
ORTE_ERROR_LOG(rc);
free(ppn);
goto REPORT_ERROR;
}
/* now assign locations to the procs */
/* assign locations to the procs */
if (ORTE_SUCCESS != (rc = orte_rmaps_base_assign_locations(jdata))) {
ORTE_ERROR_LOG(rc);
free(ppn);
goto REPORT_ERROR;
}
}
free(ppn);
/* compute the ranks and add the proc objects
* to the jdata->procs array */


@ -14,7 +14,7 @@
* reserved.
* Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2011 Oak Ridge National Labs. All rights reserved.
* Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2014 NVIDIA Corporation. All rights reserved.
* Copyright (c) 2015-2017 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
@ -334,11 +334,6 @@ static int tcp_component_register(void)
if (NULL != mca_oob_tcp_component.tcp_static_ports ||
NULL != mca_oob_tcp_component.tcp6_static_ports) {
/* can't fwd mpirun port _and_ have static ports */
if (ORTE_PROC_IS_HNP && orte_fwd_mpirun_port) {
orte_show_help("help-oob-tcp.txt", "static-fwd", true);
return ORTE_ERR_NOT_AVAILABLE;
}
orte_static_ports = true;
}


@ -13,7 +13,7 @@
* Copyright (c) 2009 Institut National de Recherche en Informatique
* et Automatique. All rights reserved.
* Copyright (c) 2011-2012 Los Alamos National Security, LLC.
* Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2014-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2016 IBM Corporation. All rights reserved.
@ -44,8 +44,10 @@
#include "opal/dss/dss.h"
#include "opal/mca/hwloc/hwloc-internal.h"
#include "opal/mca/pmix/pmix.h"
#include "opal/mca/compress/compress.h"
#include "orte/util/dash_host/dash_host.h"
#include "orte/util/nidmap.h"
#include "orte/util/session_dir.h"
#include "orte/util/show_help.h"
#include "orte/mca/errmgr/errmgr.h"
@ -53,7 +55,6 @@
#include "orte/mca/iof/base/base.h"
#include "orte/mca/odls/base/base.h"
#include "orte/mca/ras/base/base.h"
#include "orte/mca/regx/regx.h"
#include "orte/mca/rmaps/rmaps.h"
#include "orte/mca/rmaps/base/base.h"
#include "orte/mca/rml/rml.h"
@ -72,7 +73,6 @@
#include "orte/runtime/runtime.h"
#include "orte/runtime/orte_locks.h"
#include "orte/runtime/orte_quit.h"
#include "orte/util/compress.h"
#include "orte/util/name_fns.h"
#include "orte/util/pre_condition_transports.h"
#include "orte/util/proc_info.h"
@ -580,7 +580,7 @@ void orte_plm_base_send_launch_msg(int fd, short args, void *cbdata)
uint8_t *cmpdata;
size_t cmplen;
/* report the size of the launch message */
compressed = orte_util_compress_block((uint8_t*)jdata->launch_msg.base_ptr,
compressed = opal_compress.compress_block((uint8_t*)jdata->launch_msg.base_ptr,
jdata->launch_msg.bytes_used,
&cmpdata, &cmplen);
if (compressed) {
@ -857,7 +857,7 @@ void orte_plm_base_daemon_topology(int status, orte_process_name_t* sender,
goto CLEANUP;
}
/* decompress the data */
if (orte_util_uncompress_block(&cmpdata, cmplen,
if (opal_compress.decompress_block(&cmpdata, cmplen,
packed_data, inlen)) {
/* the data has been uncompressed */
opal_dss.load(&datbuf, cmpdata, cmplen);
@ -1184,7 +1184,7 @@ void orte_plm_base_daemon_callback(int status, orte_process_name_t* sender,
goto CLEANUP;
}
/* decompress the data */
if (orte_util_uncompress_block(&cmpdata, cmplen,
if (opal_compress.decompress_block(&cmpdata, cmplen,
packed_data, inlen)) {
/* the data has been uncompressed */
opal_dss.load(&datbuf, cmpdata, cmplen);
@ -1515,46 +1515,6 @@ int orte_plm_base_orted_append_basic_args(int *argc, char ***argv,
opal_argv_append(argc, argv, param);
free(param);
/* convert the nodes with daemons to a regex */
param = NULL;
if (ORTE_SUCCESS != (rc = orte_regx.nidmap_create(orte_node_pool, &param))) {
ORTE_ERROR_LOG(rc);
return rc;
}
if (NULL != orte_node_regex) {
free(orte_node_regex);
}
orte_node_regex = param;
/* if this is too long, then we'll have to do it with
* a phone home operation instead */
if (strlen(param) < orte_plm_globals.node_regex_threshold) {
opal_argv_append(argc, argv, "-"OPAL_MCA_CMD_LINE_ID);
opal_argv_append(argc, argv, "orte_node_regex");
opal_argv_append(argc, argv, orte_node_regex);
/* mark that the nidmap has been communicated */
orte_nidmap_communicated = true;
}
if (!orte_static_ports && !orte_fwd_mpirun_port) {
/* if we are using static ports, or we are forwarding
* mpirun's port, then we would have built all the
* connection info and so there is nothing to be passed.
* Otherwise, we have to pass the HNP uri so we can
* phone home */
opal_argv_append(argc, argv, "-"OPAL_MCA_CMD_LINE_ID);
opal_argv_append(argc, argv, "orte_hnp_uri");
opal_argv_append(argc, argv, orte_process_info.my_hnp_uri);
}
/* if requested, pass our port */
if (orte_fwd_mpirun_port) {
opal_asprintf(&param, "%d", orte_process_info.my_port);
opal_argv_append(argc, argv, "-"OPAL_MCA_CMD_LINE_ID);
opal_argv_append(argc, argv, "oob_tcp_static_ipv4_ports");
opal_argv_append(argc, argv, param);
free(param);
}
/* if --xterm was specified, pass that along */
if (NULL != orte_xterm) {
opal_argv_append(argc, argv, "-"OPAL_MCA_CMD_LINE_ID);
@ -2136,7 +2096,7 @@ int orte_plm_base_setup_virtual_machine(orte_job_t *jdata)
opal_list_remove_item(&nodes, item);
OBJ_RELEASE(item);
} else {
/* The filtering logic sets this flag only for nodes which
/* The filtering logic sets this flag only for nodes which
* are kept after filtering. This flag will be subsequently
* used in rmaps components and must be reset here */
ORTE_FLAG_UNSET(node, ORTE_NODE_FLAG_MAPPED);
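
The `compress_block` / `decompress_block` calls in this file's diff rely on a simple contract: the compress call returns true only when it produced a smaller block, and the receiver must pass the original length to decompress. Here is a minimal sketch of that contract, assuming a toy run-length coder standing in for zlib so the example has no external dependency. The `toy_*` names are hypothetical and are not the opal/compress API. Only the argument order is modeled on the calls shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in (NOT zlib): run-length encodes the input as (count, value)
 * pairs. Returns false and leaves the outputs untouched when encoding
 * would not shrink the data, so the caller can send the block raw. */
static bool toy_compress_block(const uint8_t *inbytes, size_t inlen,
                               uint8_t **outbytes, size_t *outlen)
{
    uint8_t *tmp;
    size_t i = 0, used = 0;

    assert(NULL != inbytes && NULL != outbytes && NULL != outlen);
    if (inlen < 2) {
        return false;
    }
    tmp = malloc(2 * inlen);   /* worst case: every byte becomes a pair */
    if (NULL == tmp) {
        return false;
    }
    while (i < inlen) {
        size_t run = 1;
        while (i + run < inlen && run < 255 && inbytes[i + run] == inbytes[i]) {
            run++;
        }
        tmp[used++] = (uint8_t)run;
        tmp[used++] = inbytes[i];
        i += run;
    }
    if (used >= inlen) {       /* no gain: tell the caller to send it raw */
        free(tmp);
        return false;
    }
    *outbytes = tmp;
    *outlen = used;
    return true;
}

/* Inverse of the toy coder. `olen` is the original length, which the
 * sender must transmit alongside the compressed block. */
static bool toy_decompress_block(uint8_t **outbytes, size_t olen,
                                 const uint8_t *inbytes, size_t len)
{
    uint8_t *dest;
    size_t i, used = 0;

    assert(NULL != outbytes && NULL != inbytes);
    if (0 != len % 2) {
        return false;
    }
    dest = malloc(olen > 0 ? olen : 1);
    if (NULL == dest) {
        return false;
    }
    for (i = 0; i < len; i += 2) {
        size_t run = inbytes[i];
        if (0 == run || used + run > olen) {
            free(dest);
            return false;
        }
        memset(dest + used, inbytes[i + 1], run);
        used += run;
    }
    if (used != olen) {
        free(dest);
        return false;
    }
    *outbytes = dest;
    return true;
}
```

The fallback path matters as much as the compression itself: short or high-entropy blocks are sent unmodified, so the receiver needs a flag telling it which form it was given.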


@ -1,30 +0,0 @@
#
# Copyright (c) 2015-2018 Intel, Inc. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# main library setup
noinst_LTLIBRARIES = libmca_regx.la
libmca_regx_la_SOURCES =
# pkgdata setup
dist_ortedata_DATA =
# local files
headers = regx.h
libmca_regx_la_SOURCES += $(headers)
# Conditionally install the header files
if WANT_INSTALL_HEADERS
ortedir = $(orteincludedir)/$(subdir)
nobase_orte_HEADERS = $(headers)
endif
include base/Makefile.am
distclean-local:
rm -f base/static-components.h


@ -1,18 +0,0 @@
#
# Copyright (c) 2015-2018 Intel, Inc. All rights reserved.
# Copyright (c) 2018 Research Organization for Information Science
# and Technology (RIST). All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
headers += \
base/base.h
libmca_regx_la_SOURCES += \
base/regx_base_default_fns.c \
base/regx_base_frame.c \
base/regx_base_select.c


@ -1,74 +0,0 @@
/*
* Copyright (c) 2015-2018 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
/** @file:
* regx framework base functionality.
*/
#ifndef ORTE_MCA_REGX_BASE_H
#define ORTE_MCA_REGX_BASE_H
/*
* includes
*/
#include "orte_config.h"
#include "orte/types.h"
#include "opal/class/opal_list.h"
#include "orte/mca/mca.h"
#include "orte/runtime/orte_globals.h"
#include "orte/mca/regx/regx.h"
BEGIN_C_DECLS
/*
* MCA Framework
*/
ORTE_DECLSPEC extern mca_base_framework_t orte_regx_base_framework;
/* select all components */
ORTE_DECLSPEC int orte_regx_base_select(void);
/*
* common stuff
*/
typedef struct {
opal_list_item_t super;
int vpid;
int cnt;
int slots;
orte_topology_t *t;
} orte_regex_range_t;
OBJ_CLASS_DECLARATION(orte_regex_range_t);
typedef struct {
/* list object */
opal_list_item_t super;
char *prefix;
char *suffix;
int num_digits;
opal_list_t ranges;
} orte_regex_node_t;
END_C_DECLS
OBJ_CLASS_DECLARATION(orte_regex_node_t);
ORTE_DECLSPEC extern int orte_regx_base_nidmap_parse(char *regex);
ORTE_DECLSPEC extern int orte_regx_base_encode_nodemap(opal_buffer_t *buffer);
ORTE_DECLSPEC int orte_regx_base_decode_daemon_nodemap(opal_buffer_t *buffer);
ORTE_DECLSPEC int orte_regx_base_generate_ppn(orte_job_t *jdata, char **ppn);
ORTE_DECLSPEC int orte_regx_base_parse_ppn(orte_job_t *jdata, char *regex);
ORTE_DECLSPEC int orte_regx_base_extract_node_names(char *regexp, char ***names);
#endif


@ -1,7 +0,0 @@
#
# owner/status file
# owner: institution that is responsible for this package
# status: e.g. active, maintenance, unmaintained
#
owner: INTEL
status: active

File diff suppressed because it is too large.


@ -1,77 +0,0 @@
/*
* Copyright (c) 2015-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2015 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "orte_config.h"
#include "orte/constants.h"
#include <string.h>
#include "orte/mca/mca.h"
#include "opal/util/argv.h"
#include "opal/util/output.h"
#include "opal/mca/base/base.h"
#include "orte/runtime/orte_globals.h"
#include "orte/util/show_help.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/regx/base/base.h"
/*
* The following file was created by configure. It contains extern
* statements and the definition of an array of pointers to each
* component's public mca_base_component_t struct.
*/
#include "orte/mca/regx/base/static-components.h"
/*
* Global variables
*/
orte_regx_base_module_t orte_regx = {0};
static int orte_regx_base_close(void)
{
/* give the selected module a chance to finalize */
if (NULL != orte_regx.finalize) {
orte_regx.finalize();
}
return mca_base_framework_components_close(&orte_regx_base_framework, NULL);
}
/**
* Function for finding and opening either all MCA components, or the one
* that was specifically requested via a MCA parameter.
*/
static int orte_regx_base_open(mca_base_open_flag_t flags)
{
int rc;
/* Open up all available components */
rc = mca_base_framework_components_open(&orte_regx_base_framework, flags);
/* All done */
return rc;
}
MCA_BASE_FRAMEWORK_DECLARE(orte, regx, "ORTE Regx Subsystem", NULL,
orte_regx_base_open, orte_regx_base_close,
mca_regx_base_static_components, 0);
/* OBJECT INSTANTIATIONS */
static void nrcon(orte_nidmap_regex_t *p)
{
p->ctx = 0;
p->nprocs = -1;
p->cnt = 0;
}
OBJ_CLASS_INSTANCE(orte_nidmap_regex_t,
opal_list_item_t,
nrcon, NULL);


@ -1,61 +0,0 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2004-2008 The Trustees of Indiana University and Indiana
* University Research and Technology
* Corporation. All rights reserved.
* Copyright (c) 2004-2005 The University of Tennessee and The University
* of Tennessee Research Foundation. All rights
* reserved.
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2018 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "orte_config.h"
#include "orte/constants.h"
#include "orte/mca/mca.h"
#include "opal/mca/base/base.h"
#include "orte/mca/regx/base/base.h"
/**
* Function for selecting one component from all those that are
* available.
*/
int orte_regx_base_select(void)
{
orte_regx_base_component_t *best_component = NULL;
orte_regx_base_module_t *best_module = NULL;
int rc = ORTE_SUCCESS;
/*
* Select the best component
*/
if (OPAL_SUCCESS != mca_base_select("regx", orte_regx_base_framework.framework_output,
&orte_regx_base_framework.framework_components,
(mca_base_module_t **) &best_module,
(mca_base_component_t **) &best_component, NULL)) {
/* This will only happen if no component was selected */
return ORTE_ERR_NOT_FOUND;
}
/* Save the winner */
orte_regx = *best_module;
/* give it a chance to init */
if (NULL != orte_regx.init) {
rc = orte_regx.init();
}
return rc;
}


@ -1,36 +0,0 @@
#
# Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
# Copyright (c) 2017 IBM Corporation. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
sources = \
regx_fwd_component.c \
regx_fwd.h \
regx_fwd.c
# Make the output library in this directory, and name it either
# mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la
# (for static builds).
if MCA_BUILD_orte_regx_fwd_DSO
component_noinst =
component_install = mca_regx_fwd.la
else
component_noinst = libmca_regx_fwd.la
component_install =
endif
mcacomponentdir = $(ortelibdir)
mcacomponent_LTLIBRARIES = $(component_install)
mca_regx_fwd_la_SOURCES = $(sources)
mca_regx_fwd_la_LDFLAGS = -module -avoid-version
mca_regx_fwd_la_LIBADD = $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
noinst_LTLIBRARIES = $(component_noinst)
libmca_regx_fwd_la_SOURCES = $(sources)
libmca_regx_fwd_la_LDFLAGS = -module -avoid-version


@ -1,7 +0,0 @@
#
# owner/status file
# owner: institution that is responsible for this package
# status: e.g. active, maintenance, unmaintained
#
owner: INTEL
status: active


@ -1,300 +0,0 @@
/*
* Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*
*/
#include "orte_config.h"
#include "orte/types.h"
#include "opal/types.h"
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#include <ctype.h>
#include "opal/util/argv.h"
#include "opal/util/basename.h"
#include "opal/util/opal_environ.h"
#include "orte/runtime/orte_globals.h"
#include "orte/util/name_fns.h"
#include "orte/util/show_help.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/rmaps/base/base.h"
#include "orte/mca/routed/routed.h"
#include "orte/mca/regx/base/base.h"
#include "regx_fwd.h"
static int nidmap_create(opal_pointer_array_t *pool, char **regex);
orte_regx_base_module_t orte_regx_fwd_module = {
.nidmap_create = nidmap_create,
.nidmap_parse = orte_regx_base_nidmap_parse,
.extract_node_names = orte_regx_base_extract_node_names,
.encode_nodemap = orte_regx_base_encode_nodemap,
.decode_daemon_nodemap = orte_regx_base_decode_daemon_nodemap,
.generate_ppn = orte_regx_base_generate_ppn,
.parse_ppn = orte_regx_base_parse_ppn
};
static int nidmap_create(opal_pointer_array_t *pool, char **regex)
{
char *node;
char prefix[ORTE_MAX_NODE_PREFIX];
int i, j, n, len, startnum, nodenum, numdigits;
bool found;
char *suffix, *sfx, *nodenames;
orte_regex_node_t *ndreg;
orte_regex_range_t *range, *rng;
opal_list_t nodenms, dvpids;
opal_list_item_t *item, *itm2;
char **regexargs = NULL, *tmp, *tmp2;
orte_node_t *nptr;
orte_vpid_t vpid;
OBJ_CONSTRUCT(&nodenms, opal_list_t);
OBJ_CONSTRUCT(&dvpids, opal_list_t);
rng = NULL;
for (n=0; n < pool->size; n++) {
if (NULL == (nptr = (orte_node_t*)opal_pointer_array_get_item(pool, n))) {
continue;
}
/* if no daemon has been assigned, then this node is not being used */
if (NULL == nptr->daemon) {
vpid = -1; // indicates no daemon assigned
} else {
vpid = nptr->daemon->name.vpid;
}
/* deal with the daemon vpid - see if it is next in the
* current range */
if (NULL == rng) {
/* just starting */
rng = OBJ_NEW(orte_regex_range_t);
rng->vpid = vpid;
rng->cnt = 1;
opal_list_append(&dvpids, &rng->super);
} else if (UINT32_MAX == vpid) {
if (-1 == rng->vpid) {
rng->cnt++;
} else {
/* need to start another range */
rng = OBJ_NEW(orte_regex_range_t);
rng->vpid = vpid;
rng->cnt = 1;
opal_list_append(&dvpids, &rng->super);
}
} else if (-1 == rng->vpid) {
/* need to start another range */
rng = OBJ_NEW(orte_regex_range_t);
rng->vpid = vpid;
rng->cnt = 1;
opal_list_append(&dvpids, &rng->super);
} else {
/* is this the next in line */
if (vpid == (orte_vpid_t)(rng->vpid + rng->cnt)) {
rng->cnt++;
} else {
/* need to start another range */
rng = OBJ_NEW(orte_regex_range_t);
rng->vpid = vpid;
rng->cnt = 1;
opal_list_append(&dvpids, &rng->super);
}
}
node = nptr->name;
/* determine this node's prefix by looking for first digit char */
len = strlen(node);
startnum = -1;
memset(prefix, 0, ORTE_MAX_NODE_PREFIX);
for (i=0, j=0; i < len; i++) {
/* valid hostname characters are ascii letters, digits and the '-' character. */
if (isdigit(node[i])) {
/* count the size of the numeric field - but don't
* add the digits to the prefix
*/
if (startnum < 0) {
/* okay, this defines end of the prefix */
startnum = i;
}
continue;
}
/* this must be either an alpha, a '.', or '-' */
if (!isalpha(node[i]) && '-' != node[i] && '.' != node[i]) {
orte_show_help("help-regex.txt", "regex:invalid-name", true, node);
return ORTE_ERR_SILENT;
}
if (startnum < 0) {
prefix[j++] = node[i];
}
}
if (startnum < 0) {
/* can't compress this name - just add it to the list */
ndreg = OBJ_NEW(orte_regex_node_t);
ndreg->prefix = strdup(node);
opal_list_append(&nodenms, &ndreg->super);
continue;
}
/* convert the digits and get any suffix */
nodenum = strtol(&node[startnum], &sfx, 10);
if (NULL != sfx) {
suffix = strdup(sfx);
numdigits = (int)(sfx - &node[startnum]);
} else {
suffix = NULL;
numdigits = (int)strlen(&node[startnum]);
}
/* is this node name already on our list? */
found = false;
if (0 != opal_list_get_size(&nodenms)) {
ndreg = (orte_regex_node_t*)opal_list_get_last(&nodenms);
if ((0 < strlen(prefix) && NULL == ndreg->prefix) ||
(0 == strlen(prefix) && NULL != ndreg->prefix) ||
(0 < strlen(prefix) && NULL != ndreg->prefix &&
0 != strcmp(prefix, ndreg->prefix)) ||
(NULL == suffix && NULL != ndreg->suffix) ||
(NULL != suffix && NULL == ndreg->suffix) ||
(NULL != suffix && NULL != ndreg->suffix &&
0 != strcmp(suffix, ndreg->suffix)) ||
(numdigits != ndreg->num_digits)) {
found = false;
} else {
/* found a match - flag it */
found = true;
}
}
if (found) {
range = (orte_regex_range_t*)opal_list_get_last(&ndreg->ranges);
if (NULL == range) {
/* first range for this nodeid */
range = OBJ_NEW(orte_regex_range_t);
range->vpid = nodenum;
range->cnt = 1;
opal_list_append(&ndreg->ranges, &range->super);
/* see if the node number is out of sequence */
} else if (nodenum != (range->vpid + range->cnt)) {
/* start a new range */
range = OBJ_NEW(orte_regex_range_t);
range->vpid = nodenum;
range->cnt = 1;
opal_list_append(&ndreg->ranges, &range->super);
} else {
/* everything matches - just increment the cnt */
range->cnt++;
}
} else {
/* need to add it */
ndreg = OBJ_NEW(orte_regex_node_t);
if (0 < strlen(prefix)) {
ndreg->prefix = strdup(prefix);
}
if (NULL != suffix) {
ndreg->suffix = strdup(suffix);
}
ndreg->num_digits = numdigits;
opal_list_append(&nodenms, &ndreg->super);
/* record the first range for this nodeid - we took
* care of names we can't compress above
*/
range = OBJ_NEW(orte_regex_range_t);
range->vpid = nodenum;
range->cnt = 1;
opal_list_append(&ndreg->ranges, &range->super);
}
if (NULL != suffix) {
free(suffix);
}
}
/* begin constructing the regular expression */
while (NULL != (item = opal_list_remove_first(&nodenms))) {
ndreg = (orte_regex_node_t*)item;
/* if no ranges, then just add the name */
if (0 == opal_list_get_size(&ndreg->ranges)) {
if (NULL != ndreg->prefix) {
/* solitary node */
opal_asprintf(&tmp, "%s", ndreg->prefix);
opal_argv_append_nosize(&regexargs, tmp);
free(tmp);
}
OBJ_RELEASE(ndreg);
continue;
}
/* start the regex for this nodeid with the prefix */
if (NULL != ndreg->prefix) {
opal_asprintf(&tmp, "%s[%d:", ndreg->prefix, ndreg->num_digits);
} else {
opal_asprintf(&tmp, "[%d:", ndreg->num_digits);
}
/* add the ranges */
while (NULL != (itm2 = opal_list_remove_first(&ndreg->ranges))) {
range = (orte_regex_range_t*)itm2;
if (1 == range->cnt) {
opal_asprintf(&tmp2, "%s%u,", tmp, range->vpid);
} else {
opal_asprintf(&tmp2, "%s%u-%u,", tmp, range->vpid, range->vpid + range->cnt - 1);
}
free(tmp);
tmp = tmp2;
OBJ_RELEASE(range);
}
/* replace the final comma */
tmp[strlen(tmp)-1] = ']';
if (NULL != ndreg->suffix) {
/* add in the suffix, if provided */
opal_asprintf(&tmp2, "%s%s", tmp, ndreg->suffix);
free(tmp);
tmp = tmp2;
}
opal_argv_append_nosize(&regexargs, tmp);
free(tmp);
OBJ_RELEASE(ndreg);
}
/* assemble final result */
nodenames = opal_argv_join(regexargs, ',');
/* cleanup */
opal_argv_free(regexargs);
OBJ_DESTRUCT(&nodenms);
/* do the same for the vpids */
tmp = NULL;
while (NULL != (item = opal_list_remove_first(&dvpids))) {
rng = (orte_regex_range_t*)item;
if (1 < rng->cnt) {
if (NULL == tmp) {
opal_asprintf(&tmp, "%u(%u)", rng->vpid, rng->cnt);
} else {
opal_asprintf(&tmp2, "%s,%u(%u)", tmp, rng->vpid, rng->cnt);
free(tmp);
tmp = tmp2;
}
} else {
if (NULL == tmp) {
opal_asprintf(&tmp, "%u", rng->vpid);
} else {
opal_asprintf(&tmp2, "%s,%u", tmp, rng->vpid);
free(tmp);
tmp = tmp2;
}
}
OBJ_RELEASE(rng);
}
OPAL_LIST_DESTRUCT(&dvpids);
/* now concatenate the results into one string */
opal_asprintf(&tmp2, "%s@%s", nodenames, tmp);
free(nodenames);
free(tmp);
*regex = tmp2;
return ORTE_SUCCESS;
}
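
For reference, the range notation that the removed `nidmap_create` assembles (prefix, digit width, then comma-separated runs such as `1-4`) can be sketched in isolation. This is only an illustration under simplifying assumptions, not the ORTE code: `sketch_compress_ranges` is a hypothetical helper, it handles a single prefix with no suffix, and it writes into a caller-supplied buffer instead of using `opal_asprintf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of ORTE): writes "prefix[width:a-b,c]" into out.
 * nums holds n >= 1 node numbers in the order the nodes were seen.
 * Returns 0 on success, -1 if out is too small. */
static int sketch_compress_ranges(const char *prefix, int width,
                                  const unsigned *nums, size_t n,
                                  char *out, size_t outlen)
{
    size_t used;
    int w;

    assert(NULL != prefix && NULL != nums && NULL != out && n > 0);

    /* start with the prefix and the digit width, as the code above does */
    w = snprintf(out, outlen, "%s[%d:", prefix, width);
    if (w < 0 || (size_t)w >= outlen) {
        return -1;
    }
    used = (size_t)w;

    for (size_t i = 0; i < n; ) {
        /* extend the run while the next number is exactly one greater */
        size_t j = i;
        while (j + 1 < n && nums[j + 1] == nums[j] + 1) {
            j++;
        }
        if (j == i) {
            w = snprintf(out + used, outlen - used, "%u,", nums[i]);
        } else {
            w = snprintf(out + used, outlen - used, "%u-%u,", nums[i], nums[j]);
        }
        if (w < 0 || (size_t)w >= outlen - used) {
            return -1;
        }
        used += (size_t)w;
        i = j + 1;
    }
    /* replace the trailing comma with the closing bracket */
    out[used - 1] = ']';
    return 0;
}
```

With prefix `node`, width 3 and the numbers 1, 2, 3, 4, 7, 9, 10, the helper produces `node[3:1-4,7,9-10]`. As in the removed code, a number that does not continue the previous run simply starts a new range, so input order is preserved.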


@ -1,28 +0,0 @@
/*
* Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#ifndef _MCA_REGX_FwD_H_
#define _MCA_REGX_FwD_H_
#include "orte_config.h"
#include "orte/types.h"
#include "opal/mca/base/base.h"
#include "orte/mca/regx/regx.h"
BEGIN_C_DECLS
ORTE_MODULE_DECLSPEC extern orte_regx_base_component_t mca_regx_fwd_component;
extern orte_regx_base_module_t orte_regx_fwd_module;
END_C_DECLS
#endif /* _MCA_REGX_FwD_H_ */


@ -1,44 +0,0 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "orte_config.h"
#include "orte/types.h"
#include "opal/types.h"
#include "opal/util/show_help.h"
#include "orte/mca/regx/regx.h"
#include "regx_fwd.h"
static int component_query(mca_base_module_t **module, int *priority);
/*
* Struct of function pointers and all that to let us be initialized
*/
orte_regx_base_component_t mca_regx_fwd_component = {
.base_version = {
MCA_REGX_BASE_VERSION_1_0_0,
.mca_component_name = "fwd",
MCA_BASE_MAKE_VERSION(component, ORTE_MAJOR_VERSION, ORTE_MINOR_VERSION,
ORTE_RELEASE_VERSION),
.mca_query_component = component_query,
},
.base_data = {
/* The component is checkpoint ready */
MCA_BASE_METADATA_PARAM_CHECKPOINT
},
};
static int component_query(mca_base_module_t **module, int *priority)
{
*module = (mca_base_module_t*)&orte_regx_fwd_module;
*priority = 10;
return ORTE_SUCCESS;
}


@ -1,127 +0,0 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2015-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
/** @file:
*
* The Open RTE Regular Expression Framework (regx)
*
* Creates and parses compact regular-expression descriptions of the
* node map and of the per-job ppn
*
*/
#ifndef ORTE_MCA_REGX_H
#define ORTE_MCA_REGX_H
#include "orte_config.h"
#include "orte/types.h"
#include "opal/class/opal_pointer_array.h"
#include "opal/dss/dss_types.h"
#include "orte/mca/mca.h"
#include "orte/runtime/orte_globals.h"
BEGIN_C_DECLS
/*
* regx module functions
*/
#define ORTE_MAX_NODE_PREFIX 50
#define ORTE_CONTIG_NODE_CMD 0x01
#define ORTE_NON_CONTIG_NODE_CMD 0x02
/**
* REGX module functions - the modules are accessed via
* the base stub functions
*/
typedef struct {
opal_list_item_t super;
int ctx;
int nprocs;
int cnt;
} orte_nidmap_regex_t;
ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_nidmap_regex_t);
/* initialize the module - allow it to do whatever one-time
* things it requires */
typedef int (*orte_regx_base_module_init_fn_t)(void);
typedef int (*orte_regx_base_module_nidmap_create_fn_t)(opal_pointer_array_t *pool, char **regex);
typedef int (*orte_regx_base_module_nidmap_parse_fn_t)(char *regex);
typedef int (*orte_regx_base_module_extract_node_names_fn_t)(char *regexp, char ***names);
/* create a regular expression describing the nodes in the
* allocation */
typedef int (*orte_regx_base_module_encode_nodemap_fn_t)(opal_buffer_t *buffer);
/* decode a regular expression created by the encode function
* into the orte_node_pool array */
typedef int (*orte_regx_base_module_decode_daemon_nodemap_fn_t)(opal_buffer_t *buffer);
typedef int (*orte_regx_base_module_build_daemon_nidmap_fn_t)(void);
/* create a regular expression describing the ppn for a job */
typedef int (*orte_regx_base_module_generate_ppn_fn_t)(orte_job_t *jdata, char **ppn);
/* decode the ppn */
typedef int (*orte_regx_base_module_parse_ppn_fn_t)(orte_job_t *jdata, char *ppn);
/* give the component a chance to cleanup */
typedef void (*orte_regx_base_module_finalize_fn_t)(void);
/*
* regx module version 1.0.0
*/
typedef struct {
orte_regx_base_module_init_fn_t init;
orte_regx_base_module_nidmap_create_fn_t nidmap_create;
orte_regx_base_module_nidmap_parse_fn_t nidmap_parse;
orte_regx_base_module_extract_node_names_fn_t extract_node_names;
orte_regx_base_module_encode_nodemap_fn_t encode_nodemap;
orte_regx_base_module_decode_daemon_nodemap_fn_t decode_daemon_nodemap;
orte_regx_base_module_build_daemon_nidmap_fn_t build_daemon_nidmap;
orte_regx_base_module_generate_ppn_fn_t generate_ppn;
orte_regx_base_module_parse_ppn_fn_t parse_ppn;
orte_regx_base_module_finalize_fn_t finalize;
} orte_regx_base_module_t;
ORTE_DECLSPEC extern orte_regx_base_module_t orte_regx;
/*
* regx component
*/
/**
* regx component version 1.0.0
*/
typedef struct {
/** Base MCA structure */
mca_base_component_t base_version;
/** Base MCA data */
mca_base_component_data_t base_data;
} orte_regx_base_component_t;
/**
* Macro for use in components that are of type regx
*/
#define MCA_REGX_BASE_VERSION_1_0_0 \
ORTE_MCA_BASE_VERSION_2_1_0("regx", 1, 0, 0)
END_C_DECLS
#endif


@ -1,36 +0,0 @@
#
# Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
# Copyright (c) 2017 IBM Corporation. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
sources = \
regx_reverse_component.c \
regx_reverse.h \
regx_reverse.c
# Make the output library in this directory, and name it either
# mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la
# (for static builds).
if MCA_BUILD_orte_regx_reverse_DSO
component_noinst =
component_install = mca_regx_reverse.la
else
component_noinst = libmca_regx_reverse.la
component_install =
endif
mcacomponentdir = $(ortelibdir)
mcacomponent_LTLIBRARIES = $(component_install)
mca_regx_reverse_la_SOURCES = $(sources)
mca_regx_reverse_la_LDFLAGS = -module -avoid-version
mca_regx_reverse_la_LIBADD = $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
noinst_LTLIBRARIES = $(component_noinst)
libmca_regx_reverse_la_SOURCES = $(sources)
libmca_regx_reverse_la_LDFLAGS = -module -avoid-version


@ -1,7 +0,0 @@
#
# owner/status file
# owner: institution that is responsible for this package
# status: e.g. active, maintenance, unmaintained
#
owner: IBM
status: active


@ -1,319 +0,0 @@
/*
* Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
* Copyright (c) 2018 IBM Corporation. All rights reserved.
* Copyright (c) 2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*
*/
#include "orte_config.h"
#include "orte/types.h"
#include "opal/types.h"
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#include <ctype.h>
#include "opal/util/argv.h"
#include "opal/util/basename.h"
#include "opal/util/opal_environ.h"
#include "orte/runtime/orte_globals.h"
#include "orte/util/name_fns.h"
#include "orte/util/show_help.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/rmaps/base/base.h"
#include "orte/mca/routed/routed.h"
#include "orte/mca/regx/base/base.h"
#include "regx_reverse.h"
static int nidmap_create(opal_pointer_array_t *pool, char **regex);
orte_regx_base_module_t orte_regx_reverse_module = {
.nidmap_create = nidmap_create,
.nidmap_parse = orte_regx_base_nidmap_parse,
.extract_node_names = orte_regx_base_extract_node_names,
.encode_nodemap = orte_regx_base_encode_nodemap,
.decode_daemon_nodemap = orte_regx_base_decode_daemon_nodemap,
.generate_ppn = orte_regx_base_generate_ppn,
.parse_ppn = orte_regx_base_parse_ppn
};
static int nidmap_create(opal_pointer_array_t *pool, char **regex)
{
char *node;
char prefix[ORTE_MAX_NODE_PREFIX];
int i, j, n, len, startnum, nodenum, numdigits;
bool found;
char *suffix, *sfx, *nodenames;
orte_regex_node_t *ndreg;
orte_regex_range_t *range, *rng;
opal_list_t nodenms, dvpids;
opal_list_item_t *item, *itm2;
char **regexargs = NULL, *tmp, *tmp2;
orte_node_t *nptr;
orte_vpid_t vpid;
OBJ_CONSTRUCT(&nodenms, opal_list_t);
OBJ_CONSTRUCT(&dvpids, opal_list_t);
rng = NULL;
for (n=0; n < pool->size; n++) {
if (NULL == (nptr = (orte_node_t*)opal_pointer_array_get_item(pool, n))) {
continue;
}
/* if no daemon has been assigned, then this node is not being used */
if (NULL == nptr->daemon) {
vpid = -1; // indicates no daemon assigned
} else {
vpid = nptr->daemon->name.vpid;
}
/* deal with the daemon vpid - see if it is next in the
* current range */
if (NULL == rng) {
/* just starting */
rng = OBJ_NEW(orte_regex_range_t);
rng->vpid = vpid;
rng->cnt = 1;
opal_list_append(&dvpids, &rng->super);
} else if (UINT32_MAX == vpid) {
if (-1 == rng->vpid) {
rng->cnt++;
} else {
/* need to start another range */
rng = OBJ_NEW(orte_regex_range_t);
rng->vpid = vpid;
rng->cnt = 1;
opal_list_append(&dvpids, &rng->super);
}
} else if (-1 == rng->vpid) {
/* need to start another range */
rng = OBJ_NEW(orte_regex_range_t);
rng->vpid = vpid;
rng->cnt = 1;
opal_list_append(&dvpids, &rng->super);
} else {
/* is this the next in line */
if (vpid == (orte_vpid_t)(rng->vpid + rng->cnt)) {
rng->cnt++;
} else {
/* need to start another range */
rng = OBJ_NEW(orte_regex_range_t);
rng->vpid = vpid;
rng->cnt = 1;
opal_list_append(&dvpids, &rng->super);
}
}
node = nptr->name;
opal_output_verbose(5, orte_regx_base_framework.framework_output,
"%s PROCESS NODE <%s>",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
node);
/* determine this node's prefix by looking for first digit char */
len = strlen(node);
startnum = -1;
memset(prefix, 0, ORTE_MAX_NODE_PREFIX);
numdigits = 0;
/* Valid hostname characters are:
* - ascii letters, digits, and the '-' character.
* Determine the prefix in reverse to better support hostnames like:
* c712f6n01, c699c086 where there are sets of digits, and the lowest
* set changes most frequently.
*/
startnum = -1;
memset(prefix, 0, ORTE_MAX_NODE_PREFIX);
numdigits = 0;
for (i=len-1; i >= 0; i--) {
// Count all of the digits
if( isdigit(node[i]) ) {
numdigits++;
continue;
}
else {
// At this point everything at and above position 'i' is prefix.
for( j = 0; j <= i; ++j) {
prefix[j] = node[j];
}
if (numdigits) {
startnum = j;
}
break;
}
}
opal_output_verbose(5, orte_regx_base_framework.framework_output,
"%s PROCESS NODE <%s> : reverse / prefix \"%s\" / numdigits %d",
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
node, prefix, numdigits);
if (startnum < 0) {
/* can't compress this name - just add it to the list */
ndreg = OBJ_NEW(orte_regex_node_t);
ndreg->prefix = strdup(node);
opal_list_append(&nodenms, &ndreg->super);
continue;
}
/* convert the digits and get any suffix */
nodenum = strtol(&node[startnum], &sfx, 10);
if (NULL != sfx) {
suffix = strdup(sfx);
} else {
suffix = NULL;
}
/* is this node name already on our list? */
found = false;
if (0 != opal_list_get_size(&nodenms)) {
ndreg = (orte_regex_node_t*)opal_list_get_last(&nodenms);
if ((0 < strlen(prefix) && NULL == ndreg->prefix) ||
(0 == strlen(prefix) && NULL != ndreg->prefix) ||
(0 < strlen(prefix) && NULL != ndreg->prefix &&
0 != strcmp(prefix, ndreg->prefix)) ||
(NULL == suffix && NULL != ndreg->suffix) ||
(NULL != suffix && NULL == ndreg->suffix) ||
(NULL != suffix && NULL != ndreg->suffix &&
0 != strcmp(suffix, ndreg->suffix)) ||
(numdigits != ndreg->num_digits)) {
found = false;
} else {
/* found a match - flag it */
found = true;
}
}
if (found) {
/* get the last range on this nodeid - we do this
* to preserve order
*/
range = (orte_regex_range_t*)opal_list_get_last(&ndreg->ranges);
if (NULL == range) {
/* first range for this nodeid */
range = OBJ_NEW(orte_regex_range_t);
range->vpid = nodenum;
range->cnt = 1;
opal_list_append(&ndreg->ranges, &range->super);
/* see if the node number is out of sequence */
} else if (nodenum != (range->vpid + range->cnt)) {
/* start a new range */
range = OBJ_NEW(orte_regex_range_t);
range->vpid = nodenum;
range->cnt = 1;
opal_list_append(&ndreg->ranges, &range->super);
} else {
/* everything matches - just increment the cnt */
range->cnt++;
}
} else {
/* need to add it */
ndreg = OBJ_NEW(orte_regex_node_t);
if (0 < strlen(prefix)) {
ndreg->prefix = strdup(prefix);
}
if (NULL != suffix) {
ndreg->suffix = strdup(suffix);
}
ndreg->num_digits = numdigits;
opal_list_append(&nodenms, &ndreg->super);
/* record the first range for this nodeid - we took
* care of names we can't compress above
*/
range = OBJ_NEW(orte_regex_range_t);
range->vpid = nodenum;
range->cnt = 1;
opal_list_append(&ndreg->ranges, &range->super);
}
if (NULL != suffix) {
free(suffix);
}
}
/* begin constructing the regular expression */
while (NULL != (item = opal_list_remove_first(&nodenms))) {
ndreg = (orte_regex_node_t*)item;
/* if no ranges, then just add the name */
if (0 == opal_list_get_size(&ndreg->ranges)) {
if (NULL != ndreg->prefix) {
/* solitary node */
opal_asprintf(&tmp, "%s", ndreg->prefix);
opal_argv_append_nosize(&regexargs, tmp);
free(tmp);
}
OBJ_RELEASE(ndreg);
continue;
}
/* start the regex for this nodeid with the prefix */
if (NULL != ndreg->prefix) {
opal_asprintf(&tmp, "%s[%d:", ndreg->prefix, ndreg->num_digits);
} else {
opal_asprintf(&tmp, "[%d:", ndreg->num_digits);
}
/* add the ranges */
while (NULL != (itm2 = opal_list_remove_first(&ndreg->ranges))) {
range = (orte_regex_range_t*)itm2;
if (1 == range->cnt) {
opal_asprintf(&tmp2, "%s%u,", tmp, range->vpid);
} else {
opal_asprintf(&tmp2, "%s%u-%u,", tmp, range->vpid, range->vpid + range->cnt - 1);
}
free(tmp);
tmp = tmp2;
OBJ_RELEASE(range);
}
/* replace the final comma */
tmp[strlen(tmp)-1] = ']';
if (NULL != ndreg->suffix) {
/* add in the suffix, if provided */
opal_asprintf(&tmp2, "%s%s", tmp, ndreg->suffix);
free(tmp);
tmp = tmp2;
}
opal_argv_append_nosize(&regexargs, tmp);
free(tmp);
OBJ_RELEASE(ndreg);
}
/* assemble final result */
nodenames = opal_argv_join(regexargs, ',');
/* cleanup */
opal_argv_free(regexargs);
OBJ_DESTRUCT(&nodenms);
/* do the same for the vpids */
tmp = NULL;
while (NULL != (item = opal_list_remove_first(&dvpids))) {
rng = (orte_regex_range_t*)item;
if (1 < rng->cnt) {
if (NULL == tmp) {
opal_asprintf(&tmp, "%u(%u)", rng->vpid, rng->cnt);
} else {
opal_asprintf(&tmp2, "%s,%u(%u)", tmp, rng->vpid, rng->cnt);
free(tmp);
tmp = tmp2;
}
} else {
if (NULL == tmp) {
opal_asprintf(&tmp, "%u", rng->vpid);
} else {
opal_asprintf(&tmp2, "%s,%u", tmp, rng->vpid);
free(tmp);
tmp = tmp2;
}
}
OBJ_RELEASE(rng);
}
OPAL_LIST_DESTRUCT(&dvpids);
/* now concatenate the results into one string */
opal_asprintf(&tmp2, "%s@%s", nodenames, tmp);
free(nodenames);
free(tmp);
*regex = tmp2;
return ORTE_SUCCESS;
}
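
The reverse scan used by this component (walk the hostname from the end, count trailing digits, and treat everything before them as the prefix) can be illustrated on its own. The following is a hedged sketch rather than the removed implementation: `sketch_split_trailing_digits` is a hypothetical helper, and unlike the loop above it checks the prefix against the buffer size and also accepts an all-digit name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of ORTE): split "c712f6n01" into the
 * prefix "c712f6n" and the trailing number 1.
 * Returns the count of trailing digits, 0 if the name does not end in a
 * digit, or -1 if the prefix does not fit in the supplied buffer. */
static int sketch_split_trailing_digits(const char *name,
                                        char *prefix, size_t prefixlen,
                                        long *number)
{
    size_t len, start;

    assert(NULL != name && NULL != prefix && NULL != number && prefixlen > 0);

    len = strlen(name);
    start = len;
    /* walk backwards over the lowest-order run of digits */
    while (start > 0 && isdigit((unsigned char)name[start - 1])) {
        start--;
    }
    if (start == len) {
        return 0;               /* nothing to compress */
    }
    if (start >= prefixlen) {
        return -1;              /* prefix plus terminator would overflow */
    }
    memcpy(prefix, name, start);
    prefix[start] = '\0';
    *number = strtol(name + start, NULL, 10);
    return (int)(len - start);
}
```

Scanning from the end means that a name such as `c712f6n01` yields the prefix `c712f6n` with two digits, so only the fastest-changing digit group is folded into ranges, which matches the motivation given in the comment above.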


@ -1,28 +0,0 @@
/*
* Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#ifndef _MCA_REGX_REVERSE_H_
#define _MCA_REGX_REVERSE_H_
#include "orte_config.h"
#include "orte/types.h"
#include "opal/mca/base/base.h"
#include "orte/mca/regx/regx.h"
BEGIN_C_DECLS
ORTE_MODULE_DECLSPEC extern orte_regx_base_component_t mca_regx_reverse_component;
extern orte_regx_base_module_t orte_regx_reverse_module;
END_C_DECLS
#endif /* _MCA_REGX_REVERSE_H_ */


@ -1,44 +0,0 @@
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
* Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include "orte_config.h"
#include "orte/types.h"
#include "opal/types.h"
#include "opal/util/show_help.h"
#include "orte/mca/regx/regx.h"
#include "regx_reverse.h"
static int component_query(mca_base_module_t **module, int *priority);
/*
* Struct of function pointers and all that to let us be initialized
*/
orte_regx_base_component_t mca_regx_reverse_component = {
.base_version = {
MCA_REGX_BASE_VERSION_1_0_0,
.mca_component_name = "reverse",
MCA_BASE_MAKE_VERSION(component, ORTE_MAJOR_VERSION, ORTE_MINOR_VERSION,
ORTE_RELEASE_VERSION),
.mca_query_component = component_query,
},
.base_data = {
/* The component is checkpoint ready */
MCA_BASE_METADATA_PARAM_CHECKPOINT
},
};
static int component_query(mca_base_module_t **module, int *priority)
{
*module = (mca_base_module_t*)&orte_regx_reverse_module;
*priority = 1;
return ORTE_SUCCESS;
}


@ -12,7 +12,7 @@
* All rights reserved.
* Copyright (c) 2007-2013 Los Alamos National Security, LLC. All rights
* reserved.
- * Copyright (c) 2015-2017 Intel, Inc. All rights reserved.
+ * Copyright (c) 2015-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2017 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
@ -44,6 +44,7 @@
#include "orte/runtime/orte_globals.h"
#include "orte/runtime/orte_wait.h"
#include "orte/util/name_fns.h"
+ #include "orte/util/nidmap.h"
#include "orte/util/threads.h"
#include "orte/mca/rml/rml.h"
@ -181,9 +182,8 @@ void orte_rml_base_process_msg(int fd, short flags, void *cbdata)
ORTE_ERROR_LOG(ORTE_ERR_OUT_OF_RESOURCE);
return;
}
- assert (NULL != orte_node_regex);
- if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &orte_node_regex, 1, OPAL_STRING))) {
+ if (ORTE_SUCCESS != (rc = orte_util_nidmap_create(orte_node_pool, buffer))) {
ORTE_ERROR_LOG(rc);
OBJ_RELEASE(buffer);
return;


@ -14,7 +14,7 @@
* reserved.
* Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved.
* Copyright (c) 2010-2011 Oak Ridge National Labs. All rights reserved.
- * Copyright (c) 2014-2018 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2016-2017 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
@ -54,11 +54,11 @@
#include "opal/runtime/opal.h"
#include "opal/runtime/opal_progress.h"
#include "opal/dss/dss.h"
+ #include "opal/mca/compress/compress.h"
#include "orte/util/proc_info.h"
#include "orte/util/session_dir.h"
#include "orte/util/name_fns.h"
- #include "orte/util/compress.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/grpcomm/base/base.h"
@ -639,7 +639,7 @@ void orte_daemon_recv(int status, orte_process_name_t* sender,
free(coprocessors);
}
answer = OBJ_NEW(opal_buffer_t);
- if (orte_util_compress_block((uint8_t*)data.base_ptr, data.bytes_used,
+ if (opal_compress.compress_block((uint8_t*)data.base_ptr, data.bytes_used,
&cmpdata, &cmplen)) {
/* the data was compressed - mark that we compressed it */
flag = 1;


@ -16,7 +16,7 @@
* Copyright (c) 2009 Institut National de Recherche en Informatique
* et Automatique. All rights reserved.
* Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
+ * Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2015-2017 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
@ -67,22 +67,22 @@
#include "opal/dss/dss.h"
#include "opal/mca/hwloc/hwloc-internal.h"
#include "opal/mca/pmix/pmix.h"
+ #include "opal/mca/compress/compress.h"
#include "orte/util/show_help.h"
#include "orte/util/proc_info.h"
#include "orte/util/session_dir.h"
#include "orte/util/name_fns.h"
+ #include "orte/util/nidmap.h"
#include "orte/util/parse_options.h"
#include "orte/mca/rml/base/rml_contact.h"
#include "orte/util/pre_condition_transports.h"
- #include "orte/util/compress.h"
#include "orte/util/threads.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/ess/ess.h"
#include "orte/mca/grpcomm/grpcomm.h"
#include "orte/mca/grpcomm/base/base.h"
- #include "orte/mca/regx/regx.h"
#include "orte/mca/rml/rml.h"
#include "orte/mca/rml/rml_types.h"
#include "orte/mca/odls/odls.h"
@ -221,10 +221,6 @@ opal_cmd_line_init_t orte_cmd_line_opts[] = {
NULL, OPAL_CMD_LINE_TYPE_BOOL,
"Whether to report process bindings to stderr" },
- { "orte_node_regex", '\0', "nodes", "nodes", 1,
- NULL, OPAL_CMD_LINE_TYPE_STRING,
- "Regular expression defining nodes in system" },
/* End of list */
{ NULL, '\0', NULL, NULL, 0,
NULL, OPAL_CMD_LINE_TYPE_NULL, NULL }
@ -747,7 +743,7 @@ int orte_daemon(int argc, char *argv[])
/* define the target jobid */
target.jobid = ORTE_PROC_MY_NAME->jobid;
- if (orte_fwd_mpirun_port || orte_static_ports || NULL != orte_parent_uri) {
+ if (NULL != orte_parent_uri) {
/* we start by sending to ourselves */
target.vpid = ORTE_PROC_MY_NAME->vpid;
/* since we will be waiting for any children to send us
@ -755,11 +751,9 @@ int orte_daemon(int argc, char *argv[])
* a little time in the launch phase by "warming up" the
* connection to our parent while we wait for our children */
buffer = OBJ_NEW(opal_buffer_t); // zero-byte message
- if (NULL == orte_node_regex) {
- orte_rml.recv_buffer_nb(ORTE_PROC_MY_PARENT, ORTE_RML_TAG_NODE_REGEX_REPORT,
- ORTE_RML_PERSISTENT, node_regex_report, &node_regex_waiting);
- node_regex_waiting = true;
- }
+ node_regex_waiting = true;
+ orte_rml.recv_buffer_nb(ORTE_PROC_MY_PARENT, ORTE_RML_TAG_NODE_REGEX_REPORT,
+ ORTE_RML_PERSISTENT, node_regex_report, &node_regex_waiting);
if (0 > (ret = orte_rml.send_buffer_nb(orte_mgmt_conduit,
ORTE_PROC_MY_PARENT, buffer,
ORTE_RML_TAG_WARMUP_CONNECTION,
@ -917,7 +911,7 @@ int orte_daemon(int argc, char *argv[])
if (ORTE_SUCCESS != (ret = opal_dss.pack(&data, &opal_hwloc_topology, 1, OPAL_HWLOC_TOPO))) {
ORTE_ERROR_LOG(ret);
}
- if (orte_util_compress_block((uint8_t*)data.base_ptr, data.bytes_used,
+ if (opal_compress.compress_block((uint8_t*)data.base_ptr, data.bytes_used,
&cmpdata, &cmplen)) {
/* the data was compressed - mark that we compressed it */
flag = 1;
@ -1020,10 +1014,6 @@ int orte_daemon(int argc, char *argv[])
i += 2;
}
}
- if (NULL != orte_node_regex) {
- /* now launch any child daemons of ours */
- orte_plm.remote_spawn();
- }
}
if (orte_debug_daemons_flag) {
@ -1174,20 +1164,11 @@ static void report_orted() {
static void node_regex_report(int status, orte_process_name_t* sender,
opal_buffer_t *buffer,
orte_rml_tag_t tag, void *cbdata) {
- int rc, n=1;
- char * regex;
- assert(NULL == orte_node_regex);
+ int rc;
bool * active = (bool *)cbdata;
- /* extract the node regex if needed, and update the routing tree */
- n = 1;
- if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &regex, &n, OPAL_STRING))) {
- ORTE_ERROR_LOG(rc);
- return;
- }
- orte_node_regex = regex;
- if (ORTE_SUCCESS != (rc = orte_regx.nidmap_parse(orte_node_regex))) {
+ /* extract the node info if needed, and update the routing tree */
+ if (ORTE_SUCCESS != (rc = orte_util_decode_nidmap(buffer))) {
ORTE_ERROR_LOG(rc);
return;
}


@ -13,7 +13,7 @@
* Copyright (c) 2009-2010 Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011-2013 Los Alamos National Security, LLC.
* All rights reserved.
- * Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
+ * Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2014-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2017 IBM Corporation. All rights reserved.
@ -81,9 +81,7 @@ char *orte_data_server_uri = NULL;
/* ORTE OOB port flags */
bool orte_static_ports = false;
char *orte_oob_static_ports = NULL;
bool orte_standalone_operation = false;
bool orte_fwd_mpirun_port = true;
bool orte_keep_fqdn_hostnames = false;
bool orte_have_fqdn_allocation = false;
@ -159,7 +157,6 @@ char *orte_default_hostfile = NULL;
bool orte_default_hostfile_given = false;
char *orte_rankfile = NULL;
int orte_num_allocated_nodes = 0;
- char *orte_node_regex = NULL;
char *orte_default_dash_host = NULL;
/* tool communication controls */


@ -13,7 +13,7 @@
* Copyright (c) 2007-2017 Cisco Systems, Inc. All rights reserved
* Copyright (c) 2011-2013 Los Alamos National Security, LLC.
* All rights reserved.
- * Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
+ * Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2017 IBM Corporation. All rights reserved.
* Copyright (c) 2017-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
@ -465,9 +465,7 @@ ORTE_DECLSPEC extern char *orte_data_server_uri;
/* ORTE OOB port flags */
ORTE_DECLSPEC extern bool orte_static_ports;
ORTE_DECLSPEC extern char *orte_oob_static_ports;
ORTE_DECLSPEC extern bool orte_standalone_operation;
ORTE_DECLSPEC extern bool orte_fwd_mpirun_port;
/* nodename flags */
ORTE_DECLSPEC extern bool orte_keep_fqdn_hostnames;
@ -543,7 +541,6 @@ ORTE_DECLSPEC extern char *orte_default_hostfile;
ORTE_DECLSPEC extern bool orte_default_hostfile_given;
ORTE_DECLSPEC extern char *orte_rankfile;
ORTE_DECLSPEC extern int orte_num_allocated_nodes;
- ORTE_DECLSPEC extern char *orte_node_regex;
ORTE_DECLSPEC extern char *orte_default_dash_host;
/* PMI version control */


@ -13,7 +13,7 @@
* Copyright (c) 2009-2010 Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2012-2013 Los Alamos National Security, LLC.
* All rights reserved
- * Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
+ * Copyright (c) 2013-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2014-2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2017 IBM Corporation. All rights reserved.
@ -407,14 +407,6 @@ int orte_register_params(void)
orte_default_dash_host = NULL;
}
- /* regex of nodes in system */
- orte_node_regex = NULL;
- (void) mca_base_var_register ("orte", "orte", NULL, "node_regex",
- "Regular expression defining nodes in the system",
- MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0,
- OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_READONLY,
- &orte_node_regex);
/* whether or not to keep FQDN hostnames */
orte_keep_fqdn_hostnames = false;
(void) mca_base_var_register ("orte", "orte", NULL, "keep_fqdn_hostnames",
@ -776,13 +768,6 @@ int orte_register_params(void)
OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_READONLY,
&orte_stack_trace_wait_timeout);
- orte_fwd_mpirun_port = false;
- (void) mca_base_var_register ("orte", "orte", NULL, "fwd_mpirun_port",
- "Forward the port used by mpirun so all daemons will use it",
- MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0,
- OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_READONLY,
- &orte_fwd_mpirun_port);
/* register the URI of the UNIVERSAL data server */
orte_data_server_uri = NULL;
(void) mca_base_var_register ("orte", "pmix", NULL, "server_uri",


@ -11,7 +11,7 @@
# All rights reserved.
# Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved.
# Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
- # Copyright (c) 2014-2018 Intel, Inc. All rights reserved.
+ # Copyright (c) 2014-2019 Intel, Inc. All rights reserved.
# Copyright (c) 2016 Research Organization for Information Science
# and Technology (RIST). All rights reserved.
# $COPYRIGHT$
@ -58,8 +58,8 @@ headers += \
util/comm/comm.h \
util/attr.h \
util/listener.h \
- util/compress.h \
- util/threads.h
+ util/threads.h \
+ util/nidmap.h
lib@ORTE_LIB_PREFIX@open_rte_la_SOURCES += \
util/error_strings.c \
@ -77,7 +77,7 @@ lib@ORTE_LIB_PREFIX@open_rte_la_SOURCES += \
util/comm/comm.c \
util/attr.c \
util/listener.c \
- util/compress.c
+ util/nidmap.c
# Remove the generated man pages
distclean-local:


@ -1,117 +0,0 @@
/*
* Copyright (c) 2016-2017 Intel, Inc. All rights reserved.
* Copyright (c) 2017 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include <orte_config.h>
#include <stdlib.h>
#ifdef HAVE_STRING_H
#include <string.h>
#endif
#ifdef HAVE_ZLIB_H
#include <zlib.h>
#endif
#include "opal/util/output.h"
#include "compress.h"
#if OPAL_HAVE_ZLIB
bool orte_util_compress_block(uint8_t *inbytes,
size_t inlen,
uint8_t **outbytes,
size_t *olen)
{
z_stream strm;
size_t len;
uint8_t *tmp;
if (inlen < ORTE_COMPRESS_LIMIT) {
return false;
}
/* set default output */
*outbytes = NULL;
*olen = 0;
/* setup the stream */
memset (&strm, 0, sizeof (strm));
deflateInit (&strm, 9);
/* get an upper bound on the required output storage */
len = deflateBound(&strm, inlen);
if (NULL == (tmp = (uint8_t*)malloc(len))) {
return false;
}
strm.next_in = inbytes;
strm.avail_in = inlen;
/* allocating the upper bound guarantees zlib will
* always successfully compress into the available space */
strm.avail_out = len;
strm.next_out = tmp;
deflate (&strm, Z_FINISH);
deflateEnd (&strm);
*outbytes = tmp;
*olen = len - strm.avail_out;
return true; // we did the compression
}
#else
bool orte_util_compress_block(uint8_t *inbytes,
size_t inlen,
uint8_t **outbytes,
size_t *olen)
{
return false; // we did not compress
}
#endif
#if OPAL_HAVE_ZLIB
bool orte_util_uncompress_block(uint8_t **outbytes, size_t olen,
uint8_t *inbytes, size_t len)
{
uint8_t *dest;
z_stream strm;
/* set the default error answer */
*outbytes = NULL;
/* setting destination to the fully decompressed size */
dest = (uint8_t*)malloc(olen);
if (NULL == dest) {
return false;
}
memset (&strm, 0, sizeof (strm));
if (Z_OK != inflateInit(&strm)) {
free(dest);
return false;
}
strm.avail_in = len;
strm.next_in = inbytes;
strm.avail_out = olen;
strm.next_out = dest;
if (Z_STREAM_END != inflate (&strm, Z_FINISH)) {
opal_output(0, "\tDECOMPRESS FAILED: %s", strm.msg);
}
inflateEnd (&strm);
*outbytes = dest;
return true;
}
#else
bool orte_util_uncompress_block(uint8_t **outbytes, size_t olen,
uint8_t *inbytes, size_t len)
{
return false;
}
#endif
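
The calling convention of the block functions removed here (return `false` below a size limit so the caller sends the raw bytes, and require the caller to supply the original length on decompression) can be shown without linking zlib. The sketch below is an assumption-laden illustration: the `sketch_*` names are hypothetical, and a trivial run-length encoder stands in for zlib's deflate, so it demonstrates only the contract, not the compression ratio. Unlike the removed `orte_util_uncompress_block`, which logs a failed inflate but still returns `true`, the stand-in reports failure to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_COMPRESS_LIMIT 4096   /* mirrors ORTE_COMPRESS_LIMIT */

/* Stand-in compressor (NOT zlib): run-length encoding as (count, value) pairs.
 * Returns false, leaving *out NULL, when the input is below the limit. */
static bool sketch_compress_block(const uint8_t *in, size_t inlen,
                                  uint8_t **out, size_t *outlen)
{
    uint8_t *tmp;
    size_t used = 0;

    *out = NULL;
    *outlen = 0;
    if (inlen < SKETCH_COMPRESS_LIMIT) {
        return false;                 /* caller should send the raw bytes */
    }
    /* worst case: every byte becomes its own (count, value) pair */
    if (NULL == (tmp = (uint8_t*)malloc(2 * inlen))) {
        return false;
    }
    for (size_t i = 0; i < inlen; ) {
        size_t run = 1;
        while (i + run < inlen && run < 255 && in[i + run] == in[i]) {
            run++;
        }
        tmp[used++] = (uint8_t)run;
        tmp[used++] = in[i];
        i += run;
    }
    *out = tmp;
    *outlen = used;
    return true;
}

/* Inverse of the stand-in. The caller passes the original length olen,
 * as with the removed orte_util_uncompress_block. */
static bool sketch_decompress_block(uint8_t **out, size_t olen,
                                    const uint8_t *in, size_t inlen)
{
    uint8_t *dest;
    size_t pos = 0;

    *out = NULL;
    if (0 != inlen % 2 || NULL == (dest = (uint8_t*)malloc(olen ? olen : 1))) {
        return false;
    }
    for (size_t i = 0; i < inlen; i += 2) {
        size_t run = in[i];
        if (0 == run || run > olen - pos) {
            free(dest);
            return false;             /* corrupt input or wrong olen */
        }
        memset(dest + pos, in[i + 1], run);
        pos += run;
    }
    if (pos != olen) {
        free(dest);
        return false;
    }
    *out = dest;
    return true;
}
```

A caller following the pattern in the hunks above would check the boolean result, pack a flag saying whether the payload is compressed, and pack the uncompressed size alongside it so the receiver can size its output buffer.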


@ -1,53 +0,0 @@
/*
* Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
* University Research and Technology
* Corporation. All rights reserved.
* Copyright (c) 2004-2005 The University of Tennessee and The University
* of Tennessee Research Foundation. All rights
* reserved.
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2015-2017 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
/**
* @file
*
* Compress/decompress long data blocks
*/
#ifndef ORTE_COMPRESS_H
#define ORTE_COMPRESS_H
#include <orte_config.h>
BEGIN_C_DECLS
/* define a limit for compression */
#define ORTE_COMPRESS_LIMIT 4096
/**
* Compress a string into a byte object using Zlib
*/
ORTE_DECLSPEC bool orte_util_compress_block(uint8_t *inbytes,
size_t inlen,
uint8_t **outbytes,
size_t *olen);
/**
* Decompress a byte object
*/
ORTE_DECLSPEC bool orte_util_uncompress_block(uint8_t **outbytes, size_t olen,
uint8_t *inbytes, size_t len);
END_C_DECLS
#endif /* ORTE_COMPRESS_H */
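
The boolean returned by the compress call in this interface also tells the caller who owns the output: when it is true, `*outbytes` is a newly allocated buffer; when it is false, nothing was produced and the caller sends the raw bytes instead. A minimal sketch of that caller-side pattern follows. The names `block_compress_fn`, `never_compress`, and `select_payload` are hypothetical and exist only for illustration; they are not part of ORTE or OPAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type mirroring the compress_block signature above. */
typedef bool (*block_compress_fn)(uint8_t *inbytes, size_t inlen,
                                  uint8_t **outbytes, size_t *olen);

/* Stand-in for a build without zlib: always declines to compress. */
static bool never_compress(uint8_t *inbytes, size_t inlen,
                           uint8_t **outbytes, size_t *olen)
{
    (void)inbytes; (void)inlen; (void)outbytes; (void)olen;
    return false;
}

/* Hypothetical helper: choose what to send.
 * Returns true when *payload is a new compressed buffer that the caller
 * must free; returns false when *payload simply aliases raw. */
static bool select_payload(block_compress_fn fn, uint8_t *raw, size_t rawlen,
                           uint8_t **payload, size_t *plen)
{
    assert(NULL != raw && NULL != payload && NULL != plen);
    if (NULL != fn && fn(raw, rawlen, payload, plen)) {
        return true;
    }
    /* compression unavailable or declined: pass the input through */
    *payload = raw;
    *plen = rawlen;
    return false;
}
```

With this shape, the caller frees the payload only when the helper returned true, which is the same rule the `if (compressed) free(bo.bytes);` blocks in nidmap.c follow.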

793
orte/util/nidmap.c Normal file

@ -0,0 +1,793 @@
/*
* Copyright (c) 2016-2019 Intel, Inc. All rights reserved.
* Copyright (c) 2018 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*
*/
#include "orte_config.h"
#include "orte/types.h"
#include "opal/types.h"
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#include <ctype.h>
#include "opal/dss/dss_types.h"
#include "opal/mca/compress/compress.h"
#include "opal/util/argv.h"
#include "orte/mca/errmgr/errmgr.h"
#include "orte/mca/rmaps/rmaps_types.h"
#include "orte/mca/routed/routed.h"
#include "orte/runtime/orte_globals.h"
#include "orte/util/nidmap.h"
int orte_util_nidmap_create(opal_pointer_array_t *pool,
opal_buffer_t *buffer)
{
char *raw = NULL;
uint8_t *vpids=NULL, *flags=NULL, u8;
uint16_t u16;
uint16_t *slots=NULL;
uint32_t u32;
int n, ndaemons, rc, nbytes, nbitmap;
bool compressed;
char **names = NULL, **ranks = NULL;
orte_node_t *nptr;
opal_byte_object_t bo, *boptr;
size_t sz;
/* pack a flag indicating if the HNP was included in the allocation */
if (orte_hnp_is_allocated) {
u8 = 1;
} else {
u8 = 0;
}
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &u8, 1, OPAL_UINT8))) {
ORTE_ERROR_LOG(rc);
return rc;
}
/* pack a flag indicating if we are in a managed allocation */
if (orte_managed_allocation) {
u8 = 1;
} else {
u8 = 0;
}
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &u8, 1, OPAL_UINT8))) {
ORTE_ERROR_LOG(rc);
return rc;
}
/* daemon vpids start from 0 and increase linearly by one
* up to the number of nodes in the system. The vpid is
* a 32-bit value. We don't know how many of the nodes
* in the system have daemons - we may not be using them
* all just yet. However, even the largest systems won't
* have more than a million nodes for quite some time,
* so for now we'll just allocate enough space to hold
* them all. Someone can optimize this further later */
if (256 >= pool->size) {
nbytes = 1;
} else if (65536 >= pool->size) {
nbytes = 2;
} else {
nbytes = 4;
}
vpids = (uint8_t*)malloc(nbytes * pool->size);
/* make room for the number of slots on each node */
slots = (uint16_t*)malloc(sizeof(uint16_t) * pool->size);
/* and for the flags for each node - only need one bit/node */
nbitmap = (pool->size / 8) + 1;
flags = (uint8_t*)calloc(1, nbitmap);
ndaemons = 0;
for (n=0; n < pool->size; n++) {
if (NULL == (nptr = (orte_node_t*)opal_pointer_array_get_item(pool, n))) {
continue;
}
/* add the hostname to the argv */
opal_argv_append_nosize(&names, nptr->name);
/* store the vpid */
if (1 == nbytes) {
if (NULL == nptr->daemon) {
vpids[ndaemons] = UINT8_MAX;
} else {
vpids[ndaemons] = nptr->daemon->name.vpid;
}
} else if (2 == nbytes) {
if (NULL == nptr->daemon) {
u16 = UINT16_MAX;
} else {
u16 = nptr->daemon->name.vpid;
}
memcpy(&vpids[nbytes*ndaemons], &u16, 2);
} else {
if (NULL == nptr->daemon) {
u32 = UINT32_MAX;
} else {
u32 = nptr->daemon->name.vpid;
}
memcpy(&vpids[nbytes*ndaemons], &u32, 4);
}
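/* store the number of slots - index by the packed position so the
 * entry lines up with the name and vpid arrays on the receiving side */
slots[ndaemons] = nptr->slots;
/* store the flag - one bit per node, most-significant bit first */
if (ORTE_FLAG_TEST(nptr, ORTE_NODE_FLAG_SLOTS_GIVEN)) {
flags[ndaemons/8] |= (1 << (7 - (ndaemons % 8)));
}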
++ndaemons;
}
/* construct the string of node names for compression */
raw = opal_argv_join(names, ',');
if (opal_compress.compress_block((uint8_t*)raw, strlen(raw)+1,
(uint8_t**)&bo.bytes, &sz)) {
/* mark that this was compressed */
compressed = true;
bo.size = sz;
} else {
/* mark that this was not compressed */
compressed = false;
bo.bytes = (uint8_t*)raw;
bo.size = strlen(raw)+1;
}
/* indicate compression */
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &compressed, 1, OPAL_BOOL))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
/* if compressed, provide the uncompressed size */
if (compressed) {
sz = strlen(raw)+1;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &sz, 1, OPAL_SIZE))) {
free(bo.bytes);
goto cleanup;
}
}
/* add the object */
boptr = &bo;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &boptr, 1, OPAL_BYTE_OBJECT))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
if (compressed) {
free(bo.bytes);
}
/* compress the vpids */
if (opal_compress.compress_block(vpids, nbytes*ndaemons,
(uint8_t**)&bo.bytes, &sz)) {
/* mark that this was compressed */
compressed = true;
bo.size = sz;
} else {
/* mark that this was not compressed */
compressed = false;
bo.bytes = vpids;
bo.size = nbytes*ndaemons;
}
/* indicate compression */
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &compressed, 1, OPAL_BOOL))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
/* provide the #bytes/vpid */
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &nbytes, 1, OPAL_INT))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
/* if compressed, provide the uncompressed size */
if (compressed) {
sz = nbytes*ndaemons;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &sz, 1, OPAL_SIZE))) {
free(bo.bytes);
goto cleanup;
}
}
/* add the object */
boptr = &bo;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &boptr, 1, OPAL_BYTE_OBJECT))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
if (compressed) {
free(bo.bytes);
}
/* compress the slots */
if (opal_compress.compress_block((uint8_t*)slots, sizeof(uint16_t)*ndaemons,
(uint8_t**)&bo.bytes, &sz)) {
/* mark that this was compressed */
compressed = true;
bo.size = sz;
} else {
/* mark that this was not compressed */
compressed = false;
bo.bytes = (uint8_t*)slots;
bo.size = sizeof(uint16_t)*ndaemons;
}
/* indicate compression */
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &compressed, 1, OPAL_BOOL))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
/* if compressed, provide the uncompressed size */
if (compressed) {
sz = sizeof(uint16_t)*ndaemons;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &sz, 1, OPAL_SIZE))) {
free(bo.bytes);
goto cleanup;
}
}
/* add the object */
boptr = &bo;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &boptr, 1, OPAL_BYTE_OBJECT))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
if (compressed) {
free(bo.bytes);
}
/* compress the flags */
if (opal_compress.compress_block(flags, nbitmap,
(uint8_t**)&bo.bytes, &sz)) {
/* mark that this was compressed */
compressed = true;
bo.size = sz;
} else {
/* mark that this was not compressed */
compressed = false;
bo.bytes = flags;
bo.size = nbitmap;
}
/* indicate compression */
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &compressed, 1, OPAL_BOOL))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
/* if compressed, provide the uncompressed size */
if (compressed) {
sz = nbitmap;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &sz, 1, OPAL_SIZE))) {
free(bo.bytes);
goto cleanup;
}
}
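/* add the object */
boptr = &bo;
rc = opal_dss.pack(buffer, &boptr, 1, OPAL_BYTE_OBJECT);
/* the compressed copy is ours to release - the raw flags array
 * is freed during cleanup */
if (compressed) {
free(bo.bytes);
}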
cleanup:
if (NULL != names) {
opal_argv_free(names);
}
if (NULL != raw) {
free(raw);
}
if (NULL != ranks) {
opal_argv_free(ranks);
}
if (NULL != vpids) {
free(vpids);
}
if (NULL != slots) {
free(slots);
}
if (NULL != flags) {
free(flags);
}
return rc;
}
int orte_util_decode_nidmap(opal_buffer_t *buf)
{
uint8_t u8, *vp8 = NULL, *flags = NULL;
uint16_t *vp16 = NULL, *slots = NULL;
uint32_t *vp32 = NULL, vpid;
int cnt, rc, nbytes, n;
bool compressed;
size_t sz;
opal_byte_object_t *boptr;
char *raw = NULL, **names = NULL;
orte_node_t *nd;
orte_job_t *daemons;
orte_proc_t *proc;
orte_topology_t *t = NULL;
/* unpack the flag indicating if HNP is in allocation */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &u8, &cnt, OPAL_UINT8))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
if (1 == u8) {
orte_hnp_is_allocated = true;
} else {
orte_hnp_is_allocated = false;
}
/* unpack the flag indicating if we are in managed allocation */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &u8, &cnt, OPAL_UINT8))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
if (1 == u8) {
orte_managed_allocation = true;
} else {
orte_managed_allocation = false;
}
/* unpack compression flag for node names */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &compressed, &cnt, OPAL_BOOL))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* if compressed, get the uncompressed size */
if (compressed) {
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &sz, &cnt, OPAL_SIZE))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
}
/* unpack the nodename object */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &boptr, &cnt, OPAL_BYTE_OBJECT))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* if compressed, decompress */
if (compressed) {
if (!opal_compress.decompress_block((uint8_t**)&raw, sz,
boptr->bytes, boptr->size)) {
ORTE_ERROR_LOG(ORTE_ERROR);
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
rc = ORTE_ERROR;
goto cleanup;
}
} else {
raw = (char*)boptr->bytes;
boptr->bytes = NULL;
boptr->size = 0;
}
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
names = opal_argv_split(raw, ',');
free(raw);
/* unpack compression flag for daemon vpids */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &compressed, &cnt, OPAL_BOOL))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* unpack the #bytes/vpid */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &nbytes, &cnt, OPAL_INT))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* if compressed, get the uncompressed size */
if (compressed) {
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &sz, &cnt, OPAL_SIZE))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
}
/* unpack the vpid object */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &boptr, &cnt, OPAL_BYTE_OBJECT))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* if compressed, decompress */
if (compressed) {
if (!opal_compress.decompress_block((uint8_t**)&vp8, sz,
boptr->bytes, boptr->size)) {
ORTE_ERROR_LOG(ORTE_ERROR);
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
rc = ORTE_ERROR;
goto cleanup;
}
} else {
vp8 = (uint8_t*)boptr->bytes;
boptr->bytes = NULL;
boptr->size = 0;
}
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
if (2 == nbytes) {
vp16 = (uint16_t*)vp8;
vp8 = NULL;
} else if (4 == nbytes) {
vp32 = (uint32_t*)vp8;
vp8 = NULL;
}
/* unpack compression flag for slots */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &compressed, &cnt, OPAL_BOOL))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* if compressed, get the uncompressed size */
if (compressed) {
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &sz, &cnt, OPAL_SIZE))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
}
/* unpack the slots object */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &boptr, &cnt, OPAL_BYTE_OBJECT))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* if compressed, decompress */
if (compressed) {
if (!opal_compress.decompress_block((uint8_t**)&slots, sz,
boptr->bytes, boptr->size)) {
ORTE_ERROR_LOG(ORTE_ERROR);
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
rc = ORTE_ERROR;
goto cleanup;
}
} else {
slots = (uint16_t*)boptr->bytes;
boptr->bytes = NULL;
boptr->size = 0;
}
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
/* unpack compression flag for node flags */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &compressed, &cnt, OPAL_BOOL))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* if compressed, get the uncompressed size */
if (compressed) {
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &sz, &cnt, OPAL_SIZE))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
}
/* unpack the node flags object */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &boptr, &cnt, OPAL_BYTE_OBJECT))) {
ORTE_ERROR_LOG(rc);
goto cleanup;
}
/* if compressed, decompress */
if (compressed) {
if (!opal_compress.decompress_block((uint8_t**)&flags, sz,
boptr->bytes, boptr->size)) {
ORTE_ERROR_LOG(ORTE_ERROR);
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
rc = ORTE_ERROR;
goto cleanup;
}
} else {
flags = (uint8_t*)boptr->bytes;
boptr->bytes = NULL;
boptr->size = 0;
}
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
/* if we are the HNP, we don't need any of this stuff */
if (ORTE_PROC_IS_HNP) {
goto cleanup;
}
/* get the daemon job object */
daemons = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid);
/* get our topology */
for (n=0; n < orte_node_topologies->size; n++) {
if (NULL != (t = (orte_topology_t*)opal_pointer_array_get_item(orte_node_topologies, n))) {
break;
}
}
/* create the node pool array - this will include
* _all_ nodes known to the allocation */
for (n=0; NULL != names[n]; n++) {
/* add this name to the pool */
nd = OBJ_NEW(orte_node_t);
nd->name = names[n];
opal_pointer_array_set_item(orte_node_pool, n, nd);
/* set the #slots */
nd->slots = slots[n];
/* set the flags - the sender packs one bit per node,
 * most-significant bit first */
if (flags[n/8] & (1 << (7 - (n % 8)))) {
ORTE_FLAG_SET(nd, ORTE_NODE_FLAG_SLOTS_GIVEN);
}
/* set the topology */
#if !OPAL_ENABLE_HETEROGENEOUS_SUPPORT
nd->topology = t;
#endif
/* see if it has a daemon on it */
if (1 == nbytes && UINT8_MAX != vp8[n]) {
vpid = vp8[n];
} else if (2 == nbytes && UINT16_MAX != vp16[n]) {
vpid = vp16[n];
} else if (4 == nbytes && UINT32_MAX != vp32[n]) {
vpid = vp32[n];
} else {
vpid = UINT32_MAX;
}
if (UINT32_MAX == vpid) {
/* no daemon on this node, so there is no proc to link */
continue;
}
if (NULL == (proc = (orte_proc_t*)opal_pointer_array_get_item(daemons->procs, vpid))) {
proc = OBJ_NEW(orte_proc_t);
proc->name.jobid = ORTE_PROC_MY_NAME->jobid;
proc->name.vpid = vpid;
proc->state = ORTE_PROC_STATE_RUNNING;
ORTE_FLAG_SET(proc, ORTE_PROC_FLAG_ALIVE);
daemons->num_procs++;
opal_pointer_array_set_item(daemons->procs, proc->name.vpid, proc);
}
nd->index = proc->name.vpid;
OBJ_RETAIN(nd);
proc->node = nd;
OBJ_RETAIN(proc);
nd->daemon = proc;
}
/* update num procs */
if (orte_process_info.num_procs != daemons->num_procs) {
orte_process_info.num_procs = daemons->num_procs;
/* need to update the routing plan */
orte_routed.update_routing_plan(NULL);
}
if (orte_process_info.max_procs < orte_process_info.num_procs) {
orte_process_info.max_procs = orte_process_info.num_procs;
}
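cleanup:
/* release the decoded arrays - at most one of vp8/vp16/vp32 is
 * non-NULL, and free(NULL) is a no-op */
free(vp8);
free(vp16);
free(vp32);
free(slots);
free(flags);
return rc;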
}
int orte_util_generate_ppn(orte_job_t *jdata,
opal_buffer_t *buf)
{
uint16_t *ppn=NULL;
size_t nbytes;
int rc = ORTE_SUCCESS;
orte_app_idx_t i;
int j, k;
opal_byte_object_t bo, *boptr;
bool compressed;
orte_node_t *nptr;
orte_proc_t *proc;
size_t sz;
/* make room for the number of procs on each node */
nbytes = sizeof(uint16_t) * orte_node_pool->size;
ppn = (uint16_t*)malloc(nbytes);
for (i=0; i < jdata->num_apps; i++) {
/* reset the #procs */
memset(ppn, 0, nbytes);
/* for each app_context, compute the #procs on
* each node of the allocation */
for (j=0; j < orte_node_pool->size; j++) {
if (NULL == (nptr = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, j))) {
continue;
}
if (NULL == nptr->daemon) {
continue;
}
for (k=0; k < nptr->procs->size; k++) {
if (NULL != (proc = (orte_proc_t*)opal_pointer_array_get_item(nptr->procs, k))) {
if (proc->name.jobid == jdata->jobid) {
++ppn[j];
}
}
}
}
if (opal_compress.compress_block((uint8_t*)ppn, nbytes,
(uint8_t**)&bo.bytes, &sz)) {
/* mark that this was compressed */
compressed = true;
bo.size = sz;
} else {
/* mark that this was not compressed */
compressed = false;
bo.bytes = (uint8_t*)ppn;
bo.size = nbytes;
}
/* indicate compression */
if (ORTE_SUCCESS != (rc = opal_dss.pack(buf, &compressed, 1, OPAL_BOOL))) {
if (compressed) {
free(bo.bytes);
}
goto cleanup;
}
/* if compressed, provide the uncompressed size */
if (compressed) {
sz = nbytes;
if (ORTE_SUCCESS != (rc = opal_dss.pack(buf, &sz, 1, OPAL_SIZE))) {
free(bo.bytes);
goto cleanup;
}
}
/* add the object */
boptr = &bo;
rc = opal_dss.pack(buf, &boptr, 1, OPAL_BYTE_OBJECT);
/* release the compressed copy before the next app_context */
if (compressed) {
free(bo.bytes);
}
if (OPAL_SUCCESS != rc) {
break;
}
}
cleanup:
free(ppn);
return rc;
}
int orte_util_decode_ppn(orte_job_t *jdata,
opal_buffer_t *buf)
{
orte_app_idx_t n;
int m, cnt, rc;
opal_byte_object_t *boptr;
bool compressed;
size_t sz;
uint16_t *ppn, k;
orte_node_t *node;
orte_proc_t *proc;
for (n=0; n < jdata->num_apps; n++) {
/* unpack the compression flag */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &compressed, &cnt, OPAL_BOOL))) {
ORTE_ERROR_LOG(rc);
return rc;
}
/* if compressed, unpack the raw size */
if (compressed) {
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &sz, &cnt, OPAL_SIZE))) {
ORTE_ERROR_LOG(rc);
return rc;
}
}
/* unpack the byte object describing this app */
cnt = 1;
if (OPAL_SUCCESS != (rc = opal_dss.unpack(buf, &boptr, &cnt, OPAL_BYTE_OBJECT))) {
ORTE_ERROR_LOG(rc);
return rc;
}
if (ORTE_PROC_IS_HNP) {
/* just discard it */
free(boptr->bytes);
free(boptr);
continue;
}
/* decompress if required */
if (compressed) {
if (!opal_compress.decompress_block((uint8_t**)&ppn, sz,
boptr->bytes, boptr->size)) {
ORTE_ERROR_LOG(ORTE_ERROR);
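/* opal_byte_object_t is a plain struct, not an opal_object_t */
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);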
return ORTE_ERROR;
}
} else {
ppn = (uint16_t*)boptr->bytes;
boptr->bytes = NULL;
boptr->size = 0;
}
if (NULL != boptr->bytes) {
free(boptr->bytes);
}
free(boptr);
/* cycle thru the node pool */
for (m=0; m < orte_node_pool->size; m++) {
if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, m))) {
continue;
}
if (0 < ppn[m]) {
if (!ORTE_FLAG_TEST(node, ORTE_NODE_FLAG_MAPPED)) {
OBJ_RETAIN(node);
ORTE_FLAG_SET(node, ORTE_NODE_FLAG_MAPPED);
opal_pointer_array_add(jdata->map->nodes, node);
}
/* create a proc object for each one */
for (k=0; k < ppn[m]; k++) {
proc = OBJ_NEW(orte_proc_t);
proc->name.jobid = jdata->jobid;
/* leave the vpid undefined as this will be determined
* later when we do the overall ranking */
proc->app_idx = n;
proc->parent = node->daemon->name.vpid;
OBJ_RETAIN(node);
proc->node = node;
/* flag the proc as ready for launch */
proc->state = ORTE_PROC_STATE_INIT;
opal_pointer_array_add(node->procs, proc);
/* we will add the proc to the jdata array when we
* compute its rank */
}
node->num_procs += ppn[m];
}
}
free(ppn);
}
return ORTE_SUCCESS;
}
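
The node flags in `orte_util_nidmap_create` are packed one bit per node, most-significant bit first, into `(size / 8) + 1` bytes. A reader of that array has to test the same bit position rather than index it by node number. The following is a minimal sketch of matching set and test helpers. The `node_bitmap_*` names are hypothetical and do not exist in ORTE; only the shift expression is taken from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of ORTE: one flag bit per node,
 * most-significant bit first, matching the (7 - (n % 8)) shift above. */

/* Number of bytes needed for nnodes flags, as computed for nbitmap. */
static size_t node_bitmap_bytes(size_t nnodes)
{
    return (nnodes / 8) + 1;
}

/* Set the flag for node n. */
static void node_bitmap_set(uint8_t *bitmap, size_t n)
{
    assert(NULL != bitmap);
    bitmap[n / 8] |= (uint8_t)(1u << (7 - (n % 8)));
}

/* Return nonzero if the flag for node n is set. */
static int node_bitmap_test(const uint8_t *bitmap, size_t n)
{
    assert(NULL != bitmap);
    return 0 != (bitmap[n / 8] & (1u << (7 - (n % 8))));
}
```

Under this layout node 0 maps to the top bit of byte 0 and node 9 to the second-highest bit of byte 1, so a loop over node names can call the test helper with the same index it uses for the slots array.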

43
orte/util/nidmap.h Normal file

@ -0,0 +1,43 @@
/*
* Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
* University Research and Technology
* Corporation. All rights reserved.
* Copyright (c) 2004-2006 The University of Tennessee and The University
* of Tennessee Research Foundation. All rights
* reserved.
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2006-2013 Los Alamos National Security, LLC.
* All rights reserved.
* Copyright (c) 2010-2011 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2015-2019 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#ifndef ORTE_NIDMAP_H
#define ORTE_NIDMAP_H
#include "orte_config.h"
#include "opal/class/opal_pointer_array.h"
#include "opal/dss/dss_types.h"
#include "orte/runtime/orte_globals.h"
ORTE_DECLSPEC int orte_util_nidmap_create(opal_pointer_array_t *pool,
opal_buffer_t *buf);
ORTE_DECLSPEC int orte_util_decode_nidmap(opal_buffer_t *buf);
ORTE_DECLSPEC int orte_util_generate_ppn(orte_job_t *jdata,
opal_buffer_t *buf);
ORTE_DECLSPEC int orte_util_decode_ppn(orte_job_t *jdata,
opal_buffer_t *buf);
#endif /* ORTE_NIDMAP_H */
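
Each block that the four functions above exchange is framed the same way in nidmap.c: a compression flag, then the uncompressed size only when the flag is set, then the byte object itself. The sketch below illustrates that ordering on a flat byte buffer. It is not the `opal_dss` wire format; `frame_pack`, `frame_unpack`, and the fixed 8-byte size fields are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat framing, NOT the opal_dss wire format:
 *   [1 byte compressed flag]
 *   [8 byte uncompressed size]   (present only when the flag is set)
 *   [8 byte payload size][payload bytes]
 */

/* Write one framed block into out and return the bytes written.
 * out must be large enough to hold the header plus the payload. */
static size_t frame_pack(uint8_t *out, bool compressed, uint64_t raw_size,
                         const uint8_t *payload, uint64_t payload_size)
{
    size_t off = 0;
    assert(NULL != out && NULL != payload);
    out[off++] = compressed ? 1 : 0;
    if (compressed) {
        memcpy(out + off, &raw_size, sizeof(raw_size));
        off += sizeof(raw_size);
    }
    memcpy(out + off, &payload_size, sizeof(payload_size));
    off += sizeof(payload_size);
    memcpy(out + off, payload, (size_t)payload_size);
    return off + (size_t)payload_size;
}

/* Read one framed block and return the bytes consumed.
 * *payload points into the input buffer; nothing is copied. */
static size_t frame_unpack(const uint8_t *in, bool *compressed,
                           uint64_t *raw_size, const uint8_t **payload,
                           uint64_t *payload_size)
{
    size_t off = 0;
    assert(NULL != in && NULL != compressed && NULL != raw_size);
    assert(NULL != payload && NULL != payload_size);
    *compressed = (1 == in[off++]);
    *raw_size = 0;
    if (*compressed) {
        /* the receiver needs this to size its decompression buffer */
        memcpy(raw_size, in + off, sizeof(*raw_size));
        off += sizeof(*raw_size);
    }
    memcpy(payload_size, in + off, sizeof(*payload_size));
    off += sizeof(*payload_size);
    *payload = in + off;
    if (!*compressed) {
        /* an uncompressed payload is already at its raw size */
        *raw_size = *payload_size;
    }
    return off + (size_t)*payload_size;
}
```

Sending the uncompressed size ahead of the payload is what lets the receiver allocate the destination buffer in a single step before decompressing.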