This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
/*
|
|
|
|
* Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
|
|
|
|
* University Research and Technology
|
|
|
|
* Corporation. All rights reserved.
|
|
|
|
* Copyright (c) 2004-2006 The University of Tennessee and The University
|
|
|
|
* of Tennessee Research Foundation. All rights
|
|
|
|
* reserved.
|
|
|
|
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
|
|
|
* University of Stuttgart. All rights reserved.
|
|
|
|
* Copyright (c) 2004-2005 The Regents of the University of California.
|
|
|
|
* All rights reserved.
|
|
|
|
* Copyright (c) 2008 Cisco Systems, Inc. All rights reserved.
|
|
|
|
* $COPYRIGHT$
|
|
|
|
*
|
|
|
|
* Additional copyrights may follow
|
|
|
|
*
|
|
|
|
* $HEADER$
|
|
|
|
*/
|
|
|
|
#include "orte_config.h"
|
|
|
|
#include "orte/types.h"
|
|
|
|
#include "orte/constants.h"
|
|
|
|
|
|
|
|
#include <stdio.h>
|
|
|
|
#include <string.h>
|
|
|
|
#include <time.h>
|
|
|
|
|
|
|
|
#include "opal/util/show_help.h"
|
|
|
|
#include "opal/util/output.h"
|
|
|
|
#include "opal/util/printf.h"
|
|
|
|
#include "opal/dss/dss.h"
|
|
|
|
#include "opal/mca/filter/filter.h"
|
|
|
|
|
|
|
|
#include "orte/mca/errmgr/errmgr.h"
|
|
|
|
#include "orte/mca/rml/rml.h"
|
|
|
|
#include "orte/util/name_fns.h"
|
|
|
|
#include "orte/runtime/orte_globals.h"
|
|
|
|
|
|
|
|
#include "orte/util/output.h"
|
|
|
|
|
|
|
|
/* A bunch of wrappers around lower-level OPAL functions -- need no
|
|
|
|
ORTE infrastructure */
|
|
|
|
|
|
|
|
int orte_output_reopen(int output_id, opal_output_stream_t *lds)
|
|
|
|
{
|
|
|
|
/* this function just acts as a wrapper around the
|
|
|
|
* corresponding opal_output fn
|
|
|
|
*/
|
|
|
|
return opal_output_reopen(output_id, lds);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool orte_output_switch(int output_id, bool enable)
|
|
|
|
{
|
|
|
|
/* this function just acts as a wrapper around the
|
|
|
|
* corresponding opal_output fn
|
|
|
|
*/
|
|
|
|
return opal_output_switch(output_id, enable);
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output_reopen_all(void)
|
|
|
|
{
|
|
|
|
/* this function just acts as a wrapper around the
|
|
|
|
* corresponding opal_output fn
|
|
|
|
*/
|
|
|
|
opal_output_reopen_all();
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output_set_verbosity(int output_id, int level)
|
|
|
|
{
|
|
|
|
/* this function just acts as a wrapper around the
|
|
|
|
* corresponding opal_output fn
|
|
|
|
*/
|
|
|
|
opal_output_set_verbosity(output_id, level);
|
|
|
|
}
|
|
|
|
|
|
|
|
int orte_output_get_verbosity(int output_id)
|
|
|
|
{
|
|
|
|
/* this function just acts as a wrapper around the
|
|
|
|
* corresponding opal_output fn
|
|
|
|
*/
|
|
|
|
return opal_output_get_verbosity(output_id);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/************************************************************************/
|
|
|
|
|
|
|
|
/* Section for systems without RML and/or HNP support (e.g., Cray) --
|
|
|
|
just output directly; don't do any fancy RML sending to the HNP. */
|
|
|
|
#if ORTE_DISABLE_FULL_SUPPORT
|
|
|
|
|
|
|
|
static int stderr_stream = -1;
|
|
|
|
|
|
|
|
int orte_output_init(void)
|
|
|
|
{
|
|
|
|
stderr_stream = opal_output_open(NULL);
|
|
|
|
regiester_mca();
|
2008-05-28 13:29:58 +00:00
|
|
|
if (0 == ORTE_PROC_MY_NAME->vpid && orte_help_want_aggregate) {
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
orte_output(stderr_stream, "WARNING: orte_base_help_aggregate was set to true, but this system does not support help message aggregation");
|
|
|
|
}
|
|
|
|
return ORTE_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output_finalize(void)
|
|
|
|
{
|
|
|
|
opal_output_close(stderr_stream);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
int orte_output_open(opal_output_stream_t *lds, const char *primary_tag, ...)
|
|
|
|
{
|
|
|
|
return opal_output_open(lds);
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output(int output_id, const char *format, ...)
|
|
|
|
{
|
|
|
|
/* just call opal_output_vverbose with a verbosity of 0 */
|
|
|
|
va_list arglist;
|
|
|
|
va_start(arglist, format);
|
|
|
|
opal_output_vverbose(0, output_id, format, arglist);
|
|
|
|
va_end(arglist);
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output_verbose(int verbose_level, int output_id, const char *format, ...)
|
|
|
|
{
|
|
|
|
/* just call opal_output_verbose with the specified verbosity */
|
|
|
|
va_list arglist;
|
|
|
|
va_start(arglist, format);
|
|
|
|
opal_output_vverbose(verbose_level, output_id, format, arglist);
|
|
|
|
va_end(arglist);
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output_close(int output_id)
|
|
|
|
{
|
|
|
|
opal_output_close(output_id);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
int orte_show_help(const char *filename, const char *topic,
|
|
|
|
bool want_error_header, ...)
|
|
|
|
{
|
|
|
|
va_list arglist;
|
|
|
|
char *output;
|
|
|
|
|
|
|
|
va_start(arglist, want_error_header);
|
|
|
|
output = opal_show_help_vstring(filename, topic, want_error_header,
|
|
|
|
arglist);
|
|
|
|
va_end(arglist);
|
|
|
|
|
|
|
|
/* If nothing came back, there's nothing to do */
|
|
|
|
if (NULL == output) {
|
|
|
|
return ORTE_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
opal_output(stderr_stream, output);
|
|
|
|
return ORTE_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
#else
|
|
|
|
|
|
|
|
/************************************************************************/
|
|
|
|
|
|
|
|
/* Section for systems that have full RML/HNP support */
|
|
|
|
|
|
|
|
/* defines used solely internal to orte_output */
|
|
|
|
#define ORTE_OUTPUT_OTHER 0x00
|
|
|
|
#define ORTE_OUTPUT_STDOUT 0x01
|
|
|
|
#define ORTE_OUTPUT_STDERR 0x02
|
|
|
|
#define ORTE_OUTPUT_SHOW_HELP 0x04
|
|
|
|
|
|
|
|
#define ORTE_OUTPUT_MAX_TAGS 10
|
|
|
|
|
|
|
|
/* struct to store stream-specific data */
|
|
|
|
typedef struct {
|
|
|
|
opal_object_t super;
|
|
|
|
uint8_t flags;
|
|
|
|
int num_tags;
|
|
|
|
char *tags[ORTE_OUTPUT_MAX_TAGS];
|
|
|
|
} orte_output_stream_t;
|
|
|
|
|
|
|
|
static void orte_output_stream_constructor(orte_output_stream_t *ptr)
|
|
|
|
{
|
|
|
|
ptr->flags = ORTE_OUTPUT_OTHER;
|
|
|
|
ptr->num_tags = 0;
|
|
|
|
memset(ptr->tags, 0, ORTE_OUTPUT_MAX_TAGS*sizeof(char*));
|
|
|
|
}
|
|
|
|
static void orte_output_stream_destructor(orte_output_stream_t *ptr)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i=0; i < ORTE_OUTPUT_MAX_TAGS; i++) {
|
|
|
|
if (NULL != ptr->tags[i]) {
|
|
|
|
free(ptr->tags[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
OBJ_CLASS_INSTANCE(orte_output_stream_t, opal_object_t,
|
|
|
|
orte_output_stream_constructor,
|
|
|
|
orte_output_stream_destructor);
|
|
|
|
|
|
|
|
/* List items for holding process names */
|
|
|
|
typedef struct {
|
|
|
|
opal_list_item_t super;
|
|
|
|
/* The process name */
|
|
|
|
orte_process_name_t pnli_name;
|
|
|
|
} process_name_list_item_t;
|
|
|
|
|
|
|
|
static void process_name_list_item_constructor(process_name_list_item_t *obj);
|
|
|
|
OBJ_CLASS_INSTANCE(process_name_list_item_t, opal_list_item_t,
|
|
|
|
process_name_list_item_constructor, NULL);
|
|
|
|
|
|
|
|
/* List items for holding (filename, topic) tuples */
|
|
|
|
typedef struct {
|
|
|
|
opal_list_item_t super;
|
|
|
|
/* The filename */
|
|
|
|
char *tli_filename;
|
|
|
|
/* The topic */
|
|
|
|
char *tli_topic;
|
|
|
|
/* List of process names that have displayed this (filename, topic) */
|
|
|
|
opal_list_t tli_processes;
|
|
|
|
/* Time this message was displayed */
|
|
|
|
time_t tli_time_displayed;
|
|
|
|
/* Count of processes since last display (i.e., "new" processes
|
|
|
|
that have showed this message that have not yet been output) */
|
|
|
|
int tli_count_since_last_display;
|
|
|
|
} tuple_list_item_t;
|
|
|
|
|
|
|
|
static void tuple_list_item_constructor(tuple_list_item_t *obj);
|
|
|
|
static void tuple_list_item_destructor(tuple_list_item_t *obj);
|
|
|
|
OBJ_CLASS_INSTANCE(tuple_list_item_t, opal_list_item_t,
|
|
|
|
tuple_list_item_constructor,
|
|
|
|
tuple_list_item_destructor);
|
|
|
|
|
|
|
|
|
|
|
|
/* List of (filename, topic) tuples that have already been displayed */
|
|
|
|
static opal_list_t abd_tuples;
|
|
|
|
|
|
|
|
/* How long to wait between displaying duplicate show_help notices */
|
|
|
|
static struct timeval show_help_interval = { 5, 0 };
|
|
|
|
|
|
|
|
/* Timer for displaying duplicate help message notices */
|
|
|
|
time_t show_help_time_last_displayed = 0;
|
|
|
|
bool show_help_timer_set = false;
|
|
|
|
static opal_event_t show_help_timer_event;
|
|
|
|
|
|
|
|
/* Local static variables */
|
|
|
|
static int stdout_stream, stderr_stream;
|
|
|
|
static opal_output_stream_t stdout_lds, stderr_lds, orte_output_default;
|
|
|
|
static opal_pointer_array_t orte_output_streams;
|
|
|
|
static bool orte_output_ready = false;
|
2008-05-28 13:58:03 +00:00
|
|
|
static bool suppress_warnings = false;
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
|
|
|
|
static void process_name_list_item_constructor(process_name_list_item_t *obj)
|
|
|
|
{
|
|
|
|
obj->pnli_name = orte_globals_name_invalid;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void tuple_list_item_constructor(tuple_list_item_t *obj)
|
|
|
|
{
|
|
|
|
obj->tli_filename = NULL;
|
|
|
|
obj->tli_topic = NULL;
|
|
|
|
OBJ_CONSTRUCT(&(obj->tli_processes), opal_list_t);
|
|
|
|
obj->tli_time_displayed = time(NULL);
|
|
|
|
obj->tli_count_since_last_display = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void tuple_list_item_destructor(tuple_list_item_t *obj)
|
|
|
|
{
|
|
|
|
opal_list_item_t *item, *next;
|
|
|
|
|
|
|
|
if (NULL != obj->tli_filename) {
|
|
|
|
free(obj->tli_filename);
|
|
|
|
}
|
|
|
|
if (NULL != obj->tli_topic) {
|
|
|
|
free(obj->tli_topic);
|
|
|
|
}
|
|
|
|
for (item = opal_list_get_first(&(obj->tli_processes));
|
|
|
|
opal_list_get_end(&(obj->tli_processes)) != item;
|
|
|
|
item = next) {
|
|
|
|
next = opal_list_get_next(item);
|
|
|
|
opal_list_remove_item(&(obj->tli_processes), item);
|
|
|
|
OBJ_RELEASE(item);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check to see if a given (filename, topic) tuple has been displayed
|
|
|
|
* already. Return ORTE_SUCCESS if so, or ORTE_ERR_NOT_FOUND if not.
|
|
|
|
*
|
|
|
|
* Always return a tuple_list_item_t representing this (filename,
|
|
|
|
* topic) entry in the list of "already been displayed tuples" (if it
|
|
|
|
* wasn't in the list already, this function will create a new entry
|
|
|
|
* in the list and return it).
|
|
|
|
*
|
|
|
|
* Note that a list is not an overly-efficient mechanism for this kind
|
|
|
|
* of data. The assupmtion is that there will only be a small numebr
|
|
|
|
* of (filename, topic) tuples displayed so the storage required will
|
|
|
|
* be fairly small, and linear searches will be fast enough.
|
|
|
|
*/
|
|
|
|
static int get_tli(const char *filename, const char *topic,
|
|
|
|
tuple_list_item_t **tli)
|
|
|
|
{
|
|
|
|
opal_list_item_t *item;
|
|
|
|
|
|
|
|
/* Search the list for a duplicate. */
|
|
|
|
for (item = opal_list_get_first(&abd_tuples);
|
|
|
|
opal_list_get_end(&abd_tuples) != item;
|
|
|
|
item = opal_list_get_next(item)) {
|
|
|
|
(*tli) = (tuple_list_item_t*) item;
|
|
|
|
if (0 == strcmp((*tli)->tli_filename, filename) &&
|
|
|
|
0 == strcmp((*tli)->tli_topic, topic)) {
|
|
|
|
return ORTE_SUCCESS;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Nope, we didn't find it -- make a new one */
|
|
|
|
*tli = OBJ_NEW(tuple_list_item_t);
|
|
|
|
if (NULL == *tli) {
|
|
|
|
return ORTE_ERR_OUT_OF_RESOURCE;
|
|
|
|
}
|
|
|
|
(*tli)->tli_filename = strdup(filename);
|
|
|
|
(*tli)->tli_topic = strdup(topic);
|
|
|
|
opal_list_append(&abd_tuples, &((*tli)->super));
|
|
|
|
return ORTE_ERR_NOT_FOUND;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2008-05-27 19:07:31 +00:00
|
|
|
static void new_stream_tracker(int stream, uint8_t flag, const char *primary_tag, va_list arglist)
|
|
|
|
{
|
|
|
|
orte_output_stream_t *ptr;
|
|
|
|
char *tag;
|
|
|
|
|
|
|
|
ptr = OBJ_NEW(orte_output_stream_t);
|
|
|
|
ptr->flags = flag;
|
|
|
|
ptr->num_tags = 1;
|
|
|
|
ptr->tags[0] = strdup(primary_tag);
|
|
|
|
if (NULL != arglist) {
|
|
|
|
while (NULL != (tag = va_arg(arglist, char*))) {
|
|
|
|
ptr->tags[ptr->num_tags] = strdup(tag);
|
|
|
|
ptr->num_tags++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
opal_pointer_array_set_item(&orte_output_streams, stream, ptr);
|
|
|
|
}
|
|
|
|
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
|
|
|
|
static void output_vverbose(int verbose_level, int output_id,
|
|
|
|
int major_id, int minor_id,
|
|
|
|
const char *format, va_list arglist)
|
|
|
|
{
|
|
|
|
char *output = NULL, *filtered = NULL;
|
|
|
|
opal_buffer_t buf;
|
|
|
|
uint8_t flag;
|
|
|
|
orte_output_stream_t **streams;
|
|
|
|
int rc;
|
|
|
|
static bool am_inside = false;
|
|
|
|
|
|
|
|
if (output_id < 0) {
|
|
|
|
/* to be ignored */
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* If we were called before this subsystem was initialized (or
|
|
|
|
after it was finalized), then just pass it down to
|
|
|
|
opal_output(). What else can we do? Shrug. */
|
|
|
|
if (!orte_output_ready) {
|
|
|
|
opal_output_vverbose(verbose_level, output_id, format, arglist);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Render the string. If we get nothing back, there's nothing to
|
|
|
|
do (e.g., verbose level was too high) */
|
|
|
|
output = opal_output_vstring(verbose_level, output_id, format, arglist);
|
|
|
|
if (NULL == output) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* We got a string back, so run it through the filter */
|
|
|
|
streams = (orte_output_stream_t**)orte_output_streams.addr;
|
|
|
|
filtered = opal_filter.process(output, major_id, minor_id,
|
|
|
|
streams[output_id]->num_tags,
|
|
|
|
streams[output_id]->tags);
|
|
|
|
if (NULL == filtered) {
|
|
|
|
filtered = output;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Per a soon-to-be-filed trac ticket: because this function calls
|
|
|
|
RML send, recursion is possible in two places:
|
|
|
|
|
|
|
|
1. RML send itself calls orte_output()
|
|
|
|
2. RML send can call progress, which might call something which
|
|
|
|
calls orte_output
|
|
|
|
|
|
|
|
So how to avoid the infinite loop? #1 is more of an issue than
|
|
|
|
#2, but it still *could* happen that #2 could cause infinit
|
|
|
|
recursion. It is not practical for RML send (and anything that
|
|
|
|
it calls) to use opal_output instead of orte_output.
|
|
|
|
|
|
|
|
We have some ideas how to avoid the recursion, but for the sake
|
|
|
|
of getting this working, we're just going to opal_output the
|
|
|
|
message for now. Hence, the developer/user will always see the
|
|
|
|
message, but tools who are parsing different channels for the
|
|
|
|
output may see this message on the "wrong channel" (e.g., in
|
|
|
|
the MPI process' stdout instead of the special channel from the
|
|
|
|
HNP).
|
|
|
|
*/
|
|
|
|
if (am_inside) {
|
2008-05-28 13:29:58 +00:00
|
|
|
if (orte_help_want_aggregate) {
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
opal_output(0, "%s orte_output recursion detected!", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
|
|
|
|
}
|
|
|
|
opal_output(output_id, filtered);
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
am_inside = true;
|
|
|
|
|
|
|
|
/* if I am the HNP, then I need to just pass this on to the
|
|
|
|
* opal_output_verbose function using the provided stream
|
|
|
|
*/
|
|
|
|
if (orte_process_info.hnp) {
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s output to stream %d",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
|
|
|
|
output_id));
|
|
|
|
opal_output(output_id, filtered);
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* If there's other flags besides STDOUT and STDERR set, then also
|
|
|
|
output this locally via opal_output */
|
|
|
|
if ((~(ORTE_OUTPUT_STDOUT | ORTE_OUTPUT_STDERR)) &
|
|
|
|
streams[output_id]->flags) {
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s locally output to stream %d",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
|
|
|
|
output_id));
|
|
|
|
/* pass the values to opal_output for local handling */
|
|
|
|
opal_output(output_id, filtered);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* We only relay stdout/stderr to the HNP. Note that it is
|
|
|
|
important to do the RML send last in this function -- do not
|
|
|
|
move it earlier! Putting it last ensures that we keep the same
|
|
|
|
relative ordering of output from local calls to opal_output
|
|
|
|
(e.g., for syslog or file, above) as the RML sends to the
|
|
|
|
HNP. */
|
|
|
|
flag = streams[output_id]->flags;
|
|
|
|
if (ORTE_OUTPUT_STDOUT & flag || ORTE_OUTPUT_STDERR & flag) {
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s sending filtered output \'%s\' from stream %d",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
|
|
|
|
filtered, output_id));
|
|
|
|
|
2008-05-28 13:29:58 +00:00
|
|
|
/* If RML is not yet setup, or we haven't yet defined the HNP,
|
|
|
|
* then just output this locally.
|
|
|
|
* What else can we do?
|
|
|
|
*/
|
|
|
|
if (NULL == orte_rml.send_buffer ||
|
|
|
|
ORTE_PROC_MY_HNP->vpid == ORTE_VPID_INVALID) {
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
opal_output(0, filtered);
|
|
|
|
} else {
|
|
|
|
/* setup a buffer to send to the HNP */
|
|
|
|
OBJ_CONSTRUCT(&buf, opal_buffer_t);
|
|
|
|
/* pack a flag indicating the output channel */
|
|
|
|
opal_dss.pack(&buf, &flag, 1, OPAL_UINT8);
|
|
|
|
/* pack the string */
|
|
|
|
opal_dss.pack(&buf, &filtered, 1, OPAL_STRING);
|
|
|
|
/* send to the HNP */
|
|
|
|
if (0 > (rc = orte_rml.send_buffer(ORTE_PROC_MY_HNP, &buf,
|
|
|
|
ORTE_RML_TAG_OUTPUT, 0))) {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
}
|
|
|
|
OBJ_DESTRUCT(&buf);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
cleanup:
|
|
|
|
if (filtered != output && NULL != filtered) {
|
|
|
|
free(filtered);
|
|
|
|
}
|
|
|
|
if (NULL != output) {
|
|
|
|
free(output);
|
|
|
|
}
|
|
|
|
am_inside = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void show_accumulated_duplicates(int fd, short event, void *context)
|
|
|
|
{
|
|
|
|
opal_list_item_t *item;
|
|
|
|
time_t now = time(NULL);
|
|
|
|
tuple_list_item_t *tli;
|
|
|
|
|
|
|
|
/* Loop through all the messages we've displayed and see if any
|
|
|
|
processes have sent duplicates that have not yet been displayed
|
|
|
|
yet */
|
|
|
|
for (item = opal_list_get_first(&abd_tuples);
|
|
|
|
opal_list_get_end(&abd_tuples) != item;
|
|
|
|
item = opal_list_get_next(item)) {
|
|
|
|
tli = (tuple_list_item_t*) item;
|
|
|
|
if (tli->tli_count_since_last_display > 0) {
|
|
|
|
orte_output(stderr_stream,
|
|
|
|
"%d more process%s sent help message %s / %s",
|
|
|
|
tli->tli_count_since_last_display,
|
|
|
|
(tli->tli_count_since_last_display > 1) ? "es have" : " has",
|
|
|
|
tli->tli_filename, tli->tli_topic);
|
|
|
|
tli->tli_count_since_last_display = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
show_help_time_last_displayed = now;
|
|
|
|
show_help_timer_set = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int show_help(const char *filename, const char *topic,
|
|
|
|
const char *output, orte_process_name_t *sender)
|
|
|
|
{
|
|
|
|
int rc;
|
|
|
|
tuple_list_item_t *tli;
|
|
|
|
process_name_list_item_t *pnli;
|
|
|
|
time_t now = time(NULL);
|
|
|
|
|
|
|
|
/* If we're aggregating, check for duplicates. Otherwise, don't
|
|
|
|
track duplicates at all and always display the message. */
|
2008-05-28 13:29:58 +00:00
|
|
|
if (orte_output_ready && orte_help_want_aggregate) {
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
rc = get_tli(filename, topic, &tli);
|
|
|
|
} else {
|
|
|
|
rc = ORTE_ERR_NOT_FOUND;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Was it already displayed? */
|
|
|
|
if (ORTE_SUCCESS == rc) {
|
|
|
|
/* Yes. But do we want to print anything? That's complicated.
|
|
|
|
|
|
|
|
We always show the first message of a given (filename,
|
|
|
|
topic) tuple as soon as it arrives. But we don't want to
|
|
|
|
show duplicate notices often, because we could get overrun
|
|
|
|
with them. So we want to gather them up and say "We got N
|
|
|
|
duplicates" every once in a while.
|
|
|
|
|
|
|
|
And keep in mind that at termination, we'll unconditionally
|
|
|
|
show all accumulated duplicate notices.
|
|
|
|
|
|
|
|
A simple scheme is as follows:
|
|
|
|
- when the first of a (filename, topic) tuple arrives
|
|
|
|
- print the message
|
|
|
|
- if a timer is not set, set T=now
|
|
|
|
- when a duplicate (filename, topic) tuple arrives
|
|
|
|
- if now>(T+5) and timer is not set (due to
|
|
|
|
non-pre-emptiveness of our libevent, a timer *could* be
|
|
|
|
set!)
|
|
|
|
- print all accumulated duplicates
|
|
|
|
- reset T=now
|
|
|
|
- else if a timer was not set, set the timer for T+5
|
|
|
|
- else if a timer was set, do nothing (just wait)
|
|
|
|
- set T=now when the timer expires
|
|
|
|
*/
|
|
|
|
++tli->tli_count_since_last_display;
|
|
|
|
if (now > show_help_time_last_displayed + 5 && !show_help_timer_set) {
|
|
|
|
show_accumulated_duplicates(0, 0, NULL);
|
|
|
|
} else if (!show_help_timer_set) {
|
|
|
|
opal_evtimer_set(&show_help_timer_event,
|
|
|
|
show_accumulated_duplicates, NULL);
|
|
|
|
opal_evtimer_add(&show_help_timer_event, &show_help_interval);
|
|
|
|
show_help_timer_set = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/* Not already displayed */
|
|
|
|
else if (ORTE_ERR_NOT_FOUND == rc) {
|
|
|
|
orte_output(stderr_stream, output);
|
|
|
|
if (!show_help_timer_set) {
|
|
|
|
show_help_time_last_displayed = now;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/* Some other error occurred */
|
|
|
|
else {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* If we're aggregating, add this process name to the list */
|
2008-05-28 13:29:58 +00:00
|
|
|
if (orte_output_ready && orte_help_want_aggregate) {
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
pnli = OBJ_NEW(process_name_list_item_t);
|
|
|
|
if (NULL == pnli) {
|
|
|
|
rc = ORTE_ERR_OUT_OF_RESOURCE;
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
pnli->pnli_name = *sender;
|
|
|
|
opal_list_append(&(tli->tli_processes), &(pnli->super));
|
|
|
|
}
|
|
|
|
return ORTE_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/* Note that this function is called from ess/hnp, so don't make it
|
|
|
|
static */
|
|
|
|
void orte_output_recv_output(int status, orte_process_name_t* sender,
|
|
|
|
opal_buffer_t *buffer, orte_rml_tag_t tag,
|
|
|
|
void* cbdata)
|
|
|
|
{
|
|
|
|
char *output=NULL;
|
|
|
|
char *filename=NULL, *topic=NULL;
|
|
|
|
uint8_t flag;
|
|
|
|
int32_t n;
|
|
|
|
int rc;
|
|
|
|
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s got output from sender %s",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
|
|
|
|
ORTE_NAME_PRINT(sender)));
|
|
|
|
|
|
|
|
/* unpack the flag indicating the output stream */
|
|
|
|
n = 1;
|
|
|
|
if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &flag, &n, OPAL_UINT8))) {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* if the flag is from show_help... */
|
|
|
|
if (ORTE_OUTPUT_SHOW_HELP & flag) {
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s got output from show_help",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
|
|
|
|
|
|
|
|
/* unpack the filename of the show_help text file */
|
|
|
|
n = 1;
|
|
|
|
if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &filename, &n, OPAL_STRING))) {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
/* unpack the topic tag */
|
|
|
|
n = 1;
|
|
|
|
if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &topic, &n, OPAL_STRING))) {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
/* unpack the resulting string */
|
|
|
|
n = 1;
|
|
|
|
if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &output, &n, OPAL_STRING))) {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Send it to show_help */
|
|
|
|
rc = show_help(filename, topic, output, sender);
|
|
|
|
} else {
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s got output from stdout/err",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
|
|
|
|
/* this must be from stdout or stderr - get the string */
|
|
|
|
n = 1;
|
|
|
|
if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &output, &n, OPAL_STRING))) {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
/* output via appropriate channel */
|
|
|
|
if (ORTE_OUTPUT_STDOUT & flag) {
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s output %s to stdout",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
|
|
|
|
output));
|
|
|
|
opal_output(stdout_stream, output);
|
|
|
|
} else {
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s output %s to stderr",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
|
|
|
|
output));
|
|
|
|
opal_output(stderr_stream, output);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
cleanup:
|
|
|
|
if (NULL != output) {
|
|
|
|
free(output);
|
|
|
|
}
|
|
|
|
if (NULL != filename) {
|
|
|
|
free(filename);
|
|
|
|
}
|
|
|
|
if (NULL != topic) {
|
|
|
|
free(topic);
|
|
|
|
}
|
|
|
|
/* reissue the recv */
|
|
|
|
rc = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD, ORTE_RML_TAG_OUTPUT,
|
|
|
|
ORTE_RML_NON_PERSISTENT, orte_output_recv_output, NULL);
|
|
|
|
if (rc != ORTE_SUCCESS && rc != ORTE_ERR_NOT_IMPLEMENTED) {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int orte_output_init(void)
|
|
|
|
{
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output, "orte_output init"));
|
|
|
|
|
|
|
|
/* define the default stream that has everything off */
|
|
|
|
OBJ_CONSTRUCT(&orte_output_default, opal_output_stream_t);
|
|
|
|
|
|
|
|
/* Show help duplicate detection */
|
|
|
|
OBJ_CONSTRUCT(&abd_tuples, opal_list_t);
|
|
|
|
|
|
|
|
/* setup arrays to track what output streams are
|
|
|
|
* going to stdout and/or stderr, and to track
|
|
|
|
* tags for possible filtering
|
|
|
|
*/
|
|
|
|
OBJ_CONSTRUCT(&orte_output_streams, opal_pointer_array_t);
|
|
|
|
/* reserve places for 50 streams - the array will
|
|
|
|
* resize above that if required
|
|
|
|
*/
|
|
|
|
opal_pointer_array_init(&orte_output_streams, 50, INT_MAX, 10);
|
|
|
|
/* initialize the 0 position of the array as this
|
|
|
|
* corresponds to the automatically-opened stderr
|
|
|
|
* stream of opal_output
|
|
|
|
*/
|
2008-05-27 19:07:31 +00:00
|
|
|
new_stream_tracker(0, ORTE_OUTPUT_STDERR, "STDERR", NULL);
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
|
|
|
|
/* if we are on the HNP, we need to open
|
|
|
|
* dedicated orte_output streams for stdout/stderr
|
|
|
|
* for our use so that we can control their behavior
|
|
|
|
*/
|
|
|
|
if (orte_process_info.hnp) {
|
|
|
|
/* setup stdout stream - we construct our own
|
|
|
|
* stream object so we can control the behavior
|
|
|
|
* for outputting stuff from remote procs
|
|
|
|
*/
|
|
|
|
OBJ_CONSTRUCT(&stdout_lds, opal_output_stream_t);
|
|
|
|
/* deliver to stdout only */
|
|
|
|
stdout_lds.lds_want_stdout = true;
|
|
|
|
stdout_stream = opal_output_open(&stdout_lds);
|
2008-05-27 19:07:31 +00:00
|
|
|
new_stream_tracker(stdout_stream, ORTE_OUTPUT_STDOUT, "STDOUT", NULL);
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
/* setup stderr stream - we construct our own
|
|
|
|
* stream object so we can control the behavior
|
|
|
|
*/
|
|
|
|
OBJ_CONSTRUCT(&stderr_lds, opal_output_stream_t);
|
|
|
|
/* deliver to stderr only */
|
|
|
|
stderr_lds.lds_want_stderr = true;
|
|
|
|
/* we filter the stderr */
|
|
|
|
stderr_lds.lds_filter_flags = OPAL_OUTPUT_FILTER_STDERR;
|
|
|
|
stderr_stream = opal_output_open(&stderr_lds);
|
2008-05-27 19:07:31 +00:00
|
|
|
new_stream_tracker(stderr_stream, ORTE_OUTPUT_STDERR, "STDERR", NULL);
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
orte_output_ready = true;
|
|
|
|
return ORTE_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output_finalize(void)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
orte_output_stream_t **streams;
|
2008-05-16 14:32:42 +00:00
|
|
|
|
|
|
|
if (!orte_output_ready) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
/* Shutdown show_help, showing final messages */
|
|
|
|
if (orte_process_info.hnp) {
|
|
|
|
show_accumulated_duplicates(0, 0, NULL);
|
|
|
|
}
|
|
|
|
OBJ_DESTRUCT(&abd_tuples);
|
|
|
|
if (show_help_timer_set) {
|
|
|
|
opal_evtimer_del(&show_help_timer_event);
|
|
|
|
}
|
|
|
|
|
2008-05-20 01:34:55 +00:00
|
|
|
orte_output_ready = false;
|
|
|
|
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
/* if we are the HNP, cancel the recv */
|
|
|
|
if (orte_process_info.hnp) {
|
2008-05-20 01:34:55 +00:00
|
|
|
OBJ_DESTRUCT(&orte_output_default);
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
orte_rml.recv_cancel(ORTE_NAME_WILDCARD, ORTE_RML_TAG_OUTPUT);
|
|
|
|
/* close our output streams */
|
|
|
|
opal_output_close(stdout_stream);
|
|
|
|
opal_output_close(stderr_stream);
|
|
|
|
OBJ_DESTRUCT(&stdout_lds);
|
|
|
|
OBJ_DESTRUCT(&stderr_lds);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* if we are not the HNP, we just need to
|
|
|
|
* cleanup our tracking array
|
2008-05-27 15:15:55 +00:00
|
|
|
*
|
|
|
|
* NOTE: we specifically do NOT call opal_output_close
|
|
|
|
* on these streams! It may be that additional output
|
|
|
|
* will be called after we close orte_output, and we
|
|
|
|
* need to retain an ability to get those messages out.
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
*/
|
|
|
|
streams = (orte_output_stream_t**)orte_output_streams.addr;
|
|
|
|
for (i=0; i < orte_output_streams.size; i++) {
|
|
|
|
if (NULL != streams[i]) {
|
|
|
|
OBJ_RELEASE(streams[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
OBJ_DESTRUCT(&orte_output_streams);
|
|
|
|
}
|
|
|
|
|
2008-05-27 20:13:04 +00:00
|
|
|
/* NOTE: HOW WE HANDLE THE ERROR CASE OF SOMEONE
|
|
|
|
* CALLING AN ORTE_OUTPUT FUNCTION PRIOR TO CALLING
|
|
|
|
* ORTE_OUTPUT_INIT
|
|
|
|
*
|
|
|
|
* if we are not finalizing, then this was called prior
|
|
|
|
* to the normal init. Despite multiple attempts, we
|
|
|
|
* haven't really found a good solution to this problem.
|
|
|
|
* We could call orte_output_init to setup our own
|
|
|
|
* internal tracking system, but that won't open up
|
|
|
|
* an associated opal_output stream for this output_id!
|
|
|
|
* Accordingly, opal_output will simply ignore anything
|
|
|
|
* being output to output_id - with the caller none-the-wiser
|
|
|
|
* that this is happening!
|
|
|
|
*
|
|
|
|
* After discussion, RHC and JMS decided that orte_output_init
|
|
|
|
* comes -so- early in the orte_init procedure that the only
|
|
|
|
* way this error can occur is for someone to call orte_output
|
|
|
|
* PRIOR to calling orte_init. This is an obvious error, so the
|
|
|
|
* main concern here is to avoid segfaulting. We therefore print
|
|
|
|
* out an error message so the caller knows what happened, use
|
|
|
|
* opal_output to let them see the error message (if possible),
|
|
|
|
* and do nothing else
|
|
|
|
*/
|
|
|
|
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
int orte_output_open(opal_output_stream_t *lds, const char *primary_tag, ...)
|
|
|
|
{
|
|
|
|
int stream;
|
|
|
|
uint8_t flag = ORTE_OUTPUT_OTHER;
|
|
|
|
va_list arglist;
|
|
|
|
|
2008-05-16 14:32:42 +00:00
|
|
|
if (!orte_output_ready) {
|
2008-05-27 20:13:04 +00:00
|
|
|
/* see above discussion on how we handle this error */
|
|
|
|
fprintf(stderr, "A call was made to orte_output_open %s with primary tag %s\n",
|
|
|
|
orte_finalizing ? "during or after calling orte_finalize" : "prior to calling orte_init",
|
|
|
|
primary_tag);
|
|
|
|
return ORTE_ERROR;
|
2008-05-16 14:32:42 +00:00
|
|
|
}
|
|
|
|
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
/* if we are the HNP, this function just acts as
|
|
|
|
* a wrapper around the corresponding opal_output fn
|
|
|
|
*/
|
|
|
|
if (orte_process_info.hnp) {
|
|
|
|
stream = opal_output_open(lds);
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output, "HNP opened stream %d", stream));
|
|
|
|
goto track;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* if we not the HNP, then we need to open the stream
|
|
|
|
* and also record whether or not it is sending
|
|
|
|
* output to stdout/stderr
|
|
|
|
*/
|
|
|
|
if (NULL == lds) {
|
|
|
|
/* we have to ensure that the opal_output stream
|
|
|
|
* doesn't open stdout and stderr, so setup the
|
|
|
|
* stream here and ensure the settings are
|
|
|
|
* correct - otherwise, opal_output will default
|
|
|
|
* the stream to having stderr active!
|
|
|
|
*/
|
|
|
|
lds = &orte_output_default;
|
|
|
|
flag = ORTE_OUTPUT_STDERR;
|
|
|
|
} else {
|
|
|
|
/* does this stream involve stdout? */
|
|
|
|
if (lds->lds_want_stdout) {
|
|
|
|
flag |= ORTE_OUTPUT_STDOUT;
|
|
|
|
lds->lds_want_stdout = false;
|
|
|
|
}
|
|
|
|
/* does it involve stderr? */
|
|
|
|
if (lds->lds_want_stderr) {
|
|
|
|
flag |= ORTE_OUTPUT_STDERR;
|
|
|
|
lds->lds_want_stderr = false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/* open the stream */
|
|
|
|
stream = opal_output_open(lds);
|
|
|
|
|
|
|
|
OPAL_OUTPUT_VERBOSE((5, orte_debug_output,
|
|
|
|
"%s opened stream %d",
|
|
|
|
ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
|
|
|
|
stream));
|
|
|
|
|
|
|
|
track:
|
|
|
|
/* track the settings */
|
|
|
|
va_start(arglist, primary_tag);
|
2008-05-27 19:07:31 +00:00
|
|
|
new_stream_tracker(stream, flag, primary_tag, arglist);
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
va_end(arglist);
|
|
|
|
|
|
|
|
return stream;
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output(int output_id, const char *format, ...)
|
|
|
|
{
|
|
|
|
va_list arglist;
|
2008-05-27 20:13:04 +00:00
|
|
|
char *output;
|
|
|
|
|
2008-05-16 14:32:42 +00:00
|
|
|
if (!orte_output_ready) {
|
2008-05-27 15:15:55 +00:00
|
|
|
/* if we are finalizing, then we have no way to process
|
|
|
|
* this through the orte_output system - just drop it to
|
|
|
|
* opal_output for handling
|
|
|
|
*/
|
|
|
|
if (orte_finalizing) {
|
|
|
|
va_start(arglist, format);
|
|
|
|
opal_output_vverbose(0, output_id, format, arglist);
|
2008-05-16 14:32:42 +00:00
|
|
|
return;
|
2008-05-27 15:15:55 +00:00
|
|
|
} else {
|
2008-05-27 20:13:04 +00:00
|
|
|
/* see above discussion as to how we handle this error */
|
|
|
|
va_start(arglist, format);
|
2008-05-28 13:58:03 +00:00
|
|
|
if (!suppress_warnings) {
|
|
|
|
fprintf(stderr, "A call was made to orte_output prior to calling orte_init\n");
|
|
|
|
fprintf(stderr, "The offending message is:\n");
|
|
|
|
}
|
2008-05-27 20:13:04 +00:00
|
|
|
output = opal_output_vstring(0, 0, format, arglist);
|
|
|
|
fprintf(stderr, "%s\n", (NULL == output) ? "NULL" : output);
|
|
|
|
if (NULL != output) free(output);
|
|
|
|
return;
|
2008-05-16 14:32:42 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* just call output_vverbose with a verbosity of 0 */
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
va_start(arglist, format);
|
|
|
|
output_vverbose(0, output_id, ORTE_PROC_MY_NAME->jobid, ORTE_PROC_MY_NAME->vpid, format, arglist);
|
|
|
|
va_end(arglist);
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output_verbose(int verbose_level, int output_id, const char *format, ...)
|
|
|
|
{
|
|
|
|
va_list arglist;
|
2008-05-27 20:13:04 +00:00
|
|
|
char *output;
|
2008-05-16 14:32:42 +00:00
|
|
|
|
|
|
|
if (!orte_output_ready) {
|
2008-05-27 15:15:55 +00:00
|
|
|
/* if we are finalizing, then we have no way to process
|
|
|
|
* this through the orte_output system - just drop it to
|
|
|
|
* opal_output for handling
|
|
|
|
*/
|
|
|
|
if (orte_finalizing) {
|
|
|
|
va_start(arglist, format);
|
|
|
|
opal_output_vverbose(verbose_level, output_id, format, arglist);
|
2008-05-16 14:32:42 +00:00
|
|
|
return;
|
2008-05-27 15:15:55 +00:00
|
|
|
} else {
|
|
|
|
/* if we are not finalizing, then this was called prior
|
2008-05-27 20:13:04 +00:00
|
|
|
* to the normal init - see above discussion as to
|
|
|
|
* how this is being handled
|
2008-05-27 19:07:31 +00:00
|
|
|
*/
|
2008-05-27 20:13:04 +00:00
|
|
|
va_start(arglist, format);
|
2008-05-28 13:58:03 +00:00
|
|
|
if (!suppress_warnings) {
|
|
|
|
fprintf(stderr, "A call was made to orte_output_verbose prior to calling orte_init\n");
|
|
|
|
fprintf(stderr, "The offending message (verbosity=%d) is:\n", verbose_level);
|
|
|
|
}
|
2008-05-27 20:13:04 +00:00
|
|
|
output = opal_output_vstring(0, 0, format, arglist);
|
|
|
|
fprintf(stderr, "%s\n", (NULL == output) ? "NULL" : output);
|
|
|
|
if (NULL != output) free(output);
|
2008-05-16 14:32:42 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* just call output_verbose with the specified verbosity */
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
va_start(arglist, format);
|
|
|
|
output_vverbose(verbose_level, output_id, ORTE_PROC_MY_NAME->jobid, ORTE_PROC_MY_NAME->vpid, format, arglist);
|
|
|
|
va_end(arglist);
|
|
|
|
}
|
|
|
|
|
|
|
|
void orte_output_close(int output_id)
|
|
|
|
{
|
|
|
|
orte_output_stream_t **streams;
|
|
|
|
|
2008-05-27 19:07:31 +00:00
|
|
|
if (!orte_output_ready && !orte_finalizing) {
|
|
|
|
/* if we are finalizing, then we really don't want
|
|
|
|
* to init this system - otherwise, this was called prior
|
2008-05-27 20:13:04 +00:00
|
|
|
* to the normal init, see above discussion as to
|
|
|
|
* how this was handled
|
2008-05-27 19:07:31 +00:00
|
|
|
*/
|
2008-05-27 20:13:04 +00:00
|
|
|
fprintf(stderr, "A call was made to orte_output_close prior to calling orte_init\n");
|
|
|
|
return;
|
2008-05-16 14:32:42 +00:00
|
|
|
}
|
|
|
|
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
/* cleanout the stream settings */
|
|
|
|
streams = (orte_output_stream_t**)orte_output_streams.addr;
|
|
|
|
if (output_id >= 0 && NULL != streams[output_id]) {
|
|
|
|
OBJ_RELEASE(streams[output_id]);
|
|
|
|
}
|
|
|
|
opal_output_close(output_id);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
int orte_show_help(const char *filename, const char *topic,
|
|
|
|
bool want_error_header, ...)
|
|
|
|
{
|
|
|
|
int rc = ORTE_SUCCESS;
|
|
|
|
va_list arglist;
|
|
|
|
char *output;
|
|
|
|
|
|
|
|
va_start(arglist, want_error_header);
|
|
|
|
output = opal_show_help_vstring(filename, topic, want_error_header,
|
|
|
|
arglist);
|
|
|
|
va_end(arglist);
|
|
|
|
|
|
|
|
/* If nothing came back, there's nothing to do */
|
|
|
|
if (NULL == output) {
|
|
|
|
return ORTE_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2008-05-28 13:58:03 +00:00
|
|
|
if (!orte_output_ready) {
|
|
|
|
/* if we are finalizing, then we have no way to process
|
|
|
|
* this through the orte_output system - just drop it to
|
|
|
|
* opal_show_help for handling
|
|
|
|
*
|
|
|
|
* If we are not finalizing, then this is probably a show_help
|
|
|
|
* stemming from either a cmd-line request to display the usage
|
|
|
|
* message, or a show_help related to a user error. In either case,
|
|
|
|
* we can't do anything but just call opal_show_help
|
|
|
|
*
|
|
|
|
* Ensure we suppress the orte_output warnings for this case, then
|
|
|
|
* re-enable them when we are done
|
|
|
|
*/
|
|
|
|
suppress_warnings = true;
|
|
|
|
rc = show_help(filename, topic, output, ORTE_PROC_MY_NAME);
|
|
|
|
suppress_warnings = false;
|
|
|
|
goto CLEANUP;
|
|
|
|
}
|
|
|
|
|
2008-05-28 13:29:58 +00:00
|
|
|
/* if we are the HNP, or the RML has not yet been setup,
|
|
|
|
* or we don't yet know our HNP, then all we can do
|
|
|
|
* is process this locally
|
|
|
|
*/
|
|
|
|
if (orte_process_info.hnp ||
|
|
|
|
NULL == orte_rml.send_buffer ||
|
|
|
|
ORTE_PROC_MY_HNP->vpid == ORTE_VPID_INVALID) {
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
rc = show_help(filename, topic, output, ORTE_PROC_MY_NAME);
|
|
|
|
}
|
|
|
|
|
2008-05-28 13:29:58 +00:00
|
|
|
/* otherwise, we relay the output message to
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
* the HNP for processing
|
|
|
|
*/
|
|
|
|
else {
|
|
|
|
opal_buffer_t buf;
|
|
|
|
uint8_t flag = ORTE_OUTPUT_SHOW_HELP;
|
|
|
|
static bool am_inside = false;
|
|
|
|
|
|
|
|
/* JMS Note that we *may* have a similar recursion situation
|
|
|
|
as with output_vverbose(), above. Need to think about this
|
|
|
|
properly, but put a safeguard in here for sure for the time
|
|
|
|
being. */
|
|
|
|
if (am_inside) {
|
|
|
|
rc = show_help(filename, topic, output, ORTE_PROC_MY_NAME);
|
|
|
|
} else {
|
|
|
|
am_inside = true;
|
|
|
|
|
|
|
|
/* build the message to the HNP */
|
|
|
|
OBJ_CONSTRUCT(&buf, opal_buffer_t);
|
|
|
|
/* pack the flag indicating this is from show_help */
|
|
|
|
opal_dss.pack(&buf, &flag, 1, OPAL_UINT8);
|
|
|
|
/* pack the filename of the show_help text file */
|
|
|
|
opal_dss.pack(&buf, &filename, 1, OPAL_STRING);
|
|
|
|
/* pack the topic tag */
|
|
|
|
opal_dss.pack(&buf, &topic, 1, OPAL_STRING);
|
|
|
|
/* pack the resulting string */
|
|
|
|
opal_dss.pack(&buf, &output, 1, OPAL_STRING);
|
|
|
|
/* send it to the HNP */
|
|
|
|
if (0 > (rc = orte_rml.send_buffer(ORTE_PROC_MY_HNP, &buf, ORTE_RML_TAG_OUTPUT, 0))) {
|
|
|
|
ORTE_ERROR_LOG(rc);
|
|
|
|
}
|
|
|
|
OBJ_DESTRUCT(&buf);
|
|
|
|
am_inside = false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-05-28 13:58:03 +00:00
|
|
|
CLEANUP:
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
|
|
|
free(output);
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
|
|
|
#endif /* ORTE_DISABLE_FULL_SUPPORT */
|
|
|
|
|