errhandler (should be initializing _errors_throw_exceptions, not
_are_fatal). This bug was not a huge tragedy because the only real
problem is that _are_fatal has the wrong string name with it (because
MPI::Init fixes up the _errors_throw_exceptions later).
This commit was SVN r20458.
we already have them in orte_process_info. Refs trac:1523.
This commit was SVN r19615.
The following Trac tickets were found above:
Ticket 1523 --> https://svn.open-mpi.org/trac/ompi/ticket/1523
help messages so that users only see the message once instead of N
times when their MPI app crashes.
Note that there is a tradeoff here -- we now call malloc in this
particular "show the error" code path. This shouldn't usually be a
problem, because the errors typically displayed through this mechanism
are MPI API argument problems (e.g., sending a negative count to
MPI_SEND), and not memory errors. But such API argument errors could
be a consequence of of a prior memory error, so there's a nonzero
chance that the error failure will fail to print because malloc
failed. In this case, the user can disable help message aggregation
(via the orte_base_want_aggregate MCA parameter) and we'll fall back
to the no-malloc code path (but without aggregation).
Note that we won't aggregate before MPI_INIT or after MPI_FINALIZE.
So if you call an MPI function before MPI_INIT / after MPI_FINALIZE,
you'll still see the error message N times. Nothing we can do about
that; we need ORTE to do the aggregation properly (which is obviously
unavailable before MPI_INIT / after MPI_FINALIZE).
This commit was SVN r19611.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.
I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.
This commit was SVN r18619.
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI.
Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave.
Fixed the configure check for {{{--enable-ft-thread}}} so it has a know dependance on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}).
Added a line for Checkpoint/Restart support to {{{ompi_info}}}.
Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime.
There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime.
This commit was SVN r17516.
Add a missing break to the outer switch statement of ompi_errhandler_invoke.
While I'm there, remove a couple of TABs.
This commit was SVN r17489.
The following Trac tickets were found above:
Ticket 1216 --> https://svn.open-mpi.org/trac/ompi/ticket/1216
to *not* use the STL as well as removing the STL use from the error handler
routines. This was removing the STL from the C++ bindings (Solaris has 2
versions of the STL; if OMPI uses one and an MPI application wants to use
another, Bad Things happen).
The main idea is to wrap up the C++ callback function pointers and the user's
extra_state into our own struct that is passed as the extra_state to the C
keyval registration along with the intercept routines in intercepts.cc. When the
C++ intercepts are activated, they unwrap the user's callback and extra state
and call them.
This commit was SVN r17409.
(sometimes after the merge with the ORTE branch), the opal_pointer_array
will became the only pointer_array implementation (the orte_pointer_array
will be removed).
This commit was SVN r17007.
There is a reason that we use the internal type (ompi_file_errhandler_fn) instead of the MPI typedef. When building without MPI-IO support (--disable-mpi-io), the MPI type is not defined, but the internal type IS defined in order to try to keep binary compatibility for apps that don't use MPI-IO.
This commit was SVN r16136.
The following SVN revision numbers were found above:
r16130 --> open-mpi/ompi@cf5a38af5e
* Added Create_errhandler for MPI::File
* Make errors_throw_exceptions a first-class predefined exception
handler, and make it work for Comm, File, and Win
* Deal with error handlers and attributes for Files, Types, and Wins
like we do with Comms - can't just cast the callbacks from C++
signatures to C signatures. Callbacks will then fire with the
C object, not the C++ object. That's bad.
Refs trac:455
This commit was SVN r12945.
The following Trac tickets were found above:
Ticket 455 --> https://svn.open-mpi.org/trac/ompi/ticket/455
add_error_class and add_error_code files. Also fixed the update of the
lastusedcode attribute, all of work according to my tests pretty fine.
Please note: the testcode attached to the bug 683 still reports some bugs. I
am however pretty sure that the testcode is wrong at that points:
- the standard says that the attribute MPI_LASTUSEDCODE has to be updated for
a new error_class or a new error_code. The test currently assumes, that only
the add_error_code call changes the attribute value.
- you have to comment out the two lines 73 and 74 in order to make the
test finish, since these lines check for the error string of non-existent
codes.
- line 126 the error-string of MPI_ERR_ARG is not "invalid argument" but a
little bit more, so the test thinks the output is wrong. So probably the test
has to be update to match the according error string of MPI_ERR_ARG.
Fixes trac:682
This commit was SVN r12913.
The following Trac tickets were found above:
Ticket 682 --> https://svn.open-mpi.org/trac/ompi/ticket/682
This commit fixes several aspects regarding MPI conformance of requests.
* Eliminate the last argument of ompi_errhandler_request_invoke(); we
''always'' want to invoke the back-end exception handler with the
real error code.
* Make it clear in comments that we only invoke the ''first''
exception in a given array of requests, even if there's more than
one request with a non-MPI_SUCCESS value for MPI_ERROR.
* Defer the freeing of requests upon exception in the back-end
functions to MPI_WAIT* and MPI_TEST* until later; the requests are
kept so that we know what handler to invoke when we actually invoke
the exception. After figuring that out, ''then'' we free requests
with pending exceptions on them.
* Clean up return codes from the back-end MPI_TEST* and MPI_WAIT*
functions.
* Slightly modify ompi_errcode_get_mpi_code() to return unity if it
receives an MPI error code (vs. an OMPI error code).
This commit was SVN r12810.
The following Trac tickets were found above:
Ticket 659 --> https://svn.open-mpi.org/trac/ompi/ticket/659
- we have to be able to attach a string to an error class, not just to an
error code
- according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated
everytime you add a new code or a new class. Thus, you have to have single
list for both.
Thus, we got rid of the error_class structure. In the error-code structure, we
can distinguish whether we are dealing with an error code or an error class by
looking at the err->code element of the structure. In case its value is
MPI_UNDEFINED, the according entry is a class, else it is an error code. All
predefined error codes have the code and the class field set to the same
value.
The test MPI_Add_error_class1 passes now.
Fixes trac:418
This commit was SVN r12764.
The following Trac tickets were found above:
Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418
* For MPI_TEST, MPI_TESTANY, MPI_WAIT, and MPI_WAITANY (i.e., the
TEST/WAIT functions that return up to exactly one completed
request), return the actual error code.
* For MPI_TESTALL, MPI_TESTSOME, MPI_WAITALL, MPI_WAITSOME, (i.e.,
the TEST/WAIT functions that can return more than one completed
request), return MPI_ERR_IN_STATUS.
This commit was SVN r12355.
The following Trac tickets were found above:
Ticket 549 --> https://svn.open-mpi.org/trac/ompi/ticket/549
* Create a new request type: NOOP (described below)
* For all MPI_*_INIT functions, OBJ_NEW an ompi_request_t and set its
type to NOOP
* Ensure that the NOOP requests are OBJ_RELEASE'd when they are done
* MPI_START looks at the request type; if NOOP, just return success. If
not, call the PML start() function
* MPI_STARTALL always pass the entire array of requests back to the PML
(see next point)
* Make the PMLs only process PML requests (i.e., ignore/skip anything
that isn't of type PML -- such as the NOOP requests)
* Add a little more param error checking in STARTALL
This commit was SVN r12338.
The following Trac tickets were found above:
Ticket 529 --> https://svn.open-mpi.org/trac/ompi/ticket/529
"this is bogus" kind of answer. Passing in bad error codes should
only happen in erroneous sections of the OMPI code base, but still,
it's far more social to print a message saying, "hey, you messed up!"
rather than seg faulting.
Reviewed by Edgar.
This commit was SVN r12295.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
complete, but stable enough that it will have no impact on general development,
so into the trunk it goes. Changes in this commit include:
- Remove the --with option for disabling MPI-2 onesided support. It
complicated code, and has no real reason for existing
- add a framework osc (OneSided Communication) for encapsulating
all the MPI-2 onesided functionality
- Modify the MPI interface functions for the MPI-2 onesided chapter
to properly call the underlying framework and do the required
error checking
- Created an osc component pt2pt, which is layered over the BML/BTL
for communication (although it also uses the PML for long message
transfers). Currently, all support functions, all communication
functions (Put, Get, Accumulate), and the Fence synchronization
function are implemented. The PWSC active synchronization
functions and Lock/Unlock passive synchronization functions are
still not implemented
This commit was SVN r8836.
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install). Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am). Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed. This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)
This commit was SVN r7777.
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
number to be set at autoconf time (instead of at configure time, as
it was before). Set the version number, minus the subversion r number,
at autoconf time. Override the internal variables to include the r
number (if needed) at configure time. Basically, the right thing
should always happen. The only place it might not is the version
reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
in the directory containing source files, even if the Makefile.am is
in another directory. This should start making it feasible to
reduce the number of Makefile.am files we have in the tree, which
will greatly reduce the time to run autogen and configure.
This commit was SVN r7211.