In case we use memcmp, strlen, strup and friends include <string.h>
Also several constants.h are not included directly
- Let's have mca_topo_base_cart_create return ompi-errors in
ompi/mca/topo/base/topo_base_cart_create.c
This commit was SVN r20773.
Anyway, this is blocking the move: do not include pml.h
if not really needed, aka none of the following used:
mca_pml
MCA_PML_CALL
OMPI_ANY_TAG
OMPI_ANY_SOURCE
OMPI_PROC_NULL
- Notable exceptions (deleting in one header->adding):
- ompi/mca/mtl/psm/
- ompi/mca/osc/rdma/
- ompi/mca/btl/openib/btl_openib_endpoint.c depended on
pml_base_sendreq.h
- Tested on Linux/x86-64, this time including make check
(thanks Jeff and Ralph)
This commit was SVN r20725.
devel list, it ''is'' within in the spirit of MPI to allow
MPI_REQUEST_NULL to be passed to MPI_REQUEST_GET_STATUS. I filed a
ticket proposal with MPI-2.2 to make this officially accepted:
https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/137
Plus, r20537 didn't revert out all of the machinery for allowing
MPI_REQUEST_NULL or inactive requests, anyway. So this commit simply
removes the parameter check that was added in r20537, and we're back
to where we were before this whole conversation. :-)
This commit was SVN r20616.
The following SVN revision numbers were found above:
r20537 --> open-mpi/ompi@38aab37bb3
Often, orte/util/show_help.h is included, although no functionality
is required -- instead, most often opal_output.h, or
orte/mca/rml/rml_types.h
Please see orte_show_help_replacement.sh commited next.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
actually showed two *missing* #include "orte/util/show_help.h"
in orte/mca/odls/base/odls_base_default_fns.c and
in orte/tools/orte-top/orte-top.c
Manually added these.
Let's have MTT the last word.
This commit was SVN r20557.
* New "op" MPI layer framework
* Addition of the MPI_REDUCE_LOCAL proposed function (for MPI-2.2)
= Op framework =
Add new "op" framework in the ompi layer. This framework replaces the
hard-coded MPI_Op back-end functions for (MPI_Op, MPI_Datatype) tuples
for pre-defined MPI_Ops, allowing components and modules to provide
the back-end functions. The intent is that components can be written
to take advantage of hardware acceleration (GPU, FPGA, specialized CPU
instructions, etc.). Similar to other frameworks, components are
intended to be able to discover at run-time if they can be used, and
if so, elect themselves to be selected (or disqualify themselves from
selection if they cannot run). If specialized hardware is not
available, there is a default set of functions that will automatically
be used.
This framework is ''not'' used for user-defined MPI_Ops.
The new op framework is similar to the existing coll framework, in
that the final set of function pointers that are used on any given
intrinsic MPI_Op can be a mixed bag of function pointers, potentially
coming from multiple different op modules. This allows for hardware
that only supports some of the operations, not all of them (e.g., a
GPU that only supports single-precision operations).
All the hard-coded back-end MPI_Op functions for (MPI_Op,
MPI_Datatype) tuples still exist, but unlike coll, they're in the
framework base (vs. being in a separate "basic" component) and are
automatically used if no component is found at runtime that provides a
module with the necessary function pointers.
There is an "example" op component that will hopefully be useful to
those writing meaningful op components. It is currently
.ompi_ignore'd so that it doesn't impinge on other developers (it's
somewhat chatty in terms of opal_output() so that you can tell when
its functions have been invoked). See the README file in the example
op component directory. Developers of new op components are
encouraged to look at the following wiki pages:
https://svn.open-mpi.org/trac/ompi/wiki/devel/Autogenhttps://svn.open-mpi.org/trac/ompi/wiki/devel/CreateComponenthttps://svn.open-mpi.org/trac/ompi/wiki/devel/CreateFramework
= MPI_REDUCE_LOCAL =
Part of the MPI-2.2 proposal listed here:
https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/24
is to add a new function named MPI_REDUCE_LOCAL. It is very easy to
implement, so I added it (also because it makes testing the op
framework pretty easy -- you can do it in serial rather than via
parallel reductions). There's even a man page!
This commit was SVN r20280.
1. fix a bug in pml_ob1_recvreq/sendreq.c, buffer was made defined where the request has already been released.
2. complete memchecker support for collective functions.
3. change the wrongly spelled function name of memchecker, i.e. '*_isaddressible' should be '*_isaddressable'
This commit was SVN r20043.
I'm unable to split it in two parts, my patch and Edgar's one. So I just update
copyright information for both of us.
What this patch do:
- it use the unexpected queue create by commit r19562 to dispatch the
unexpected message to the right communicator (once this communicator
is created and initialized).
- delay the PML comm_add until we have the context_id for the new communicator.
- only do the PML comm_add on processes that really belong to the new
communicator. Please read the lengthy comment in the source code for the
reason behind this.
This commit was SVN r19929.
The following SVN revision numbers were found above:
r19562 --> open-mpi/ompi@acd3406aa7
unconditionally, which can result in a flood of messages to the user
if all MPI processes invoke abort. Additionally, some users were
confused because they saw the MPI_ABORT opal_output() messages from
''some'' MPI processes, but not ''all'' of them (despite the fact that
every MPI process supposedly invoked MPI_ABORT). The reason is that
calling MPI_ABORT triggers ORTE to kill all MPI processes, so it's a
race condition as to whether a) all MPI processes actually invoke
MPI_ABORT, and/or b) whether every process is able to opal_output()
before they are killed.
This commit does two simple things:
* Now use orte_show_help() for the MPI_ABORT message, so they are
aggregated.
* Add a note in the message that calling MPI_ABORT kills all
processes, so you might not see all output, yadda yadda yadda.
This commit was SVN r19735.
way".
Don't modify coords in the top-level API function because coords is an
IN variable. Instead, as Nysal noted, the real cause of the problem
was a missing ! down in topo_base_cart_rank.c. Put a comment down in
topo_base_cart_rank.c explaining what's going on so that the code is
not so cryptic.
Refs trac:1363.
This commit was SVN r19487.
The following Trac tickets were found above:
Ticket 1363 --> https://svn.open-mpi.org/trac/ompi/ticket/1363
* Various changes to enable 0-dimensional cartesian communicators:
* Set various mtc_* members to NULL when there are 0 dimensions (and
don't bother trying to memcpy these arrays when duplicating the
communicator -- because they're NULL)
* adjust topo_base_cart_sub to correctly handle 0 dimensions
(simplified it a bit)
* adjust a few error codes to return ERR_OUT_OF_RESOURCE
* adjust error checking of CART_CREATE, CART_RANK
* Allow MPI_GRAPH_CREATE to accept 0 == nnodes.
* Bump reported MPI version in mpi.h to 2.1
This commit was SVN r19461.
The following Trac tickets were found above:
Ticket 1236 --> https://svn.open-mpi.org/trac/ompi/ticket/1236
for the F90 type create functions to the requirements of MPI 2.1 standard.
Advice to implementors. An application may often repeat a call to
MPI_TYPE_CREATE_F90_xxxx with the same combination of (xxxx,p,r).
The application is not allowed to free the returned predefined, unnamed
datatype handles. To prevent the creation of a potentially huge amount of
handles, the MPI implementation should return the same datatype handle for
the same (REAL/COMPLEX/INTEGER,p,r) combination. Checking for the
combination (p,r) in the preceding call to MPI_TYPE_CREATE_F90_xxxx and
using a hash-table to find formerly generated handles should limit the
overhead of finding a previously generated datatype with same combination
of (xxxx,p,r). (End of advice to implementors.)
This commit fixes trac:1239, and #712.
This commit was SVN r19458.
The following Trac tickets were found above:
Ticket 1239 --> https://svn.open-mpi.org/trac/ompi/ticket/1239
of strings. We mostly did the Right Things already; I simplified the
code a bit and also had us not write to more characters in the C
bindings than we're supposed to (per language in the MPI-2.1 spec).
Fixes trac:1238.
This commit was SVN r18705.
The following Trac tickets were found above:
Ticket 1238 --> https://svn.open-mpi.org/trac/ompi/ticket/1238
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.
I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.
This commit was SVN r18619.
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
{{{
svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}
Contains:
* Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
* Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
* Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
* Cleanup mpi interface with missplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
* Some other sundry cleanup items all dealing with C/R functionality in the trunk.
This commit was SVN r18241.
Fix the ompi-server -h cmd line option so it actually tells you something!
Add two new testing codes to the orte/test/mpi area: accept and connect.
This commit was SVN r18176.
to undo 1822).
The verification of recvcount==0 and rank = root was braking
inter-communicator scatter, since the root (root==MPI_ROOT) might very well
have recvcount=0. The same fix has been applied to gather.c just the other way
round.
Fixes the bug reported on the mainling list by Martin Audet. If there is a
1.2.7 this fix might be worthwhile porting it over.
Please note, that while the test works now for basic and for inter, we get a
0byte malloc warning from the inter module, which we still have to fix in a
separate patch.
This commit was SVN r18123.
Event var_deref_op: Variable "requests" tracked as NULL was
dereferenced.
Only check requests[i] for NULL, if requests is != NULL itself.
This commit was SVN r17973.
Event uninit_use_in_call: Using uninitialized value "tag" in call to
function "(ompi_dpm).connect_accept" and others
The tag is set and used in get_rport only on root...
This commit was SVN r17972.
Clarify the setting of send_first in the mpi bindings (trivial, i know, but helpful)
Remove the extra xcast of child contact info to the parent job.
This commit was SVN r17952.
Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI.
Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave.
Fixed the configure check for {{{--enable-ft-thread}}} so it has a know dependance on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}).
Added a line for Checkpoint/Restart support to {{{ompi_info}}}.
Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime.
There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime.
This commit was SVN r17516.
implement specific function, thereby
removing bogus requirement on valgrind/valgrind.h
dough...
- Call specific function runindebugger() before
doing expensive checks on each component of struct.
- Get rid of void* warnings..
This commit was SVN r17438.
to *not* use the STL as well as removing the STL use from the error handler
routines. This was removing the STL from the C++ bindings (Solaris has 2
versions of the STL; if OMPI uses one and an MPI application wants to use
another, Bad Things happen).
The main idea is to wrap up the C++ callback function pointers and the user's
extra_state into our own struct that is passed as the extra_state to the C
keyval registration along with the intercept routines in intercepts.cc. When the
C++ intercepts are activated, they unwrap the user's callback and extra state
and call them.
This commit was SVN r17409.
(sometimes after the merge with the ORTE branch), the opal_pointer_array
will became the only pointer_array implementation (the orte_pointer_array
will be removed).
This commit was SVN r17007.
* Update MPI_WTICK / MPI_WTIME man pages:
* Fix C++ declarations
* Note that we may use better than gettimeofday() on some platforms
* Add "MPI_WTIME support" ("options:mpi-wtime") flag in ompi_info
output indicating whether we use "native" or "gettimeofday" for
MPI_WTIME
This commit was SVN r16774.
methods (in order of precedence):
1. #pragma ident <ident string> (e.g., Intel and Sun)
1. #ident <ident string> (e.g., GCC)
1. static const char ident[] = <ident string> (all others)
By default, the ident string used is the standard Open MPI version string. Only
the following libraries will get the embedded version strings (e.g., DSOs will
not):
* libmpi.so
* libmpi_cxx.so
* libmpi_f77.so
* libopen-pal.so
* libopen-rte.so
* Added two new configure options:
* `--with-package-name="STRING"` (defaults to "Open MPI username@hostname
Distribution"). `STRING` is displayed by `ompi_info` next to the "Package"
heading.
* `--with-ident-string="STRING"` (defaults to the standard Open MPI version
string - e.g., X.Y.Zr######). `%VERSION%` will expand to the Open MPI
version string if it is supplied to this configure option.
This commit was SVN r16644.
* Fix some missing includes in a few places.
* Add the cr_request() functionality to the BLCR CRS component.
We are now dependent upon the 0.6.* series of BLCR.
* Made the CR notification mechanism a registered function.
This way we can have an OPAL-only version and it can be replaced at
runtime with the ORTE version.
* Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
CR functionality when the user wants it. Default: Disabled.
* Fix the placement of a checkpoint request check in MPI_Init
* Pull the OPAL notification mechanism into the SnapC framework.
* We no longer fork/exec the 'opal-checkpoint' command for local
checkpointing, the Local coordinator in the orted does this directly.
* The Local and Application coordinator talk together bypassing the OPAL
notifiation mechanism.
* Optimized the Local <-> App Coordinator communication.
* Improved the structure used to track vpid_snapshots in the local coord.
* Fix a race condition in which an application under heavy communication load
may produce an inconsistent global checkpoint.
This commit was SVN r16389.
used at nce (up to one unique collective module per collective function).
Matches r15795:15921 of the tmp/bwb-coll-select branch
This commit was SVN r15924.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15795
r15921
passed to MPI_COMM_FREE, it invoked the error handler on
MPI_COMM_WORLD, not on MPI_COMM_FREE. This commit changes the
behavior: if MPI_COMM_SELF is passed to MPI_COMM_FREE, we invoke the
error handler on MPI_COMM_SELF (not MPI_COMM_WORLD). Fixes trac:1109.
This commit was SVN r15682.
The following Trac tickets were found above:
Ticket 1109 --> https://svn.open-mpi.org/trac/ompi/ticket/1109
- If one wants to use this solution, remember to unload the project 'orte-restart' which is currently not working for Windows.
This commit was SVN r15680.
multi-completion calls. Therefore reorder tests where appropriate.
- Always check for NULL-pointers (flag, index, completed) (yes, we
use them in the underlying ompi_request_* functions.
- Break early, if a NULL-request is found
This commit was SVN r15671.
MPI_FREE_MEM is invoked with NULL, return success.
Fixes trac:1101.
This commit was SVN r15593.
The following Trac tickets were found above:
Ticket 1101 --> https://svn.open-mpi.org/trac/ompi/ticket/1101
development trees since last year (had to wait for some intel tests to
run yesterday, so I finally took the time to finish this work):
* Improve MPI API argument checking by also checking for NULL values
(especially helps when invalid Fortran MPI handles are passed,
because the various MPI_*f2c functions are supposed to return an
"invalid" MPI handle [meaning NULL] when this happens). So now
OMPI will generate an MPI exception rather than a segv.
* Removed a few redundant DATATYPE_NULL checks.
* Also check for some other forms of "invalid" handles (e.g., already
been freed, etc.) in some cases. We could probably be a bit more
stringent in this regard if we really wanted to.
* Change MPI_Get_processor_name to zero out the string up to
MPI_MAX_PROCESSOR_NAME characters, because the MPI spec says that
the string must be at least that long. We were already passing
that length to gethostname(), anyway.
This commit was SVN r14100.
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
test, Sun's darray test, and an internal LANL test code. I would not
assume it will work properly on other codes, as I'm still not sure I
completely understand what the standard says this function is supposed to
do.
Refs trac:65
This commit was SVN r13967.
The following Trac tickets were found above:
Ticket 65 --> https://svn.open-mpi.org/trac/ompi/ticket/65
create a duplicate type, because any duplicate type lose the PREDEFINED flag.
An MPI_LB (respectively MPI_UB) without the PREDEFINED tag is useless, as it's
not the a marker anymore. The solution is to return the same pointer, but once
the reference count has been increased. In order for this to work, I allowed
the destruction to check for the reference count of an object before complaining
about destroying a predefined type.
This fixed ticket #317.
This commit was SVN r13942.