pondering about this problem, we came to the conclusion that the best approach
is to keep what we had before (i.e. the original approach).
The main reason for this is being nice with tool developers. In the current
incarnation, they can either catch the Fortran calls or the C calls. If they
provide both, then they will have to figure out how to cope with the double
calls (as your example highlight).
Here is the behavior Open MPI will stick too:
Fortran MPI -> C MPI
Fortran PMPI -> C MPI
However, the is another possible approach. This might avoid the double calls
while preserving the tool writers friendliness. This possible approach will do:
Fortran MPI -> C MPI
Fortran PMPI -> C PMPI
^
Unfortunately, we will have to heavily modify all files in the Fortran
interface layer in order to support this approach.
This commit was SVN r20079.
1. fix a bug in pml_ob1_recvreq/sendreq.c, buffer was made defined where the request has already been released.
2. complete memchecker support for collective functions.
3. change the wrongly spelled function name of memchecker, i.e. '*_isaddressible' should be '*_isaddressable'
This commit was SVN r20043.
I'm unable to split it in two parts, my patch and Edgar's one. So I just update
copyright information for both of us.
What this patch do:
- it use the unexpected queue create by commit r19562 to dispatch the
unexpected message to the right communicator (once this communicator
is created and initialized).
- delay the PML comm_add until we have the context_id for the new communicator.
- only do the PML comm_add on processes that really belong to the new
communicator. Please read the lengthy comment in the source code for the
reason behind this.
This commit was SVN r19929.
The following SVN revision numbers were found above:
r19562 --> open-mpi/ompi@acd3406aa7
unconditionally, which can result in a flood of messages to the user
if all MPI processes invoke abort. Additionally, some users were
confused because they saw the MPI_ABORT opal_output() messages from
''some'' MPI processes, but not ''all'' of them (despite the fact that
every MPI process supposedly invoked MPI_ABORT). The reason is that
calling MPI_ABORT triggers ORTE to kill all MPI processes, so it's a
race condition as to whether a) all MPI processes actually invoke
MPI_ABORT, and/or b) whether every process is able to opal_output()
before they are killed.
This commit does two simple things:
* Now use orte_show_help() for the MPI_ABORT message, so they are
aggregated.
* Add a note in the message that calling MPI_ABORT kills all
processes, so you might not see all output, yadda yadda yadda.
This commit was SVN r19735.
This fixes trac:1477.
Help provided by Jeff and Terry.
This commit was SVN r19533.
The following Trac tickets were found above:
Ticket 1477 --> https://svn.open-mpi.org/trac/ompi/ticket/1477
MPI::SEEK_SET and friends, suggested by Doug Gregor. This way allows
users to utilize SEEK_SET in a case statement, which they could not do
with our previous method.
This commit was SVN r19494.
way".
Don't modify coords in the top-level API function because coords is an
IN variable. Instead, as Nysal noted, the real cause of the problem
was a missing ! down in topo_base_cart_rank.c. Put a comment down in
topo_base_cart_rank.c explaining what's going on so that the code is
not so cryptic.
Refs trac:1363.
This commit was SVN r19487.
The following Trac tickets were found above:
Ticket 1363 --> https://svn.open-mpi.org/trac/ompi/ticket/1363
* Various changes to enable 0-dimensional cartesian communicators:
* Set various mtc_* members to NULL when there are 0 dimensions (and
don't bother trying to memcpy these arrays when duplicating the
communicator -- because they're NULL)
* adjust topo_base_cart_sub to correctly handle 0 dimensions
(simplified it a bit)
* adjust a few error codes to return ERR_OUT_OF_RESOURCE
* adjust error checking of CART_CREATE, CART_RANK
* Allow MPI_GRAPH_CREATE to accept 0 == nnodes.
* Bump reported MPI version in mpi.h to 2.1
This commit was SVN r19461.
The following Trac tickets were found above:
Ticket 1236 --> https://svn.open-mpi.org/trac/ompi/ticket/1236
for the F90 type create functions to the requirements of MPI 2.1 standard.
Advice to implementors. An application may often repeat a call to
MPI_TYPE_CREATE_F90_xxxx with the same combination of (xxxx,p,r).
The application is not allowed to free the returned predefined, unnamed
datatype handles. To prevent the creation of a potentially huge amount of
handles, the MPI implementation should return the same datatype handle for
the same (REAL/COMPLEX/INTEGER,p,r) combination. Checking for the
combination (p,r) in the preceding call to MPI_TYPE_CREATE_F90_xxxx and
using a hash-table to find formerly generated handles should limit the
overhead of finding a previously generated datatype with same combination
of (xxxx,p,r). (End of advice to implementors.)
This commit fixes trac:1239, and #712.
This commit was SVN r19458.
The following Trac tickets were found above:
Ticket 1239 --> https://svn.open-mpi.org/trac/ompi/ticket/1239
Possibly fixes CID 417
* ensure to initialize inner members upon default constructors using
the same syntax across all classes
* remove some member variables that aren't used anymore
* "initialize" the inner MPI_Status in the default constructor for
MPI::Status by calling the default constructor mpi_status(). This
may or may not silence CID 417; we'll see what happens in
subsequent Coverity Prevent runs.
This commit was SVN r19228.
* Make the creation of the build dir for the man pages a bit more
robust (thanks to suggestions from Ralf W.).
* Only distribute the .Xin files, not the .X man pages themselves.
* Make the .X files depend on opal_config.h so that if you re-run
configure and change opal_config.h (e.g., a new version), the man
pages should get rebuilt.
* Man pages are now cleaned with "distclean", not "maintainer-clean".
* Fix a typo in opal_crs.7in.
* Udpate make_dist_tarball to update "date" in the VERSION file.
* Make make_dist_tarball a bit friendlier to hg checkouts.
This commit was SVN r19219.
Revise the scope precedence in the MPI_Publish, Unpublish, and Lookup functions. If a global server was specified and is available, then default to using it for all three functions. If not, then default to using local scope.
If an info_key was provided, then it takes preference. We always follow the user's direction - this change only impacts the scope ordering if the user -doesn't- tell us the order to use.
This commit was SVN r19146.
versions, dates and build names.
Fixes trac:1387
Big thanks to Jeff and Brian for help and oversight.
This commit was SVN r19120.
The following Trac tickets were found above:
Ticket 1387 --> https://svn.open-mpi.org/trac/ompi/ticket/1387
"make distclean". It's not clear whether it's an Automake bug or
whether what I did simply is not supported (I've got pending mail into
Ralf W. asking about it). The short version is that during "make
distclean", ompi/mpi/f77/Makefile would rm -rf ompi/mpi/f77/.deps.
But ompi/Makefile still include's some .Plo files from that directory,
so Bad Things happened when "make distclean" unrolled from the
ompi/mpi/f77 dir back up to the ompi/ dir.
So I went with George's original suggestion and moved the f77 "base"
files in question into a new directory: ompi/mpi/f77/base and put a
Makefile.include in there. That way, this directory is not traversed
twice by distclean, and .deps is only removed when it is supposed to
be. Maybe we'll be able to do it a little better someday, but that's
the way it is now.
I'll check this with a fresh checkout once this is committed to SVN as
well; some of these kinds of problems don't show up until you do a
build from a completely fresh SVN checkout.
This commit was SVN r19054.
The following SVN revision numbers were found above:
r19040 --> open-mpi/ompi@9f4d4c4312
are properly linked against libmpi.la.
This required a little creative AM usage, inspired by discussion on
OMPI devel list:
* Make a new ompi/mpi/f77/Makefile_f77base.include; effectively move
the building of the f77 "base" glue stuff (libmpi_f77base.la) into
this Makefile and away from ompi/mpi/f77/Makefile.am. The sources
in question require some specific CPPFLAGS, so we couldn't just add
the raw sources into libmpi_la_SOURCES, unfortunately.
* Include this new Makefile in the top-level ompi/Makefile.am
* The libmpi_f77base.la LT convenience library was already sucked
into libmpi.la; breaking it out into its own Makefile allows us
to build it earlier and therefore complete buidling libmpi.la
earlier.
* Side effect: the ompi/mpi/Makefile.am is now mostly unnecessary; it
no longer specifies a SUBDIRS for each of the bindings directories
to traverse into (since they are now in the top-level SUBDIRS). As
such, the man pages are now also now included in the top-level
ompi/Makefile.am.
The end of the result is that libmpi.la -- including a few sources
from mpi/f77 -- is fully built before the C++, F77, and F90 bindings
are built. Therefore, the C++, F77, and F90 bindings libraries can
all link against libmpi.la.
This commit was SVN r19040.
The following Trac tickets were found above:
Ticket 1409 --> https://svn.open-mpi.org/trac/ompi/ticket/1409
of strings. We mostly did the Right Things already; I simplified the
code a bit and also had us not write to more characters in the C
bindings than we're supposed to (per language in the MPI-2.1 spec).
Fixes trac:1238.
This commit was SVN r18705.
The following Trac tickets were found above:
Ticket 1238 --> https://svn.open-mpi.org/trac/ompi/ticket/1238
* Put the variable in the MPI namespace; keeps it safely segregated
from user apps
* Need to actually "extern" the variable to make the compiler not
complain that the variable is never referenced
This commit was SVN r18693.
the char string ident in the C++ library be non-static so that other
places can see it. This makes the C++ library version string
analogous to all the other version strings.
This commit was SVN r18679.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.
I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.
This commit was SVN r18619.
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so you if explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will do similar to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
{{{
svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}
Contains:
* Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
* Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
* Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
* Cleanup mpi interface with missplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
* Some other sundry cleanup items all dealing with C/R functionality in the trunk.
This commit was SVN r18241.
Fix the ompi-server -h cmd line option so it actually tells you something!
Add two new testing codes to the orte/test/mpi area: accept and connect.
This commit was SVN r18176.
to undo 1822).
The verification of recvcount==0 and rank = root was braking
inter-communicator scatter, since the root (root==MPI_ROOT) might very well
have recvcount=0. The same fix has been applied to gather.c just the other way
round.
Fixes the bug reported on the mainling list by Martin Audet. If there is a
1.2.7 this fix might be worthwhile porting it over.
Please note, that while the test works now for basic and for inter, we get a
0byte malloc warning from the inter module, which we still have to fix in a
separate patch.
This commit was SVN r18123.
Event var_deref_op: Variable "requests" tracked as NULL was
dereferenced.
Only check requests[i] for NULL, if requests is != NULL itself.
This commit was SVN r17973.
Event uninit_use_in_call: Using uninitialized value "tag" in call to
function "(ompi_dpm).connect_accept" and others
The tag is set and used in get_rport only on root...
This commit was SVN r17972.
Clarify the setting of send_first in the mpi bindings (trivial, i know, but helpful)
Remove the extra xcast of child contact info to the parent job.
This commit was SVN r17952.
Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI.
Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave.
Fixed the configure check for {{{--enable-ft-thread}}} so it has a know dependance on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}).
Added a line for Checkpoint/Restart support to {{{ompi_info}}}.
Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime.
There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime.
This commit was SVN r17516.
implement specific function, thereby
removing bogus requirement on valgrind/valgrind.h
dough...
- Call specific function runindebugger() before
doing expensive checks on each component of struct.
- Get rid of void* warnings..
This commit was SVN r17438.
to *not* use the STL as well as removing the STL use from the error handler
routines. This was removing the STL from the C++ bindings (Solaris has 2
versions of the STL; if OMPI uses one and an MPI application wants to use
another, Bad Things happen).
The main idea is to wrap up the C++ callback function pointers and the user's
extra_state into our own struct that is passed as the extra_state to the C
keyval registration along with the intercept routines in intercepts.cc. When the
C++ intercepts are activated, they unwrap the user's callback and extra state
and call them.
This commit was SVN r17409.
structure up into the MPI::Grequest class (rather than being a struct
definition outside of any class) to be similar to how the keyval
intercept data structures are organized. Fix one minor compiler
warning in the process.
This commit was SVN r17171.
use the STL. This is the first step in removing the STL from the C++
bindings (Solaris has 2 versions of the STL; if OMPI uses one and an
MPI application wants to use another, Bad Things happen).
The main idea is to wrap up the C++ callback function pointers and the
user's extra_state into our own struct that is passed as the
extra_state to the C keyval registration along with the intercept
routines in intercepts.cc. When the C++ intercepts are activated,
they unwrap the user's callback and extra state and call them.
It got a little more complicated than that, however:
* I realized that we were returning errors back from
Comm::create_keyval() incorrectly, so I fixed that.
* Instead of using STL maps to store associations, we now use an
opal_list_t which has to be guaranteed to be initialized correctly
and only once in a multi-threaded environment.
* Because of whackyness in the C++ bindings, it is possible to call
Comm::Create_keyval with C callbacks (!). If both registered
callbacks are C functions, then ensure to avoid all the C++
machinery.
This commit was SVN r17125.
bindings with Dan: it's no longer necessary since we're firmly tied to
Open MPI (I'm not sure it was ever necessary, actually...).
This commit was SVN r17017.
(sometimes after the merge with the ORTE branch), the opal_pointer_array
will became the only pointer_array implementation (the orte_pointer_array
will be removed).
This commit was SVN r17007.
added there on my last commit was wrong. This variable should be included
only once, and here is the right way of doing:
- if we have weak symbols we compile each file once, so the variable should
[always] get included.
- if we don't have weak symbols, then each file will get compiled multiple
times (if profiling is enabled). In this case include the variable only
when we build the generic layer (not the profile one).
This commit was SVN r16950.
only defined when we build the normal version (not in the profiling compilation step).
Make sure the conversion_null function compil in all cases.
This commit was SVN r16908.
This commit brings over all the work from the /tmp-public/datarep
branch. See commits r16855, r16859, r16860 for the highlights of what
was done.
This commit was SVN r16891.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r16855
r16859
r16860
The following Trac tickets were found above:
Ticket 1029 --> https://svn.open-mpi.org/trac/ompi/ticket/1029
* Update MPI_WTICK / MPI_WTIME man pages:
* Fix C++ declarations
* Note that we may use better than gettimeofday() on some platforms
* Add "MPI_WTIME support" ("options:mpi-wtime") flag in ompi_info
output indicating whether we use "native" or "gettimeofday" for
MPI_WTIME
This commit was SVN r16774.
methods (in order of precedence):
1. #pragma ident <ident string> (e.g., Intel and Sun)
1. #ident <ident string> (e.g., GCC)
1. static const char ident[] = <ident string> (all others)
By default, the ident string used is the standard Open MPI version string. Only
the following libraries will get the embedded version strings (e.g., DSOs will
not):
* libmpi.so
* libmpi_cxx.so
* libmpi_f77.so
* libopen-pal.so
* libopen-rte.so
* Added two new configure options:
* `--with-package-name="STRING"` (defaults to "Open MPI username@hostname
Distribution"). `STRING` is displayed by `ompi_info` next to the "Package"
heading.
* `--with-ident-string="STRING"` (defaults to the standard Open MPI version
string - e.g., X.Y.Zr######). `%VERSION%` will expand to the Open MPI
version string if it is supplied to this configure option.
This commit was SVN r16644.
* Fix some missing includes in a few places.
* Add the cr_request() functionality to the BLCR CRS component.
We are now dependent upon the 0.6.* series of BLCR.
* Made the CR notification mechanism a registered function.
This way we can have an OPAL-only version and it can be replaced at
runtime with the ORTE version.
* Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
CR functionality when the user wants it. Default: Disabled.
* Fix the placement of a checkpoint request check in MPI_Init
* Pull the OPAL notification mechanism into the SnapC framework.
* We no longer fork/exec the 'opal-checkpoint' command for local
checkpointing, the Local coordinator in the orted does this directly.
* The Local and Application coordinator talk together bypassing the OPAL
notifiation mechanism.
* Optimized the Local <-> App Coordinator communication.
* Improved the structure used to track vpid_snapshots in the local coord.
* Fix a race condition in which an application under heavy communication load
may produce an inconsistent global checkpoint.
This commit was SVN r16389.