- extended support for BlueGene/P:
- building shared VT libraries
- tracing 3rd-party libraries (e.g. libc I/O)
- tracing multi-threaded applications
VT configure fixes:
- fixed detection of CTool for 3rd-party library tracing
VT fixes:
- reduced memory overhead by using the trace buffer for string/array elements of some records
- do not shut down the call stack if the max. number of buffer flushes is reached, because the additional function leaves suggest a wrong application flow
- vtunify-mpi:
- fixed conversion of VTUnify_MPI_Aint arrays
- vtwrapper:
- if an OPARI modified object file (*.mod.o) cannot be renamed, abort only if the compiler wrapper runs in "only-compile" mode (-c)
OTF fixes:
- otfinfo:
- fixed and enhanced calculation of trace file size
- changed unit of timer resolution (s -> Hz)
- otfprofile:
- fixed progress
- kill '_' and '\' in process names to make LaTeX happier
This commit was SVN r22963.
1. fix a bug that caused an infinite loop in configure when specifying want-ft but not want-ft-thread by removing a stale reference to the opal-progress-thread option
2. add want-ft=orcm so we can build the orcm errmgr component
3. cleanup the use of "ompi_want_ft_xxx" and replace it with "opal_want_ft_xxx" so that naming conventions are preserved
This commit was SVN r22885.
Adds memory barriers which are definitely needed on powerpc
This commit was SVN r22879.
The following Trac tickets were found above:
Ticket 2351 --> https://svn.open-mpi.org/trac/ompi/ticket/2351
Update ompi_info and orte-info to include the new framework.
Fix some selection logic and a typo'd variable name
Still remains ompi_ignored until we complete testing
This commit was SVN r22848.
Remove the --enable-progress-threads option as this is no longer functional, and hardcode OPAL_ENABLE_PROGRESS_THREADS to 0.
Replace the --enable-mpi-threads option with --enable-mpi-thread-multiple as this is clearer as to meaning. This option automatically turns "on" opal thread support if it wasn't already so specified. If the user specifies --disable-opal-multi-threads --enable-mpi-thread-multiple, we will error out with a message
Add a new --enable-opal-multi-threads option that turns "on" opal thread support without doing anything wrt mpi-thread-multiple
This commit was SVN r22841.
After talking to both Brian and George, the consensus was to just
remove the flag and the test function. Begone, evil spirits, BEGONE!
This commit was SVN r22831.
The following Trac tickets were found above:
Ticket 2273 --> https://svn.open-mpi.org/trac/ompi/ticket/2273
Aleksej Saushev.
Don't use bash or bashisms in shell scripts
We should use POSIX's setpgid(0,0), which is equivalent to setpgrp().
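A minimal sketch of the setpgid(0,0) call mentioned above, assuming a POSIX system. The helper name child_becomes_group_leader is hypothetical and not part of the Open MPI code; it forks a child so that the check also works when the caller is a session leader (which is not allowed to change its own process group).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that makes itself the leader of a
 * new process group with setpgid(0, 0).
 * Returns 1 if the child ended up with pgid == pid, 0 if it did not,
 * and -1 if fork/waitpid failed. */
int child_becomes_group_leader(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* pid 0 means "this process"; pgid 0 means "use my own pid" */
        int ok = (setpgid(0, 0) == 0) && (getpgrp() == getpid());
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling child_becomes_group_leader() from any program should report whether the child became its own process group leader.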
This commit was SVN r22829.
Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those.
The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed.
This commit was SVN r22824.
- skip test for libdl on BlueGene and CrayXT platforms (particularly on CrayXT this library can be linked but it isn't suitable)
- set cache variables for functions 'PMPI_Win_test', 'PMPI_Win_lock', 'PMPI_Win_unlock', and 'MPI_Register_datarep', if VT is configuring for Open MPI
- added test for pthread functions 'pthread_condattr_<set|get>pshared' and 'pthread_mutexattr_<set|get>pshared', because they are not available on some platforms
VT fixes:
- cut 'nm' collected symbol names at '??'
- vtunify:
- fixed unsafe usage of some strncpy's
- fixed potential segmentation fault in vtunify-mpi which might occur on 32bit platforms using MPICH2
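The log does not say what made the strncpy calls unsafe. A common pitfall, shown here purely as an assumption, is that strncpy leaves the destination unterminated when the source is at least as long as the buffer. The helper name copy_string is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes) and
 * always NUL-terminate. strncpy alone does not write a terminator when
 * strlen(src) >= the count it is given. */
void copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With a 4-byte buffer, copying "abcdefgh" leaves the truncated but terminated string "abc".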
OTF general:
- updated date in copyright header of each source file
OTF fixes:
- minor code cleanups (indentation, nicer error messages, more correct free's)
- otfaux:
- fix to place final statistics after the very last record instead of right before
- changed fatal error to a warning when a file is closed twice (or unexpectedly)
This commit was SVN r22820.
previously an orte_std_cntr_t, which is int32_t).
Comparisons with < 0 don't make any sense here.
This commit was SVN r22799.
The following SVN revision numbers were found above:
r22727 --> open-mpi/ompi@2541aa98ab
Short version: there is a bug in OS X/Snow Leopard, but there is also
a bug in Open MPI. Fixing the bug in Open MPI is both trivial (a
1-line change) and avoids the bug in OS X. We'll file an OS X bug
report upstream with Apple, but it should no longer affect us here in
OMPI.
Fixes trac:2039.
More details:
Some background first:
1. IPv4 sockets can only accept incoming IPv4 connections. However,
IPv6 sockets can be configured to accept ''only'' incoming IPv6
connections, or ''both'' incoming IPv4 and IPv6 connections. An
IPv6 socket attribute sets which listening behavior is used.
1. IPv4 and IPv6 have different port namespaces. Hence, it is
permissible to bind a v4 socket to port X ''and'' also bind a v6
socket to that same port X on the same interface (assuming that
the v6 socket is only accepting incoming v6 connections).
Incoming v4 connections to port X on the interface should get
matched to the listening v4 socket; incoming v6 connections should
get matched to the listening v6 socket.
1. When a v6 socket accepts ''both'' incoming v4 and v6 connections, it
should claim port X in both namespaces.
1. Linux's default behavior is to only allow one listening socket to
be bound to a given port (i.e., ''either'' a v6 or v4 socket to be
bound to a single port X -- not both). A v6 socket can listen for
both v4 and v6 incoming connections on that port, but still --
only one socket will be bound to that port.
1. Snow Leopard's default behavior is to share ports -- i.e., let
both a v4 and a v6 listening socket be bound to port X
(assuming that the v6 socket is only accepting incoming v6
connections).
The TCP BTL creates a listening socket for each address family.
Hence, it creates a v4 listening socket on INADDR_ANY and a v6
listening socket on the v6 equivalent of INADDR_ANY. OMPI then
iteratively tries to find ports to listen on within the range of
[mca_btl_tcp_port_min, mca_btl_tcp_port_min + mca_btl_tcp_port_range).
On Linux, the v4 socket will be bound to port X and the v6 socket will
likely be bound to port Y (where X != Y). On Snow Leopard, the v4
socket will be bound to port X and the v6 socket may ''also'' be bound
to port X. Since the namespaces are separate, this shouldn't be a
problem.
However, Open MPI was accidentally setting the v6 listening behavior
to accept ''both'' v4 and v6 incoming connections. This is a trivial
thing to fix -- change a 0 to a 1 in the code. On Linux, this issue
didn't matter because the v4 and v6 sockets were on different ports.
So even though the v6 socket ''would'' have accepted incoming v4
connections, that never happened because OMPI would direct v4
connections to the v4 port.
But on Snow Leopard, the v4 and v6 listening ports could end up
sharing the same port number. As mentioned above, this ''shouldn't''
have been a problem, but it looks like Snow Leopard has the following
bugs:
* If a v4 socket is already bound to port X, we're pretty sure that a
v6 socket with the "accept both v4 and v6 incoming connections"
listening behavior should not be able to claim port X (because
there's already a v4 socket listening on X). However, Snow Leopard
would allow binding a v4 socket to port X, and then allow a v6
socket configured to allow incoming v4 and v6 connections to
''also'' be bound to port X.
* After binding the v6 socket to port X, Snow Leopard then lets
''another'' v4 socket ''also'' get bound to port X. Hence, there's
now '''three''' sockets all listening on port X.
This obviously led to mis-matched TCP connections, and things went
downhill from there.
That being said, Snow Leopard doesn't exhibit this behavior if v6
sockets only allow incoming v6 connections. And technically, that is
exactly the behavior we want (we want v6 sockets to only accept
incoming v6 connections). So if we just change the flag to make our
v6 listening socket use this behavior, the problem on OS X goes away.
That's what this commit does -- it changes a 0 to a 1, indicating
"only let this v6 socket allow incoming v6 connections."
That was simple, wasn't it?
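The commit message does not name the socket attribute. The sketch below assumes it is the standard IPV6_V6ONLY option, where 1 means "only accept incoming v6 connections" and 0 means "accept both v4 and v6". The function names are hypothetical and this is not the TCP BTL code itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP/IPv6 socket restricted to IPv6
 * traffic. Returns the descriptor, or -1 if the socket could not be
 * created or the option could not be set. */
int open_v6only_socket(void)
{
    int flag = 1;   /* 1 = v6 only; 0 = also accept incoming v4 */
    int sd = socket(AF_INET6, SOCK_STREAM, 0);

    if (sd < 0) {
        return -1;
    }
    if (setsockopt(sd, IPPROTO_IPV6, IPV6_V6ONLY, &flag, sizeof(flag)) != 0) {
        close(sd);
        return -1;
    }
    return sd;
}

/* Read the option back: 1 if v6-only, 0 if dual-stack, -1 on error. */
int socket_is_v6only(int sd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    if (getsockopt(sd, IPPROTO_IPV6, IPV6_V6ONLY, &flag, &len) != 0) {
        return -1;
    }
    return flag != 0;
}
```

The option has to be set before bind(); on a host without IPv6 support open_v6only_socket() simply returns -1.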
This commit was SVN r22788.
The following Trac tickets were found above:
Ticket 2039 --> https://svn.open-mpi.org/trac/ompi/ticket/2039
1. The code that looks at btl_tcp_if_exclude before doing a
modex_send uses strcmp rather than strncmp. That means that
"lo0" gets sent even though "lo" is excluded.
2. The code that determines whether a particular local TCP
interface can connect to a particular remote interface doesn't
check for loopback interfaces. With this fix, users can now
enable "lo" and be assured that it will only be used for intra-
node communication.
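The difference described in item 1 can be sketched as follows. The function names are hypothetical; the real check lives in the TCP BTL and may look different.

```c
#include <assert.h>
#include <string.h>

/* Exact match: the pattern "lo" excludes only an interface named
 * exactly "lo", so "lo0" slips through. */
int excluded_exact(const char *ifname, const char *pattern)
{
    assert(ifname != NULL && pattern != NULL);
    return strcmp(ifname, pattern) == 0;
}

/* Prefix match: the pattern "lo" also excludes "lo0", "lo1", ... */
int excluded_prefix(const char *ifname, const char *pattern)
{
    assert(ifname != NULL && pattern != NULL);
    return strncmp(ifname, pattern, strlen(pattern)) == 0;
}
```

With the pattern "lo", excluded_exact("lo0", "lo") is false while excluded_prefix("lo0", "lo") is true, which is the behavior the fix relies on.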
This commit was SVN r22762.
btl_openib_ip.*. The routines in these files are not specific to
iwarp -- they are specific to IP interfaces used with IBV devices
(even IB or IBoE/RoCEE/whatever devices).
This commit was SVN r22718.
issues with iwarp.c. These fixes are needed for IBoE / ROCEE /
whateveritscalledtoday. I added a few minor changes to his base
patch.
This commit was SVN r22717.
Modify the orte configure options to specify --enable-multicast such that it directs components to build or not instead of littering the code base with #if's. Remove those #if's where they used to occur.
Add a new grpcomm "mcast" module to support multicast operations. Still some work required to properly perform daemon collectives for comm_spawn operations. New module only builds when --enable-multicast is provided, and when specifically selected.
This commit was SVN r22709.
bug: libmpi_f90 had libmpi.la in its LIBADD instead of libmpi_f77.la.
Fixes trac:2244.
This commit was SVN r22704.
The following Trac tickets were found above:
Ticket 2244 --> https://svn.open-mpi.org/trac/ompi/ticket/2244
Fixes case where there is unprotected access to
mca_osc_rdma_component.c_modules in ompi_osc_rdma_windx_to_module
This commit was SVN r22700.
discussed extensively. See
https://svn.open-mpi.org/trac/ompi/ticket/2092 and the RFC thread
http://www.open-mpi.org/community/lists/devel/2010/02/7447.php.
Specifically:
* Create LT convenience libraries for OPAL and ORTE if the layer
above them is being created (use the already-defined
AM_CONDITIONALs to know if the project above us is being built).
* ORTE slurps in the LT convenience library for OPAL; OMPI slurps in
the LT convenience library for ORTE.
* Wrapper compilers now only -l one library (e.g., ortecc only does
-lopen-rte, and mpicc only does -lmpi).
This commit was SVN r22691.
INTERNAL to EXTRA_RETAIN, because not all "internal" communicators
have this flag set (only internal communicators with CIDs less than
their parent). Hence, what this flag ''really'' means is that there
was an extra RETAIN performed on it. So name the flag just that --
EXTRA_RETAIN -- indicating that an extra RETAIN has occurred.
This commit was SVN r22690.
The following SVN revision numbers were found above:
r22671 --> open-mpi/ompi@61dee816db
can occur ( fixes trac:2111 ).
Should not deregister memory with the rcache lock held otherwise a deadlock can occur as the lower
level infiniband libraries can free memory ( fixes trac:2110 )
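A generic sketch of the locking pattern, not the openib/rcache code: entries are unlinked while the lock is held, but the release step runs only after the lock is dropped. Here free() stands in for memory deregistration, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical registration record kept in a locked cache. */
struct reg {
    struct reg *next;
    int in_use;
};

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static struct reg *cache_head = NULL;

void cache_insert(int in_use)
{
    struct reg *r = malloc(sizeof(*r));
    assert(r != NULL);
    r->in_use = in_use;
    pthread_mutex_lock(&cache_lock);
    r->next = cache_head;
    cache_head = r;
    pthread_mutex_unlock(&cache_lock);
}

/* Unlink idle entries under the lock, release them after unlocking.
 * If the release step re-entered the cache (as a low-level free can),
 * it would not find the lock already held. Returns the number released. */
int cache_evict_idle(void)
{
    struct reg *victims = NULL;
    struct reg **pp;
    int released = 0;

    pthread_mutex_lock(&cache_lock);
    pp = &cache_head;
    while (*pp != NULL) {
        struct reg *r = *pp;
        if (!r->in_use) {
            *pp = r->next;        /* unlink from the cache */
            r->next = victims;    /* park on a private list */
            victims = r;
        } else {
            pp = &r->next;
        }
    }
    pthread_mutex_unlock(&cache_lock);

    while (victims != NULL) {     /* lock is no longer held here */
        struct reg *r = victims;
        victims = r->next;
        free(r);
        released++;
    }
    return released;
}
```

The design choice is simply to keep the critical section limited to list manipulation and to do anything that may call back into the allocator outside of it.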
cmr:v1.4
This commit was SVN r22683.
The following Trac tickets were found above:
Ticket 2110 --> https://svn.open-mpi.org/trac/ompi/ticket/2110
Ticket 2111 --> https://svn.open-mpi.org/trac/ompi/ticket/2111
as this can result in a low level free of memory which
can require the rcache lock resulting in a deadlock
This fixes trac:2107
cmr:v1.4
This commit was SVN r22679.
The following Trac tickets were found above:
Ticket 2107 --> https://svn.open-mpi.org/trac/ompi/ticket/2107
when protecting the no_wqe_pending_frags list.
fixes trac:2118 add cmr:v1.4
This commit was SVN r22678.
The following Trac tickets were found above:
Ticket 2118 --> https://svn.open-mpi.org/trac/ompi/ticket/2118
Also includes some minor copyright header additions that were missed in previous checkins
fixes trac:2101 added cmr:v1.4
This commit was SVN r22676.
The following Trac tickets were found above:
Ticket 2101 --> https://svn.open-mpi.org/trac/ompi/ticket/2101
communicator that we created has a lower CID than the parent comm. This can
happen when using the hierarch collective communication module or for
inter-communicators (since we make a duplicate of the original communicator).
This is not a problem as long as the user calls MPI_Comm_free on the parent
communicator. However, if the communicators are not freed by the user but
released by Open MPI in MPI_Finalize, we walk through the list of still
available communicators and free them one by one. Thus, local_comm is freed
before the actual inter-communicator. However, the local_comm pointer in the
inter communicator will still contain the 'previous' address of the local_comm
and thus this will lead to a segmentation violation. In order to prevent that
from happening, we increase the reference counter local_comm by one if its CID
is lower than the parent. We cannot increase however its reference counter if
the CID of local_comm is larger than the CID of the inter communicators, since
a regular MPI_Comm_free would in that case leave the local_comm hanging
around and thus we would not recycle CID's properly, which was the reason and
the cause for this trouble.
This commit fixes tickets 2094 and 2166. Note however, that I want to close
them manually, since a slightly different patch is required for the 1.4
series. This commit will have to be applied for the 1.5 series. And I will
need a volunteer to review it.
This commit was SVN r22671.
parameter (I just discovered while researching for v1.4 that v1.4 has
effectively this same function definition: it just always returns
true!).
This commit was SVN r22642.
This commit adds a lengthy comment in ompi_datatype.h that explains
why a one-sided datatype check was removed. The short version is that
we do have to allow some datatypes that may be unwise to use (e.g.,
"h" types of datatypes that have offsets in bytes -- MPI says it's ok
to use these), and our DDT engine can't currently detect datatypes
with absolute offsets, which MPI says it's ''not'' ok to use with
one-sided operations. Hence, we don't check for some datatypes that
are invalid to use with one-sided operations, and erroneous programs
may crash and burn. Life is hard.
The main point of this commit is that we now do allow datatypes for
one-sided operations that are supposed to be allowed.
This commit was SVN r22641.
The following Trac tickets were found above:
Ticket 2233 --> https://svn.open-mpi.org/trac/ompi/ticket/2233
other process should ignore this value. Thanks to Michael Hofmann
for investigating this issue.
This commit closes trac:2268.
This commit was SVN r22639.
The following Trac tickets were found above:
Ticket 2268 --> https://svn.open-mpi.org/trac/ompi/ticket/2268
- Updated date in copyright header of each source file
VT configure fixes:
- fixed configure's version detection for PAPI to support version 4.x
- added configure tests to detect Bull MPICH2
VT new features:
- added support for relocating an existing VampirTrace installation without rebuilding it from source (fixes OMPI's ticket #1990)
- added support for tracing functions in shared libraries instrumented by the GNU, Intel, Pathscale, or PGI 9 compilers
- added support for PAPI-C counters which belong to different components
- extended usability of environment variable VT_METRICS for PAPI counters to specify whether a counter provides increasing or absolute values
This commit was SVN r22637.
If file does not exist, check the directory it lives in...
May be used by callers trying to use mmap() on NFS, Lustre or
Panasas (thanks Sam).
For now, this is used to warn about the usage of mmap on such FS.
Please note, that Ralph mentioned the orte_no_session_dir parameter.
The help message includes a reference to this.
Tested on NFS and Lustre on Linux on
smoky: mpirun --mca orte_tmpdir_base $HOME/tmp -np 2 ./mpi_stub
jaguar: mpirun ... --mca orte_tmpdir_base /tmp/work/$USER ...
Fixes trac:1354
This should cmr:v1.5 once it has soaked and is shown to work on
Solaris
This commit was SVN r22604.
The following Trac tickets were found above:
Ticket 1354 --> https://svn.open-mpi.org/trac/ompi/ticket/1354
long-standing bugs (see trac ticket list below). They're currently
somewhat obscure bugs, but are becoming much more relevant in a world
where OpenFabrics devices fail and you replace them with a newer model
(i.e., the cluster is homogeneous... ''except'' for where you had to
replace one or two OpenFabrics devices, and the same model is no
longer available).
This commit includes a '''lengthy''' comment (that we spent a lot of
time writing!) about what exactly it does and does not do. The
previous code was rather short and '''incredibly''' subtle. The new
code is slightly longer, but is both much more explicit and much more
painstakingly documented.
This commit fixes multiple trac tickets. The real one that we fix is
#1707; the others are fixed as a side-effect. In short: fixing #1707
prevents Bad Things from happening later in the startup sequence.
Fixes trac:1707, #2164, #1574.
cmr:v1.4.2:reviewer=pasha
cmr:v1.5:reviewer=pasha
This commit was SVN r22592.
The following Trac tickets were found above:
Ticket 1707 --> https://svn.open-mpi.org/trac/ompi/ticket/1707
Add a ''map_bynode'' info key to determine if the job to be started by comm_spawn* should be mapped by node or by slot. Default is to map according to the default policy set when the parent job was started.
cmr:v1.5.1
This commit was SVN r22564.
* Don't build the pstat component if all defines needed aren't there.
* Update platform file to work better
* Work around two places that depended on modex being operational
This commit was SVN r22536.
after the compiler argv tokens.
Not closing #2201 yet; there's still discussion on that ticket about
whether we want to do more or not.
Refs trac:2201
cmr:v1.4.2
cmr:v1.5
This commit was SVN r22513.
The following Trac tickets were found above:
Ticket 2201 --> https://svn.open-mpi.org/trac/ompi/ticket/2201
- patch libtool.m4 to fix detection of PGI 10 C++ compiler
- patch configure to fix detection of PGI 10 Fortran compiler (pgfortran)
- checks for MPI:
- only check Fortran interoperability if an F77 compiler is given
- do not enable MPI-2 I/O support for LAM/MPI
- added configure checks for PMPI_Win_<lock|unlock|test>, because in LAM-MPI these functions are missing
- checks for LIBC-I/O tracing:
- pass ldd's stderr output to /dev/null
VT source fixes:
- fixed detection of unique node id on MacOS platforms (use sysctl instead of gethostid)
- fixed yet another Coverity warning
- fixed compiler warnings on MacOS
This commit was SVN r22508.
- set configure variable 'inside_openmpi' to "no", if hidden argument '--with-openmpi-inside' not given
- added functions 'MPI_Group_range_<incl|excl>' to Fortran MPI wrappers
- updated default configure options for NECSX, BlueGene/L+P
- repaired tools/opari/doc/lacsi01.pdf
- fixed several Coverity warnings
This commit was SVN r22476.
- added support for shared libraries inside Open MPI
- hidden configure option '--with-openmpi-inside'
- do not show config titles/summary if configuring inside Open MPI
This commit was SVN r22440.
In CMake 2.6 and earlier, this function adds dependencies for targets and also links the target libraries automatically, but in CMake 2.8 this behavior has changed: it only adds the dependencies and does not link, which causes link errors at build time.
This commit was SVN r22405.
Special-case the before MPI_INIT / after MPI_FINALIZE error messages
so that they can be a bit more clear than the general "an error
occurred" messages that are displayed in the middle of MPI jobs.
This is not really a "bug fix", but it is helpful for usability. I
leave it up to the v1.4 RM's to decide if they want it for the 1.4
series or not.
This commit was SVN r22382.
The following Trac tickets were found above:
Ticket 2158 --> https://svn.open-mpi.org/trac/ompi/ticket/2158
than can be used (e.g., number of on-node peers), that no additional
room is set aside for those FIFOs that will never be created. This
makes it easier to have dedicated FIFOs: just set btl_sm_num_fifos
to be very large rather than setting it to be the local number of
procs. In practice, we ask for extra headroom anyhow, so this change
generally won't matter.
This commit was SVN r22291.
- removed tools/opari/doc/lacsi01.ps.gz which is equivalent to tools/opari/doc/lacsi01.pdf
- corrected svn:mime-type of tools/opari/doc/opari-logo-100.gif
This commit was SVN r22267.
friends also receive &argc and &argv (George asked Jeff to Ralph to
review before committing). The thought is that passing argv and argc
to opal/orte_init may be useful to other projects outside of OMPI that are
using OPAL and/or ORTE (especially in conjunction with some other
bootstrapping code where it is helpful to modify argv). It's such a
small thing that it's easy to apply here to make others' lives a
little easier.
Ask George for more details; I'm just the messenger. :-)
Judging by the copyrights on this patch, it's been around for a
while. :-)
This commit was SVN r22260.
we should have also relaxed the error checking for MPI_GRAPH_CREATE.
Thanks to David Singleton for pointing this out.
This commit was SVN r22251.
The following SVN revision numbers were found above:
r21816 --> open-mpi/ompi@b8332ea2b2
other request-using frameworks.
- Rather than having mpi/c/* functions allocate requests explicitly,
pass the MPI_Request* down to the I/O component and have it
perform the allocation.
- While the I/O base provides a base request which can be used,
it is not required and all request management occurs within
the component.
- Push progress management into the component, rather than having it
happen in the base. Progress functions are now easily registered,
and not all (ie, the one existing) components use progress functions
in any rational way.
ROMIO switched to generalized requests instead of MPIO_Requests many
moons ago, and Open MPI now uses ROMIO's generalized requests, so there
is no reason to wrap those requests (which are OMPI requests) in another
level of request.
Now the file function passes the MPI_Request* to the ROMIO component,
which passes it to the underlying ROMIO function, which calls
MPI_Grequest_start to create an OMPI request, which is what gets set
as the request to the user. Much cleaner.
This patch has two motivations. One, a whole heck of a lot of code
just got removed, and request handling is now much cleaner for I/O
components. Two, by adding support for Argonne's proposed generalized
request extensions, we can allow ROMIO to provide async I/O through
generalized requests, which we couldn't rationally do in the old
setup due to the crazy request completion rules.
This commit was SVN r22235.
use the new Automake "silent rules" if available.
If you are using an Automake prior to v1.11, you won't see the new
silent rules -- it will automatically default back to the "verbose"
rules.
Note, too, that even with these changes, you can enable the verbose
"make all" output in one of two ways:
1. Add "V=1" to your "make" command line
{{{
shell$ make all V=1
}}}
2. Add "--disable-silent-rules" to your "configure" command line:
{{{
shell$ ./configure --disable-silent-rules ...
}}}
The one down side of using the silent rules by default is that we'll
get less diagnostic information when users send their build logs. I
think we should update the web page to request that users send build
logs of "make V=1", but I'm guessing that not everyone will do it.
Note that I did ''not'' silent-ize the libltdl build (which is a dozen
or so files in the beginning of the build) because we wholly import
libltdl at autogen time. I therefore didn't want to patch libltdl
(further) after importing it a) to remain as forward-compatible as
possible, and b) patching the imported libltdl build system might be
tricky in terms of timestamps / dependencies. So those dozen-or-so
files will still be "verbose", but the rest of the files in OMPI will
be "silent".
This commit was SVN r22189.
area, we cap the size at LONG_MAX. But we are figuring out how much
we need. So, if that amount exceeds LONG_MAX, we should return an
"out of resource" error code.
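A sketch of the idea: when computing how much space is needed, report an error instead of silently capping at LONG_MAX. The function name, parameters, and error constants are hypothetical stand-ins, not the real Open MPI symbols.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_SUCCESS              0
#define SKETCH_ERR_OUT_OF_RESOURCE (-2)   /* hypothetical error code */

/* Compute nprocs * per_proc as a long. If the product would exceed
 * LONG_MAX, return an error rather than clamping the result. */
int compute_segment_size(size_t nprocs, size_t per_proc, long *size_out)
{
    assert(size_out != NULL);
    if (per_proc != 0 && nprocs > (size_t)LONG_MAX / per_proc) {
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    *size_out = (long)(nprocs * per_proc);
    return SKETCH_SUCCESS;
}
```

The division-based check avoids performing the multiplication before it is known to fit.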
This commit was SVN r22172.
Continue the reorganization of the configure system. Move files from the main config directory to their appropriate level-specific config directories. Modify the configure system to correctly handle compiler detection, test, and setup so that all things pertaining to opal and orte are done at the lower level, with the ompi configure system only looking at mpi-specific options.
Ensure the wrapper compilers for orte and ompi only get built when appropriate. Add support for c++ to the orte wrapper compilers, both script and non-script versions.
This commit was SVN r22138.
therefore the m4 test really belongs in orte/config. Thanks, Terry!
Additionally, I took the opportunity to rename the variable so that
"TOTALVIEW" is not in the name anymore (because it applies to all
variables, not just Totalview).
This commit was SVN r22134.
as simple as I or Ralph had hoped. This should be the real fix,
or very close to it. I can now see both the sensor and rmcast
information from ompi_info when configured
with --enable-monitoring --enable-multicast
This commit was SVN r22131.
The following SVN revision numbers were found above:
r22129 --> open-mpi/ompi@02ff00dfb5
XML code in the F90 tree isn't used anymore, but we might as well
update it just so that everything is consistent).
This commit was SVN r22127.
The following Trac tickets were found above:
Ticket 2067 --> https://svn.open-mpi.org/trac/ompi/ticket/2067
from "MPI_*_errhandler_fn" to "MPI_*_errhandler_function" (and their
corresponding C++ types, too). Also updated the corresponding man
pages, and marked the typedefs to the now-deprecated types as
deprecated.
This commit was SVN r22122.
The following Trac tickets were found above:
Ticket 2060 --> https://svn.open-mpi.org/trac/ompi/ticket/2060
Re-enable "./autogen.sh -no-ompi" again. If you -no-ompi, the entire OMPI
configury is skipped and the entire ompi/ subtree is not built. There's
some simple m4-isms that prune out the relevant parts.
I added ompi/config/, orte/config/, and opal/config/ directories. I moved a
bunch of m4 files from the top-level config/ dir into ompi/config/, and a few
into orte/config/.
Note that all 3 <project>/config directories have a config_files.m4 file. This
file contains the AC_CONFIG_FILES list for that project. The AC_CONFIG_FILES
call cannot be in an AC_DEFUN macro and conditionally called -- if it is
included at all, Autoconf will process it. Hence, these config_files.m4 files
don't AC_DEFUN -- they just have AC_CONFIG_FILES. m4_ifdef() is used to
conditionally include the files or not.
I moved a bunch of obvious OMPI-only m4 files from config/ to ompi/config/,
but I'm sure that there's more that could go. A ticket will be filed with
thoughts on future work in this area.
This commit was SVN r22113.
This commit does a bunch of things:
* Address all remaining code review items from CMR #2023:
* Defer mmap setup to be lazy; only set it up the first time we
invoke a collective. In this way, we don't penalize apps that
make lots of communicators but don't invoke collectives on them
(per #2027).
* Remove the extra assignments of mca_coll_sm_one (fixing a
convertor count setup that was the real problem).
* Remove another extra/unnecessary assignment.
* Increase libevent polling frequency when using the RML to
bootstrap mmap'ed memory.
* Fix a minor procs-related memory leak in btl_sm.
* Commit a datatype fix that George and I discovered along the way to
fixing the coll sm.
* Improve error messages when mmap fails, potentially trying to
de-alloc any allocated memory when that happens.
* Fix a previously-unnoticed confusion between extent and true_extent
in coll sm reduce.
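The lazy setup from the first review item can be sketched like this. The struct and function names are hypothetical, and calloc stands in for the mmap setup that the real component would perform.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-communicator module state. */
struct sketch_coll_module {
    void *segment;      /* stand-in for the mmap'ed shared segment */
    int   setup_calls;  /* how many times the expensive setup ran */
};

/* Run the expensive setup only if it has not happened yet. */
static int sketch_lazy_enable(struct sketch_coll_module *m)
{
    if (m->segment != NULL) {
        return 0;                    /* already set up */
    }
    m->segment = calloc(1, 4096);    /* a real component would mmap here */
    if (m->segment == NULL) {
        return -1;
    }
    m->setup_calls++;
    return 0;
}

/* Every collective entry point starts with the lazy check, so a
 * communicator that never calls a collective never pays for setup. */
int sketch_barrier(struct sketch_coll_module *m)
{
    assert(m != NULL);
    if (sketch_lazy_enable(m) != 0) {
        return -1;
    }
    /* ... the collective itself would run here ... */
    return 0;
}
```

Creating the module costs nothing; the first sketch_barrier() call pays for the setup and later calls skip it.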
This commit was SVN r22049.
The following Trac tickets were found above:
Ticket 2023 --> https://svn.open-mpi.org/trac/ompi/ticket/2023
(yay cisco webex!). Make sure we only go up to OPAL max datatype, not
OMPI max datatype.
This commit was SVN r22016.
The following Trac tickets were found above:
Ticket 2028 --> https://svn.open-mpi.org/trac/ompi/ticket/2028
shmem progress (or the Windows equiv). Instead, poll hard on the
condition, but periodically call opal_progress(). This allows
badly-formed apps (e.g., the ibm test communicator/bsend_free) to
actually complete.
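A sketch of "poll hard, but periodically call progress". The callback stands in for opal_progress(); the interval, the names, and the demo callback are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*progress_fn)(void *ctx);

/* Spin until *flag becomes nonzero. After every `interval` spins, call
 * progress(ctx) so that other pending work can advance.
 * Returns the number of progress calls made. */
int spin_with_progress(volatile int *flag, unsigned interval,
                       progress_fn progress, void *ctx)
{
    unsigned spins = 0;
    int calls = 0;

    assert(flag != NULL && progress != NULL && interval > 0);
    while (*flag == 0) {
        if (++spins >= interval) {
            spins = 0;
            progress(ctx);
            ++calls;
        }
    }
    return calls;
}

/* Demo stand-in for a progress engine: completes the awaited condition
 * after a fixed number of calls. */
struct demo_ctx {
    volatile int *flag;
    int remaining;
};

void demo_progress(void *ctx)
{
    struct demo_ctx *c = ctx;
    if (--c->remaining == 0) {
        *c->flag = 1;
    }
}
```

Without the periodic progress call, a wait whose condition can only be satisfied by the progress engine would spin forever, which is the hang described above.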
To be clear, there are far too many apps out there that assume that
MPI collectives will actually progress the rest of MPI. I don't like
putting in a feature to enable broken apps, but I have a dim
recollection of this issue coming up before (apps "hanging" when
testing the sm coll because they assumed that calling collectives
would trigger other MPI progress). Rather than have people claim that
OMPI is broken, I prefer to put in this "workaround". :-(
Indeed, the bsend_free test ''may'' be coded that way for exactly that
reason...? I don't remember offhand...
This commit was SVN r21984.
This commit fixes the ft_event logic so that it uses the normal destroy functionality instead of the workaround with the component that was previously there. All in all it made for cleaner code, which is always good.
If r21967 moves to v1.3, this patch will need to be moved as well.
This commit was SVN r21972.
The following SVN revision numbers were found above:
r21967 --> open-mpi/ompi@533633b8cb
Before this, we would restore the topmost old session directory. This commit makes sure that we remove it when we are done with it.
This commit was SVN r21971.
in the v1.2 series the cid's could never go above the max. allowed for a
particular pml. Because of that, pml_add_comm never checked for the cid, and
in fact pml_add_comm was called in comm_set, which is *before* we knew the
cid.
in the v1.3 series (and trunk) we check now the cid to detect overflow, and
because of that pml_add_comm has been moved *after* the cid allocation
routine, namely into the comm_activate routine.
in the v1.2 series, the comm_activate contained a synchronization step of the
old communicator in order to prevent incoming fragments on the new
communicator, with the main problem being that the allreduce in the
communicator allocation finished at different times on different processes,
and thus, this scenario could and did really occur.
in the v1.3 series, the comm_activate does not contain the synchronization
step anymore, since we introduced the new queue for fragments with unknown
cid. The problem is however, that whether a fragment is known or not is
decided by using ompi_comm_lookup(), which will return something useful as
soon as the cid allocation finished, even before pml_add_comm has been
called. So there is a small time gap where we will not post a message into
queue for unknown cid's, but we can also not look up the process structure
belonging to the rank in that comm ( that is in pml_ob1_match_recv_frag or
something like that).
The current fix reintroduces the synchronization step in comm_activate, and
ensures that no fragment can be received for a new communicator before the
synchronization occurs, and thus before comm_nextcid() and pml_add_comm have been
called. It seems to be the safest and easiest way for now. Welcome back, v1.2.
This commit was SVN r21970.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmaps for a specific
set of procs (vs. *all* procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), we now use a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
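As an illustration of the "offset into the mmap that is computed locally"
point above, here is a minimal sketch. The helper names are hypothetical and
this is not the coll sm code; two local buffers stand in for the same
segment mapped at different addresses in two processes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a pointer inside a mapped segment to an offset from its base.
 * The offset has the same meaning in every process, whatever address the
 * segment happens to be mapped at. */
size_t toy_ptr_to_offset(const void *base, const void *ptr)
{
    return (size_t)((const char *)ptr - (const char *)base);
}

/* Convert an offset back to a pointer using the *local* base address. */
void *toy_offset_to_ptr(void *base, size_t offset)
{
    return (char *)base + offset;
}
```

A raw pointer stored in the segment is only valid in the process that wrote
it; an offset stays valid as long as each process adds it to its own base.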
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
As noted in http://www.open-mpi.org/community/lists/devel/2009/08/6741.php,
we do not correctly free a dupped predefined datatype.
The fix is a bit more involved. See ticket for details.
Tested with ibm tests and mpi_test_suite (though there are two "old"
failures, zero5.c and zero6.c)
Thanks to Lisandro Dalcin for bringing this up.
This commit was SVN r21929.
The following Trac tickets were found above:
Ticket 2014 --> https://svn.open-mpi.org/trac/ompi/ticket/2014
Replace occurrences of
#if defined (c_plusplus) || defined (__cplusplus)
followed by
extern "C" {
and the closing counterpart with BEGIN_C_DECLS and END_C_DECLS.
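For reference, the pattern looks roughly like the sketch below. The macro
names come from this commit; the definitions shown are the common idiom and
are an assumption here, not a copy of opal_config_bottom.h, and toy_add is a
hypothetical function used only for illustration.

```c
#include <assert.h>

/* Assumed definitions: expand to a C linkage block under C++ and to
 * nothing under C. */
#if defined(c_plusplus) || defined(__cplusplus)
#  define BEGIN_C_DECLS extern "C" {
#  define END_C_DECLS   }
#else
#  define BEGIN_C_DECLS
#  define END_C_DECLS
#endif

/* What a header body looks like after the replacement. */
BEGIN_C_DECLS

int toy_add(int a, int b);   /* hypothetical declaration */

END_C_DECLS

/* Definition, as it would appear in the matching .c file. */
int toy_add(int a, int b)
{
    return a + b;
}
```

With the macros in place a header needs exactly one BEGIN_C_DECLS and one
END_C_DECLS, which is why a leftover #if ... } #endif next to END_C_DECLS
closes the block twice.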
Notable exceptions are:
- opal/include/opal_config_bottom.h:
This is our generated code, which itself defines BEGIN_C_DECLS and
END_C_DECLS
- ompi/mpi/cxx/mpicxx.h:
Here we do not include opal_config_bottom.h
- Belongs to external code:
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c
opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h
- opal/include/opal/prefetch.h:
Has C++ specific macros that are protected
- Had #if ... } #endif _and_ END_C_DECLS (i.e. ended up with 2x
END_C_DECLS):
ompi/mca/btl/openib/btl_openib.h
- opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
- opal/win32/ompi_process.h: had extern "C"\n {...
opal/win32/ompi_process.h: ditto
- ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
ompi/mpi/f90/test/align_c.c: ditto
- ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
- ompi/mpi/f90/xml/common-C.xsl: Amend
Tested on linux using --with-openib and --with-mx
The following files do not directly include any of opal_config.h,
orte_config.h or ompi_config.h
(but possibly other header files that include one of the above):
ompi/mca/bml/r2/bml_r2_ft.h
ompi/mca/btl/gm/btl_gm_endpoint.h
ompi/mca/btl/gm/btl_gm_proc.h
ompi/mca/btl/mx/btl_mx_endpoint.h
ompi/mca/btl/ofud/btl_ofud_endpoint.h
ompi/mca/btl/ofud/btl_ofud_frag.h
ompi/mca/btl/ofud/btl_ofud_proc.h
ompi/mca/btl/openib/btl_openib_mca.h
ompi/mca/btl/portals/btl_portals_endpoint.h
ompi/mca/btl/portals/btl_portals_frag.h
ompi/mca/btl/sctp/btl_sctp_endpoint.h
ompi/mca/btl/sctp/btl_sctp_proc.h
ompi/mca/btl/tcp/btl_tcp_endpoint.h
ompi/mca/btl/tcp/btl_tcp_ft.h
ompi/mca/btl/tcp/btl_tcp_proc.h
ompi/mca/btl/template/btl_template_endpoint.h
ompi/mca/btl/template/btl_template_proc.h
ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
ompi/mca/btl/udapl/btl_udapl_endpoint.h
ompi/mca/btl/udapl/btl_udapl_mca.h
ompi/mca/btl/udapl/btl_udapl_proc.h
ompi/mca/mtl/mx/mtl_mx_endpoint.h
ompi/mca/mtl/mx/mtl_mx.h
ompi/mca/mtl/psm/mtl_psm_endpoint.h
ompi/mca/mtl/psm/mtl_psm.h
ompi/mca/pml/cm/pml_cm_component.h
ompi/mca/pml/csum/pml_csum_comm.h
ompi/mca/pml/dr/pml_dr_comm.h
ompi/mca/pml/dr/pml_dr_component.h
ompi/mca/pml/dr/pml_dr_endpoint.h
ompi/mca/pml/dr/pml_dr_recvfrag.h
ompi/mca/pml/example/pml_example.h
ompi/mca/pml/ob1/pml_ob1_comm.h
ompi/mca/pml/ob1/pml_ob1_component.h
ompi/mca/pml/ob1/pml_ob1_endpoint.h
ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
ompi/mca/pml/ob1/pml_ob1_recvfrag.h
ompi/mca/pml/v/pml_v_output.h
opal/include/opal/prefetch.h
opal/mca/timer/aix/timer_aix.h
opal/util/qsort.h
test/support/components.h
This commit was SVN r21855.
The following SVN revision numbers were found above:
r2 --> open-mpi/ompi@58fdc18855