OMPI
and a language agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar)
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is
completely known at compile time and therefore constified
- With fewer datatypes (22), the actual sizes of bit-field types may be
reduced from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in convertor
(upon send/recv) in one cacheline...
This has implications to the convertor-datastructure and other parts
of the code.
- Several performance tests have been run; the netpipe latency does not
change with this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except that MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet
when run with static found in test-suite, committed)
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests, which failed, as on trunk/MTT)
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note that for the heterogeneous case (-m32 compiled
binaries/ompi), neither the ompi-trunk nor the ompi-ddt branch would
successfully launch.
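The structure reorganization mentioned above can be illustrated with a small sketch. Both structs below are hypothetical and do not reproduce the real opal_datatype layout; they only show how narrowing 64-bit fields to 32 bits and ordering members by alignment removes interior padding holes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: 64-bit bit-field words interleaved with
 * narrow members, which forces padding holes on common ABIs. */
struct desc_wide {
    uint16_t id;
    uint64_t flags;     /* only the low 32 bits are needed */
    uint16_t align;
    uint64_t bdt_used;  /* likewise */
    size_t   size;
};

/* Hypothetical "after" layout: widest member first, bit-field words
 * narrowed to 32 bits, so no interior holes remain. */
struct desc_compact {
    size_t   size;
    uint32_t flags;
    uint32_t bdt_used;
    uint16_t id;
    uint16_t align;
};

static size_t wide_size(void)    { return sizeof(struct desc_wide); }
static size_t compact_size(void) { return sizeof(struct desc_compact); }
```

The exact sizes depend on the ABI, so the sketch only relies on the compact layout being smaller than the wide one.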
This commit was SVN r21641.
point, the event engine has been shut down until btl finalization is
done, so opal_progress in the wait loop is not an option - we have
to drain from inside the btl.
Clean up the looping structure for the finalize routine
Update copyrights.
This commit was SVN r21620.
different processes have requested different levels of thread support. This
verification is restricted to MPI_COMM_WORLD.
In case one or more processes have requested support for MPI_THREAD_MULTIPLE,
the cid selection algorithm will fall back to the original, thread safe
approach. Else, it uses the block-algorithm.
For dynamic communicators, we always fall back now to the original algorithm.
This has been tested for homogeneous and heterogeneous settings for
MCW. However, I could not yet test the dynamic comm scenario for technical
reasons, which is why I am not closing ticket 1949 yet.
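A minimal sketch of the selection rule described above, assuming the requested thread levels of all processes have already been gathered into an array. The enum names and the function choose_cid_algorithm are illustrative only and are not taken from the Open MPI sources.

```c
#include <stddef.h>

/* Illustrative stand-ins for the MPI thread-level constants. */
enum thread_level {
    THREAD_SINGLE, THREAD_FUNNELED, THREAD_SERIALIZED, THREAD_MULTIPLE
};

enum cid_algorithm { CID_BLOCK, CID_ORIGINAL };

/* Pick the cid algorithm: dynamic communicators and any request for
 * THREAD_MULTIPLE force the original, thread-safe approach. */
static enum cid_algorithm
choose_cid_algorithm(const enum thread_level *requested, size_t nprocs,
                     int is_dynamic_comm)
{
    size_t i;

    if (is_dynamic_comm) {
        return CID_ORIGINAL;
    }
    for (i = 0; i < nprocs; i++) {
        if (requested[i] == THREAD_MULTIPLE) {
            return CID_ORIGINAL;
        }
    }
    return CID_BLOCK;
}
```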
This commit was SVN r21613.
IPv4 and IPv6) is outside the legal boundaries. This fixes trac:1869.
This commit was SVN r21612.
The following Trac tickets were found above:
Ticket 1869 --> https://svn.open-mpi.org/trac/ompi/ticket/1869
(also usable to check validation of a trace)
- removed OTF python bindings
(a working version of the OTF python bindings is available in the latest stand-alone release;
see http://www.tu-dresden.de/zih/otf/)
- incremented OTF version number
This commit was SVN r21601.
btl_sm.c: In function ‘mca_btl_sm_sendi’:
btl_sm.c:734: warning: comparison between signed and unsigned
btl_sm.c: In function ‘mca_btl_sm_send’:
btl_sm.c:812: warning: comparison between signed and unsigned
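For context, a small sketch of why this class of warning matters. It does not show the btl_sm.c code; the function names are made up for illustration. When an int is compared with a size_t, the int is converted to unsigned, so a negative value turns into a very large one.

```c
#include <stddef.h>

/* What the compiler does implicitly for "a < b": the int is converted
 * to size_t, so -1 becomes SIZE_MAX and the comparison is false. */
static int naive_less(int a, size_t b)
{
    return (size_t)a < b;
}

/* Handle the negative case first, then compare as unsigned. */
static int safe_less(int a, size_t b)
{
    return a < 0 || (size_t)a < b;
}
```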
This commit was SVN r21552.
The following SVN revision numbers were found above:
r21551 --> open-mpi/ompi@bd995d26b4
- poll FIFO occasionally even if just sending messages
- retry pending sends more often
- just before trying a new send
- as part of mca_btl_sm_component_progress
Maintain two new mca_btl_sm_component variables, num_outstanding_frags
and num_pending_sends, to keep overhead low.
Drain only one message fragment from the FIFO per btl_sm_component_progress
call (rather than drain until empty, which in retrospect everyone considers
to have been a mistake).
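The progress behaviour described above can be sketched roughly as follows. This is a toy model and not the btl_sm implementation: struct sm_state, FIFO_CAP and progress() are hypothetical, a retried send always succeeds here, and only the num_pending_sends counter is modelled.

```c
#include <stddef.h>

#define FIFO_CAP 8   /* hypothetical ring-buffer capacity */

struct sm_state {
    int    fifo[FIFO_CAP];     /* incoming fragments */
    size_t head;               /* index of the oldest fragment */
    size_t count;              /* fragments currently queued */
    size_t num_pending_sends;  /* sends waiting to be retried */
    size_t delivered;          /* fragments handed upward so far */
};

/* One progress call: returns the number of events handled. */
static int progress(struct sm_state *s)
{
    int events = 0;

    /* Cheap counter check first: retry one pending send if any exist.
     * In this sketch a retry always succeeds. */
    if (s->num_pending_sends > 0) {
        s->num_pending_sends--;
        events++;
    }

    /* Drain at most one fragment per call, not the whole FIFO. */
    if (s->count > 0) {
        s->head = (s->head + 1) % FIFO_CAP;
        s->count--;
        s->delivered++;
        events++;
    }
    return events;
}
```

Keeping the counters in the component means the common "nothing to do" case costs two integer comparisons.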
This commit was SVN r21551.
This commit was SVN r21533.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r21524
(now works on both big and little endian machines)
* Be a little more flexible when looking for active devices in
btl_openib_component.c
* Add device name and port number to lots of verbose and help
messages
* Add a bunch of verbose messages to give insight into what is
occurring during all the CPC wireups
This commit was SVN r21418.
the debugger plugins into files suffixed by _dbg.h.
This commit was SVN r21404.
The following Trac tickets were found above:
Ticket 1931 --> https://svn.open-mpi.org/trac/ompi/ticket/1931
Revamp the affinity detection/set procedure in mpi_init to correctly detect when we have already been bound to processors, given the revised understanding of paffinity_get. Add a new paffinity macro to make checking for already bound a little nicer.
This commit was SVN r21402.
Yes, friends, our favorite PCIE BTL has resurfaced as mgmt vacillates over its existence. This is an updated version that actually mostly works, in its final stages of debugging.
Some generalization still remains to be done...
This commit was SVN r21358.
not end up in OPAL
- Will post an updated patch for the OMPI_ALIGNMENT_ parts (within C).
This commit was SVN r21342.
The following SVN revision numbers were found above:
r21330 --> open-mpi/ompi@95596d1814
MTT caught builds failing with the following error which flagged the problem:
{{{
param.cc:802: error: ‘MPI_MAX_DATAREP_STRING’ was not declared in this scope
}}}
This commit was SVN r21337.
The following SVN revision numbers were found above:
r21308 --> open-mpi/ompi@2f9765926e
into the OPAL namespace, eliminating cases like opal/util/arch.c
testing for ompi_fortran_logical_t.
As this is processor- and compiler-related information
(e.g. does the compiler/architecture support REAL*16)
this should have been on the OPAL layer.
- Unifies f77 code using MPI_Flogical instead of opal_fortran_logical_t
- Tested locally (Linux/x86-64) with mpich and intel testsuite
but would like to get this week-ends MTT output
- PLEASE NOTE: configure-internal macro-names and
ompi_cv_ variables have not been changed, so that
external platform (not in contrib/) files still work.
This commit was SVN r21330.
well..)
- As Jeff suggested, for m4 macros, don't use _ OPAL, but
rather OPAL_ prefix
- Set the variable before AC_SUBST, so that replacement happens
in f77 header-file, too.
This commit was SVN r21316.
happens when hierarch is used. Two major items:
- modify the comm_activate step to take an additional argument, indicating
whether the new communicator has to go through the collective selection
step. This is not required sometimes (e.g. when a process calls
MPI_COMM_SPLIT with color=MPI_UNDEFINED), and contributed significantly to
the exhaustion of cids.
- when freeing a communicator, check whether we can reuse the block of cids
assigned to that comm. This only works if the current front of the cid
assignment (cid_block_start) is right after the block of cids assigned to this
comm.
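The block-reuse check in the second item can be sketched as below. Only the name cid_block_start comes from this message; CID_BLOCK_SIZE, struct cid_state and release_cid_block() are assumptions made for the example.

```c
#define CID_BLOCK_SIZE 8   /* hypothetical number of cids per block */

struct cid_state {
    int cid_block_start;   /* first cid of the next block to hand out */
};

/* Try to give a freed communicator's block back. This only works when
 * the block sits directly below the current front of the assignment.
 * Returns 1 if the block was reclaimed, 0 otherwise. */
static int release_cid_block(struct cid_state *st, int block_first_cid)
{
    if (st->cid_block_start == block_first_cid + CID_BLOCK_SIZE) {
        st->cid_block_start = block_first_cid;
        return 1;
    }
    return 0;
}
```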
Fixes trac:1904
Fixes trac:1926
This commit was SVN r21296.
The following Trac tickets were found above:
Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
Ticket 1926 --> https://svn.open-mpi.org/trac/ompi/ticket/1926
MPI_MAX_PROCESSOR_NAME
MPI_MAX_ERROR_STRING
MPI_MAX_OBJECT_NAME
MPI_MAX_INFO_KEY
MPI_MAX_INFO_VAL
MPI_MAX_PORT_NAME
MPI_MAX_DATAREP_STRING
Defaults stay as they currently are -- and now give an explanation on the
min/max values being used in a central place...
m4-macro _OPAL_WITH_OPTION_MIN_MAX_VALUE may be beneficial in other parts
of the configure system.
- We need some of these in the lower level OPAL for an upcoming commit!
All other levels base their values on them.
This commit was SVN r21292.
functionality (per MPI-2.1). This warning can be toggled using
--enable-mpi-interface-warning (default OFF), but can be
selectively turned on passing
mpicc -DOMPI_WANT_MPI_INTERFACE_WARNING
Using icc, gcc < 4.5, warnings (such as in mpi2basic_tests) show:
type_vector.c:83: warning: ‘MPI_Type_hvector’ is deprecated
(declared at /home/../usr/include/mpi.h:1379)
Using gcc-4.5 (gcc-svn) these show up as:
type_vector.c:83: warning: ‘MPI_Type_hvector’ is deprecated
(declared at /home/../usr/include/mpi.h:1379):
MPI_Type_hvector is superseded by MPI_Type_create_hvector in MPI-2.0
Jeff and I propose to turn such warnings on with Open MPI-1.7 by default.
- Detection of user-level compiler is handled using the preprocessor
checks of GASnet's other/portable_platform.h (thanks to Paul Hargrove
and Dan Bonachea) adapted into ompi/include/mpi_portable_platform.h
(see comments).
The OMPI-build time detection is output (Familyname and Version)
with ompi_info.
This functionality (actually any upcoming __attribute__) is turned
off if a different compiler (and version) is detected.
- Note, that any warnings regarding (user-compiler!=build-compiler)
as discussed in the RFC are _not_ included for now.
- Tested on Linux with --enable-mpi-interface-warning on
Linux, gcc-4.5 (deprecated w/ specific msg)
Linux, gcc-4.3 (deprecated w/o specific msg)
Linux, pathscale 3.1 (deprecated w/o specific msg)
Linux, icc-11.0 (deprecated w/o specific msg)
Linux, PGI-8.0.6 accepts __deprecated__ but does not issue a warning,
further investigation needed...
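A rough sketch of how a deprecation macro can distinguish the two gcc behaviours listed above (message supported from gcc 4.5 on, bare attribute before that). SKETCH_DEPRECATED, old_scale and new_scale are invented for this example; the real mpi.h macros and the OMPI_WANT_MPI_INTERFACE_WARNING gating are not reproduced here.

```c
/* gcc >= 4.5 accepts a message argument; older GNU-compatible compilers
 * only understand the bare attribute; everything else gets nothing. */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#  define SKETCH_DEPRECATED(msg) __attribute__((__deprecated__(msg)))
#elif defined(__GNUC__)
#  define SKETCH_DEPRECATED(msg) __attribute__((__deprecated__))
#else
#  define SKETCH_DEPRECATED(msg)
#endif

int new_scale(int x);
int old_scale(int x)
    SKETCH_DEPRECATED("old_scale is superseded by new_scale");

int new_scale(int x) { return 2 * x; }

/* Defining the deprecated function does not warn; calling it does. */
int old_scale(int x) { return new_scale(x); }
```

A caller of old_scale() would see the warning at its call site, with the message text only on compilers that take the first branch.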
This commit was SVN r21262.
shm_fifos values are only partially updated, and this leads to wrong values
for the offset. Moving the write barrier at the right place, plus forcing
some read barriers might help.
In addition I get rid of the sm_offset array which is completely useless.
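To illustrate the barrier placement issue, here is a minimal sketch using C11 atomics in place of Open MPI's own barrier macros (an assumption made for portability of the example). struct slot, publish() and try_read() are hypothetical. The point is that the payload must be written before the flag is published, and read only after the flag has been observed.

```c
#include <stdatomic.h>
#include <stddef.h>

struct slot {
    size_t     offset;  /* payload written by the producer */
    atomic_int ready;   /* 0 = empty, 1 = offset is valid   */
};

/* Producer: write the payload, then publish with release semantics
 * (the "write barrier at the right place"). */
static void publish(struct slot *s, size_t offset)
{
    s->offset = offset;
    atomic_store_explicit(&s->ready, 1, memory_order_release);
}

/* Consumer: the acquire load acts as the read barrier, so the offset
 * read afterwards is never a partially updated value. */
static int try_read(struct slot *s, size_t *offset_out)
{
    if (!atomic_load_explicit(&s->ready, memory_order_acquire)) {
        return 0;
    }
    *offset_out = s->offset;
    return 1;
}
```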
This commit was SVN r21253.
for printing size_t use "%lu" and cast to (unsigned long).
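A short sketch of the convention, with a hypothetical helper format_size(). C99 also offers "%zu" for size_t; the cast form is the one that works with pre-C99 printf implementations, at the cost of truncation on platforms where unsigned long is narrower than size_t.

```c
#include <stdio.h>
#include <stddef.h>

/* Format a size_t portably for pre-C99 printf implementations:
 * use "%lu" and cast the argument to match. */
static int format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%lu", (unsigned long)value);
}
```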
This commit was SVN r21238.
The following SVN revision numbers were found above:
r21234 --> open-mpi/ompi@22b6177fb9
I just found that we have two places where we call for XRC domain
creation: the first in init_one_device() and the second in prepare_device_for_use().
They have absolutely identical code, but the call in init_one_device() is useless
because at this stage we don't know the QP configuration and we don't know if we need
XRC at all. So I am removing the duplicated code from init_one_device().
This commit was SVN r21235.
1. replacing mpi_paffinity_alone with opal_paffinity_alone - for back-compatibility, I have aliased mpi_paffinity_alone to the new param name. This causes a mild abstraction break in the opal/mca/paffinity framework - per the devel discussion...live with it. :-) I also moved the ompi_xxx global variable that tracked maffinity setup so it could be properly closed in MPI_Finalize to the opal/mca/maffinity framework to avoid an abstraction break.
2. Added code to the odls/default module to perform paffinity binding and maffinity init between process fork and exec. This has been tested on IU's odin cluster and works for both MPI and non-MPI apps.
3. Revise MPI_Init to detect if affinity has already been set, and to attempt to set it if not already done. I have *not* tested this as I haven't yet figured out a way to do so - I couldn't get slurm to perform cpu bindings, even though it supposedly does do so.
This has only been lightly tested and would definitely benefit from a wider range of evaluation...
This commit was SVN r21209.
for us already.
* Slightly clarify the error message strings; now they match the new
error strings for btl_openib_ipaddr_in|exclude
This commit was SVN r21197.
case the first process of the group was not represented at all in the second
group. Also added some cleanup of the code w.r.t. booleans vs. ints.
Thanks to Geoffrey Irving for reporting the bug and providing the initial
solution.
This commit was SVN r21192.
subnet specifications (in addition to interface names). These
parameters now take a comma-delimited list of interface names and/or
a.b.c.d/x specifications (only IPv4 currently supported for subnet
specifications). For example:
mpirun --mca btl_tcp_if_include 10.10.30.0/8,eth0
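A minimal sketch of how one token of such a list could be classified, assuming IPv4 only. parse_cidr() and in_subnet() are hypothetical helpers, not the functions used by the TCP BTL: a token that parses as a.b.c.d/x is treated as a subnet, anything else would be handled as an interface name.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 and fills addr/prefix if spec looks like a.b.c.d/x,
 * 0 otherwise (e.g. for an interface name such as "eth0"). */
static int parse_cidr(const char *spec, uint32_t *addr, unsigned *prefix)
{
    unsigned a, b, c, d, x;
    char trailing;

    /* Exactly five conversions: a sixth would mean trailing garbage. */
    if (sscanf(spec, "%u.%u.%u.%u/%u%c", &a, &b, &c, &d, &x, &trailing) != 5) {
        return 0;
    }
    if (a > 255 || b > 255 || c > 255 || d > 255 || x > 32) {
        return 0;
    }
    *addr = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
            ((uint32_t)c << 8)  | (uint32_t)d;
    *prefix = x;
    return 1;
}

/* True if ip lies in net/prefix (all values in host byte order). */
static int in_subnet(uint32_t ip, uint32_t net, unsigned prefix)
{
    uint32_t mask = (prefix == 0) ? 0 : (~(uint32_t)0 << (32 - prefix));
    return (ip & mask) == (net & mask);
}
```

With the example above, 10.10.30.0/8 keeps only the first octet, so any 10.x.y.z address matches.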
This commit was SVN r21189.
- due to the <= we could overrun the array
- we didn't correctly test at _all_, since we never marked the
ranks already excluded / included...
- when returning in error, we should free (elements_int_list)...
This commit was SVN r21186.
OMPI_* to OPAL_*. This allows opal layer to be used more independent
from the whole of ompi.
NOTE: 9 "svn mv" operations immediately follow this commit.
This commit was SVN r21180.
In _correct_ programs only when (group->grp_proc_count - n) > 0,
we may fill ranks_included (callers of ompi_group_excl make sure)...
Therefore move the ranks_included loop into the true
block of the if (which is changed from "!= 0" to ">0").
Otherwise, the initialization of k=0 and ranks_included=NULL is good
for the ompi_group_incl (and submethods ompi_group_*).
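The control flow described above can be sketched as follows. This is a simplified stand-in for the real group code: excl_ranks() and its marking array are assumptions, and only the variable names k and the "> 0" condition mirror the message.

```c
#include <stdlib.h>

/* Build the list of ranks that remain after excluding n ranks from a
 * group of group_size. Returns k (the number of ranks kept) or -1 on
 * error; *included_out stays NULL when nothing remains. */
static int excl_ranks(int group_size, const int *excluded, int n,
                      int **included_out)
{
    int k = 0, i;
    int *included = NULL;
    char *marked;

    *included_out = NULL;
    if (group_size <= 0 || n < 0) {
        return -1;
    }
    marked = calloc((size_t)group_size, 1);
    if (NULL == marked) {
        return -1;
    }
    /* Mark excluded ranks; reject out-of-range or duplicate entries. */
    for (i = 0; i < n; i++) {
        if (excluded[i] < 0 || excluded[i] >= group_size ||
            marked[excluded[i]]) {
            free(marked);
            return -1;
        }
        marked[excluded[i]] = 1;
    }
    /* Only fill the list when something is left over. */
    if (group_size - n > 0) {
        included = malloc((size_t)(group_size - n) * sizeof(int));
        if (NULL == included) {
            free(marked);
            return -1;
        }
        for (i = 0; i < group_size; i++) {   /* "<", not "<=" */
            if (!marked[i]) {
                included[k++] = i;
            }
        }
    }
    free(marked);
    *included_out = included;
    return k;
}
```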
Tested on Linux w/ mpi_test_suite and MPIch testsuite:
4 grouptest_coll
4 groupcreate
4 grouptest
This commit was SVN r21172.
malloc buffer for ompi_info_get one character larger for the NUL-termination
See comment in ompi/mpi/c/info_get.c or MPI-2.1 p289
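A small sketch of the sizing rule, using a hypothetical helper copy_value() and not the actual info_get.c code: a caller that wants up to valuelen characters needs valuelen + 1 bytes so the terminating NUL always fits.

```c
#include <stdlib.h>
#include <string.h>

/* Copy at most valuelen characters of value into a fresh buffer that
 * is one byte larger, so the result is always NUL-terminated.
 * The caller frees the returned buffer. */
static char *copy_value(const char *value, int valuelen)
{
    char *buf;

    if (valuelen < 0) {
        return NULL;
    }
    buf = malloc((size_t)valuelen + 1);
    if (NULL == buf) {
        return NULL;
    }
    strncpy(buf, value, (size_t)valuelen);
    buf[valuelen] = '\0';
    return buf;
}
```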
This commit was SVN r21154.
Well, well, just do not "call" ompi_comm_rank twice but rather
reuse variable...
- Fix Coverity CID 1262:
Using uninitialized value "(statuses[err_index]).MPI_ERROR"
Sure, these statuses are only initialized after ompi_request_wait_all,
so introduce a short-circuit label to jump to...
This commit was SVN r21153.
communicator. This works, if all processes agree that all communicators
utilizing the cids in the block have been freed. If they don't, they assign a
new block of cid's.
This fixes the application scenario reported in the week, in fact the test
successfully creates 100,000 communicators without exceeding a cid of 20. The
fix also keeps the main property of the algorithm (namely using a single
Allreduce operation to get a new block) and did not modify the communicator
structure.
This commit was SVN r21142.
to happen
* Properly error out (rather than cause buffer overflow) in case where
the datatype packed description is larger than our control fragments.
This still isn't standards conforming, but at least we know what
happened.
* Expose win_set_name to external libraries (like the osc modules)
* Set default window name to the CID of the communicator it's using
for communication
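The "error out rather than overflow" item amounts to a length check before the copy. A minimal sketch, with a hypothetical pack_description() that stands in for the real osc code:

```c
#include <stddef.h>
#include <string.h>

/* Copy a packed datatype description into a fixed-size control
 * fragment. Returns 0 on success, -1 if the description does not fit
 * (instead of writing past the end of the fragment). */
static int pack_description(void *frag, size_t frag_size,
                            const void *desc, size_t desc_len)
{
    if (desc_len > frag_size) {
        return -1;
    }
    memcpy(frag, desc, desc_len);
    return 0;
}
```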
Refs trac:1905
This commit was SVN r21134.
The following Trac tickets were found above:
Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
Nothing notable, except mtl_base_datatype.h -- Undo change from r21096:
Yes, we should not include datatype_internal.h, but we did and we have to:
we dereference desc, and get an incomplete type, otherwise.
This commit was SVN r21103.
The following SVN revision numbers were found above:
r21096 --> open-mpi/ompi@221fb9dbca
- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh after applying
patches, that include headers, being "lost" due to
inclusion in one of the now deleted headers...
In total 817 files are touched.
In ompi/mpi/c/ header files are moved up into the actual c-file,
where necessary (these are the only additional #include),
otherwise it is only deletions of #include (apart from the above
additions required due to notifier...)
- To get different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
several header files (previously included by header-files)
now have to be moved "upward".
This is mainly system headers such as string.h, stdio.h and for
networking, but also some orte headers.
This commit was SVN r21095.
This fixes a bug that can happen when checkpointing while one process is in such a routine. Previously a warning was thrown.
This commit was SVN r21080.
only show up if a notifier component is selected, of course). These
can be disabled by setting the MCA parameter mpi_notify_init_finalize
to 0.
These messages are both intended as "hey, does the community like
this?" and as a way to get some real-world testing of the notify
system. The default is currently to send these messages if a notify
component is selected; we can change the default later if desired.
This commit was SVN r21078.
and mo'bettah. Put in lengthy comments explaining what's going on.
We might still want to tweak this some more, but we can no longer get
IMB-EXT to hang with this new code anymore (e.g., even without eager
RDMA -- we discovered after the fact that the code in the v1.3.2
release will hang if eager RDMA is disabled).
Fixes trac:1890. Really.
This commit was SVN r21061.
The following Trac tickets were found above:
Ticket 1890 --> https://svn.open-mpi.org/trac/ompi/ticket/1890
_endpoint_post_send(), which could result in an infinite loop (see the
comment in the code).
This is part one of a proper fix; it's suitable for the v1.3 tree and
for an immediate release. Pasha and I plan to spend a little more
time and clean up this stuff properly, but it does not need to be
included in v1.3.2.
This commit was SVN r21047.
The following Trac tickets were found above:
Ticket 1890 --> https://svn.open-mpi.org/trac/ompi/ticket/1890
were looking for. This makes the openib btl fail a little more
gracefully (for example) if you specify a bogus value to
btl_openib_mpool.
Thanks to Roberto Ammendola for identifying the exact issue.
This commit was SVN r21044.
This has turned into an MPI spec interpretation issue. :-(
Open MPI has interpreted the spec one way for the past several years;
these commits reflect a different interpretation that changes how we
treat the EXTRA_STATE parameter to the Fortran attribute copy and
delete callbacks. This new way breaks our internal copy of the Intel
Fortran attribute tests. So after talking with Terry/Sun, we're going
to back out these changes (both here and on the v1.3 branch) until we
get further clarification from the Forum.
This commit was SVN r21028.
The following SVN revision numbers were found above:
r20926 --> open-mpi/ompi@0a24eadaad
r20941 --> open-mpi/ompi@045b0e8871
r20950 --> open-mpi/ompi@73af921c22