* New "op" MPI layer framework
* Addition of the MPI_REDUCE_LOCAL proposed function (for MPI-2.2)
= Op framework =
Add new "op" framework in the ompi layer. This framework replaces the
hard-coded MPI_Op back-end functions for (MPI_Op, MPI_Datatype) tuples
for pre-defined MPI_Ops, allowing components and modules to provide
the back-end functions. The intent is that components can be written
to take advantage of hardware acceleration (GPU, FPGA, specialized CPU
instructions, etc.). Similar to other frameworks, components are
intended to be able to discover at run-time if they can be used, and
if so, elect themselves to be selected (or disqualify themselves from
selection if they cannot run). If specialized hardware is not
available, there is a default set of functions that will automatically
be used.
This framework is ''not'' used for user-defined MPI_Ops.
The new op framework is similar to the existing coll framework, in
that the final set of function pointers that are used on any given
intrinsic MPI_Op can be a mixed bag of function pointers, potentially
coming from multiple different op modules. This allows for hardware
that only supports some of the operations, not all of them (e.g., a
GPU that only supports single-precision operations).
All the hard-coded back-end MPI_Op functions for (MPI_Op,
MPI_Datatype) tuples still exist, but unlike coll, they're in the
framework base (vs. being in a separate "basic" component) and are
automatically used if no component is found at runtime that provides a
module with the necessary function pointers.
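
The selection described above can be pictured as a per-(op, datatype) table that is filled from a component where it offers a function and from the base otherwise. The following is only a minimal sketch of that idea, not the framework's actual code: every name in it (`sketch_op_fn`, `sketch_select`, `accel_table`, and so on) is hypothetical, and the real framework works on MPI_Op and MPI_Datatype handles rather than small enums.

```c
#include <stddef.h>

/* Hypothetical stand-ins for (MPI_Op, MPI_Datatype); illustrative only. */
enum sketch_op   { SK_OP_SUM, SK_OP_MAX, SK_OP_COUNT };
enum sketch_type { SK_TYPE_INT, SK_TYPE_FLOAT, SK_TYPE_COUNT };

/* Two-buffer reduction: inout[i] = in[i] (op) inout[i]. */
typedef void (*sketch_op_fn)(const void *in, void *inout, int count);

/* Base (default) functions: always available. */
static void base_sum_int(const void *in, void *inout, int count) {
    const int *a = in; int *b = inout;
    for (int i = 0; i < count; ++i) b[i] += a[i];
}
static void base_sum_float(const void *in, void *inout, int count) {
    const float *a = in; float *b = inout;
    for (int i = 0; i < count; ++i) b[i] += a[i];
}
static void base_max_int(const void *in, void *inout, int count) {
    const int *a = in; int *b = inout;
    for (int i = 0; i < count; ++i) if (a[i] > b[i]) b[i] = a[i];
}
static void base_max_float(const void *in, void *inout, int count) {
    const float *a = in; float *b = inout;
    for (int i = 0; i < count; ++i) if (a[i] > b[i]) b[i] = a[i];
}

/* A pretend "accelerated" component that only handles float sums.
 * The counter exists only so a caller can observe that it was used. */
static int accel_calls = 0;
static void accel_sum_float(const void *in, void *inout, int count) {
    const float *a = in; float *b = inout;
    for (int i = 0; i < count; ++i) b[i] += a[i];
    ++accel_calls;
}

static const sketch_op_fn base_table[SK_OP_COUNT][SK_TYPE_COUNT] = {
    [SK_OP_SUM] = { [SK_TYPE_INT] = base_sum_int, [SK_TYPE_FLOAT] = base_sum_float },
    [SK_OP_MAX] = { [SK_TYPE_INT] = base_max_int, [SK_TYPE_FLOAT] = base_max_float },
};

/* What the component offers; NULL entries mean "not supported". */
static const sketch_op_fn accel_table[SK_OP_COUNT][SK_TYPE_COUNT] = {
    [SK_OP_SUM] = { [SK_TYPE_FLOAT] = accel_sum_float },
};

/* Build the final table: use the component's pointer where it has one,
 * otherwise fall back to the base function. component may be NULL. */
static void sketch_select(sketch_op_fn final[SK_OP_COUNT][SK_TYPE_COUNT],
                          const sketch_op_fn component[SK_OP_COUNT][SK_TYPE_COUNT]) {
    for (int op = 0; op < SK_OP_COUNT; ++op) {
        for (int t = 0; t < SK_TYPE_COUNT; ++t) {
            if (component != NULL && component[op][t] != NULL) {
                final[op][t] = component[op][t];
            } else {
                final[op][t] = base_table[op][t];
            }
        }
    }
}
```

Calling `sketch_select(table, accel_table)` yields a mixed table: the float sum entry points at the component, and every other entry points at the base. Passing `NULL` for the component gives the all-base table, which mirrors the "no component found at runtime" case.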
There is an "example" op component that will hopefully be useful to
those writing meaningful op components. It is currently
.ompi_ignore'd so that it doesn't impinge on other developers (it's
somewhat chatty in terms of opal_output() so that you can tell when
its functions have been invoked). See the README file in the example
op component directory. Developers of new op components are
encouraged to look at the following wiki pages:
https://svn.open-mpi.org/trac/ompi/wiki/devel/Autogen
https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateComponent
https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateFramework
= MPI_REDUCE_LOCAL =
Part of the MPI-2.2 proposal listed here:
https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/24
is to add a new function named MPI_REDUCE_LOCAL. It is very easy to
implement, so I added it (also because it makes testing the op
framework pretty easy -- you can do it in serial rather than via
parallel reductions). There's even a man page!
This commit was SVN r20280.
* If the accumulate is local, make it short-circuit the request path. Accumulate requires local
ops due to its window rules, so this is likely to help a bunch (on the codes I'm messing
with at least)
* Do a better job of flushing everything that can go out on the wire in a resource-constrained problem
* Move some debugging values around to make large problems somewhat easier to deal with
This commit was SVN r20277.
to grow to. Without this change, jobs with np>120 get errors.
This does not change anything for np<16 jobs. It only comes into
play with larger np count on a node. I imagine that this can be
scaled back in the future if the usage of memory in the sm
btl is improved.
This fixes trac:1449.
This commit was SVN r20230.
The following Trac tickets were found above:
Ticket 1449 --> https://svn.open-mpi.org/trac/ompi/ticket/1449
* Don't overwrite the des_flags field, removing the
all important always callback field
* Fix up return status of bml_base_send, since
the rest of the code expects OMPI_SUCCESS or
an error code
This commit was SVN r20178.
components to use. This code was rendered obsolete (albeit harmless)
by the MCA base improvements that only open the components that were
specified by each framework's MCA parameter.
This commit was SVN r20176.
The problem was that we doubly decremented the active count on blocking receives that we stall to complete. This moved the active count into the negative. With a negative count for 'active', a message that should have been accounted for would be overlooked. This then caused the bookmark exchange to post a drain for a message that was never posted, thus locking the protocol. By eliminating the decrement on the 'active' count when we attempt to post the drain message, we only decrement this counter when the outstanding blocking recv completes during the stall operation.
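
The accounting error is easy to see in a toy model. The sketch below is not the checkpoint/restart code itself: the struct, the function names, and the `buggy` switch are all hypothetical and exist only to contrast the old double decrement with the single decrement at completion.

```c
/* Hypothetical per-peer bookkeeping; not the real data structure. */
struct sketch_counts {
    int active; /* blocking receives posted but not yet completed */
};

/* A blocking receive is posted: one more active receive. */
static void sketch_post_blocking_recv(struct sketch_counts *c) {
    c->active++;
}

/* We attempt to post the drain while stalling. In the old behaviour this
 * also decremented 'active'; the fix is to leave the counter alone here. */
static void sketch_attempt_drain(struct sketch_counts *c, int buggy) {
    if (buggy) {
        c->active--; /* the decrement that was removed */
    }
}

/* The outstanding blocking receive completes during the stall:
 * this is the only place the counter should go down. */
static void sketch_recv_completed(struct sketch_counts *c) {
    c->active--;
}

/* Run one post / drain attempt / completion cycle and return 'active'. */
static int sketch_run(int buggy) {
    struct sketch_counts c = { 0 };
    sketch_post_blocking_recv(&c);
    sketch_attempt_drain(&c, buggy);
    sketch_recv_completed(&c);
    return c.active;
}
```

With the extra decrement enabled, one cycle leaves the count below zero, so a later receive that brings it back to zero would look as if nothing were outstanding. With it removed, the count returns to zero after each cycle.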
Refs trac:1619
Does not close this ticket since there is an outstanding potential problem with ANY_SOURCE and ANY_TAG, as referenced in the ticket.
This should be moved to v1.3
This commit was SVN r20147.
The following Trac tickets were found above:
Ticket 1619 --> https://svn.open-mpi.org/trac/ompi/ticket/1619
problem seems to come from the free list, but due to lack of time to
understand it completely, I provide this fix. Basically, there is no
waiting in the MX BTL anymore, if we cannot allocate a fragment we
rely on the PML to take the corrective actions.
This commit was SVN r20124.
1. minor modification to include two new opal MCA params:
(a) opal_profile: outputs which components were selected by each framework;
currently enabled for most, but not all, frameworks
(b) opal_profile_file: name of file that contains profile info required
for modex
2. introduction of two new tools:
(a) ompi-probe: MPI process that simply calls MPI_Init/Finalize with
opal_profile set. Also reports back the rml IP address for all
interfaces on the node
(b) ompi-profiler: uses ompi-probe to create the profile_file, also
reports out a summary of what framework components are actually
being used to help with configuration options
3. modification of the grpcomm basic component to utilize the
profile file in place of the modex where possible
4. modification of orterun so it properly sees opal mca params and
handles opal_profile correctly to ensure we don't get its profile
5. similar mod to orted as for orterun
6. addition of new test that calls orte_init followed by calls to
grpcomm.barrier
This is all completely benign unless actively selected. At the moment, it only supports modex-less launch for openib-based systems. A minor mod to the TCP btl would be required to enable it there as well, if people are interested. Similarly, anyone interested in enabling other BTLs for modex-less operation should let me know and I'll give you the magic details.
This seems to significantly improve scalability provided the file can be locally located on the nodes. I'm looking at an alternative means of disseminating the info (perhaps in launch message) as an option for removing that constraint.
This commit was SVN r20098.
of more than one of the btl_openib_if_include, btl_openib_if_exclude,
btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude MCA parameters.
This commit was SVN r20053.
determining which IP address to use when transmitting data. Also it adds logic
to prevent usage of more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters.
This should complete the code modifications needed for ticket 1665.
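
A check of the "at most one of these four parameters" kind can be sketched as follows. This is an assumption-laden illustration, not the openib BTL's code: the function names and return values are hypothetical, the parameter values are treated as opaque strings, and a NULL or empty string is assumed to mean "not set".

```c
#include <stddef.h>

/* Count how many of the given values are set (non-NULL and non-empty). */
static int sketch_count_set(const char *const values[], size_t n) {
    int set = 0;
    for (size_t i = 0; i < n; ++i) {
        if (values[i] != NULL && values[i][0] != '\0') {
            ++set;
        }
    }
    return set;
}

/* Return 0 if at most one of the four parameters is set, -1 otherwise.
 * The arguments stand for the values of btl_openib_if_include,
 * btl_openib_if_exclude, btl_openib_ipaddr_include, and
 * btl_openib_ipaddr_exclude, in that order. */
static int sketch_check_exclusive(const char *if_include,
                                  const char *if_exclude,
                                  const char *ipaddr_include,
                                  const char *ipaddr_exclude) {
    const char *const values[] = {
        if_include, if_exclude, ipaddr_include, ipaddr_exclude
    };
    size_t n = sizeof(values) / sizeof(values[0]);
    return (sketch_count_set(values, n) <= 1) ? 0 : -1;
}
```

Counting the set parameters instead of testing each pair keeps the check at one comparison and makes it simple to extend if another mutually exclusive parameter is added later.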
This commit was SVN r20052.
1. fix a bug in pml_ob1_recvreq/sendreq.c: the buffer was made defined at a point where the request had already been released.
2. complete memchecker support for collective functions.
3. rename the misspelled memchecker functions, i.e. '*_isaddressible' becomes '*_isaddressable'
This commit was SVN r20043.