Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29485.
- some free lists simply were not being OBJ_DESTRUCTed, so they never
freed their internal memory
- channel->recv_segs.ctx was being assigned in a way that got clobbered
by ompi_free_list_init_new, so the cleanup code that relied on it
being set never ran
- numerous other ".ctx" assignments were similarly ineffectual and were
not being consumed, so I deleted them
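The clobbering described in the second bullet can be pictured with a minimal sketch. All names below (demo_list_t, demo_list_init) are hypothetical stand-ins, not the real ompi_free_list API; the only assumption is that the init routine overwrites the ctx field.

```c
#include <stddef.h>

/* Hypothetical stand-in for a free list that carries a user context. */
typedef struct {
    void  *ctx;
    size_t num_items;
} demo_list_t;

/* Hypothetical init routine: it overwrites ctx with whatever the caller
 * passes in, so any earlier assignment to ctx is lost. */
static void demo_list_init(demo_list_t *list, size_t num_items, void *ctx)
{
    list->num_items = num_items;
    list->ctx = ctx;
}

/* Broken ordering: ctx is assigned first and then clobbered by init. */
static void *setup_broken(demo_list_t *list, void *owner)
{
    list->ctx = owner;
    demo_list_init(list, 8, NULL);
    return list->ctx;   /* NULL, so cleanup keyed on ctx never runs */
}

/* Fixed ordering: hand the context to init (or assign it afterwards). */
static void *setup_fixed(demo_list_t *list, void *owner)
{
    demo_list_init(list, 8, owner);
    return list->ctx;   /* owner */
}
```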
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29484.
This new routine can be called in exceptional situations, either
conditionally in BTL code or from a debugger, to help with debugging in
cases where MSGDEBUG1/2 or stats logging are impractical but more detail
is needed.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29483.
Pull the bulk of the functionality out into a new routine,
ompi_btl_usnic_print_stats, which can be used in other debugging
contexts. This also lets us eliminate the module->final_stats state
tracking.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29482.
Fixes:
- Segmentation fault when using watermark variables.
- Segmentation fault when using a handle bound to a no longer valid
performance variable.
- Incorrect return codes from MPI_T_pvar_* functions.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29481.
Thanks to Charles Gerlach for identifying the issue.
Oddly, this issue exists in trunk and v1.7, but ''not'' in the v1.6
tree (!).
cmr=v1.7.4:reviewer=hjelmn
This commit was SVN r29463.
Follow the convention established by the ompi/mca/common/sm tree and
prefix both the "install" and "no install" versions of the build with
"lib" so that Automake doesn't complain. Differentiate the two by
adding a "_noinst" suffix to the "no install" version.
This commit was SVN r29462.
- removed potential double-'/' in CUPTIDIR which causes trouble with rpmbuild's debugedit program (fixes trac:3854)
This commit was SVN r29461.
The following Trac tickets were found above:
Ticket 3854 --> https://svn.open-mpi.org/trac/ompi/ticket/3854
This change contains a non-mandatory modification
of the MPI-RTE interface. Anyone wishing to support
coprocessors such as the Xeon Phi may wish to add
the required definition and underlying support
****************************************************************
Add locality support for coprocessors such as the Intel Xeon Phi.
Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.
So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:
1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board
2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions
3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.
4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.
5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.
6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.
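The host/board split in items 1 and 2 can be sketched with bit flags. The DEMO_* names and bit values are hypothetical (the real OPAL flag values are not shown in this log), and treating "same host" as implying "same cluster/cu" is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_PROC_ON_CLUSTER 0x01u
#define DEMO_PROC_ON_CU      0x02u
#define DEMO_PROC_ON_HOST    0x04u  /* same host, possibly another board */
#define DEMO_PROC_ON_BOARD   0x08u
/* "On node" means same host AND same board. */
#define DEMO_PROC_ON_NODE    (DEMO_PROC_ON_HOST | DEMO_PROC_ON_BOARD)

/* Explicitly require every bit of the node mask, not just any one. */
#define DEMO_PROC_ON_LOCAL_NODE(loc) \
    (((loc) & DEMO_PROC_ON_NODE) == DEMO_PROC_ON_NODE)

static uint16_t demo_locality(bool same_host, bool same_board)
{
    uint16_t loc = 0;
    if (same_host) {
        /* Assumption: sharing a host implies sharing the cluster and cu. */
        loc |= DEMO_PROC_ON_CLUSTER | DEMO_PROC_ON_CU | DEMO_PROC_ON_HOST;
        if (same_board) {
            loc |= DEMO_PROC_ON_BOARD;
        }
    }
    return loc;
}
```

A proc on a coprocessor and a proc on its host would get demo_locality(true, false): they share a host, but the local-node test is false.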
cmr:v1.7.4:reviewer=hjelmn
This commit was SVN r29435.
Reworked the ompi_info tool to be closer to the orte_info implementation.
ompi_info_register_types(), ompi_info_close_components() and
ompi_info_show_ompi_version() are moved to runtime/ompi_info_support.c.
Added a runtime/oshmem_info_support layer that exports the following API
for use in the oshmem_info tool:
oshmem_info_register_types()
oshmem_info_register_framework_params()
oshmem_info_close_components()
oshmem_info_show_oshmem_version()
These functions call the related ompi_info_support interfaces as long as
OSHMEM supports the combined Open MPI/SHMEM configuration.
Now orte_info, ompi_info, and oshmem_info share the same implementation approach.
Possible improvement:
OSHMEM processing of the --config option is the same as OMPI's (the code is duplicated).
The list of info_support interfaces could probably be extended with xxx_info_do_config().
developed by Igor, reviewed by miked
This commit was SVN r29429.
* Use the right length for memset/strncpy
* Return the set return value (vs. unconditionally returning
OMPI_SUCCESS)
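Both classes of fix can be illustrated with a short sketch. The helper demo_copy_string and the DEMO_* codes are hypothetical and are not taken from the affinity extension itself.

```c
#include <string.h>

#define DEMO_SUCCESS        0
#define DEMO_ERR_TRUNCATED (-1)

/* Hypothetical helper: copy src into a caller-supplied buffer of
 * dst_len bytes. */
static int demo_copy_string(char *dst, size_t dst_len, const char *src)
{
    int ret = DEMO_SUCCESS;

    if (NULL == dst || NULL == src || 0 == dst_len) {
        return DEMO_ERR_TRUNCATED;
    }
    /* Use the destination's real length, not sizeof(a pointer) and not
     * strlen(src). */
    memset(dst, 0, dst_len);
    strncpy(dst, src, dst_len - 1);   /* the final NUL stays in place */
    if (strlen(src) >= dst_len) {
        ret = DEMO_ERR_TRUNCATED;     /* record the problem */
    }
    /* Return the value that was set, not a hard-coded success code. */
    return ret;
}
```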
cmr=v1.7.4:reviewer=dgoodell:subject=Fix a pair of minor errors in the affinity MPI extension
This commit was SVN r29427.
The following common shared libraries did not have versioning:
* ompi/common/ofacm
* ompi/common/verbs
* ompi/common/ugni
Additionally, we still had shared library versions in VERSION for the
following libraries, which no longer exist:
* ompi/common/portals
* opal/common/hwloc
This commit was SVN r29421.
Also, removed an MPI_Aint that snuck through (which is a C type, not a
Fortran type). Oddly, the Intel compiler complained about neither of these
issues. :-\
This commit was SVN r29411.
Fix two problems that surfaced when using direct launch under SLURM:
1. locally store our own data because some BTLs want to retrieve
it during add_procs rather than use what they have internally
2. clean up MPI_Abort so it correctly passes the error status all
the way down to the actual exit. When someone implemented the
"abort_peers" API, they left out the error status. So we lost
it at that point and *always* exited with a status of 1. This
forces a change to the API to include the status.
cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch
This commit was SVN r29405.
Since this header is included in .F90 files (which are preprocessed,
vs. .f90 files, which are *not* preprocessed), we don't want to
accidentally start C-style comments, which are recognized by some
Fortran compiler preprocessors (e.g., Absoft).
This commit was SVN r29394.
So just move the comment to the prior line, and it's all good. This
is obviously not *necessary*, but it helps cut down on warning noise.
This commit was SVN r29393.
BIND(C) doesn't let us have LOGICAL parameters, so we have to be
creative in how we invoke back-end ompi_*_f() C functions.
Additionally, the mpi_f08 type for MPI_Status presented some
difficulties, too.
See the large comment in
ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h that explains
this in much more detail.
This commit was SVN r29384.
There is no "MPI_Aint" in the Fortran interface. Surprisingly, the
Intel compiler didn't choke on this, but the Absoft compiler did.
This commit was SVN r29371.
So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it.
cmr:v1.7.4:reviewer=jsquyres
This commit was SVN r29354.
the fortran handle. Use a separate opal_pointer_array to keep track of
the fortran handles of communicators.
This commit also fixes a bug in ompi_comm_idup where the newcomm was not
set until after the operation completed.
cmr=v1.7.4:reviewer=jsquyres:ticket=trac:3796
This commit was SVN r29342.
The following Trac tickets were found above:
Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796
Thanks to Jeff for finding and fixing compilation issues with
the new fortran bindings.
cmr=v1.7.4:reviewer=jsquyres:ticket=trac:3796
This commit was SVN r29335.
The following Trac tickets were found above:
Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796
MPI_Comm_idup.
As part of this work I implemented a basic request scheduler in
ompi/comm/comm_request.c. This scheduler might be useful for more
than just communicator requests and could be moved to ompi/request
if there is a demand. Otherwise I will leave it where it is.
Added a non-blocking version of ompi_comm_set to support ompi_comm_idup.
The call makes a recursive call to comm_dup and a non-blocking version
was needed. To simplify the code the blocking version calls the nonblocking
version and waits on the resulting request if one exists.
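The "blocking calls non-blocking, then waits if a request exists" pattern can be sketched as follows. The demo_* types and functions are hypothetical and far simpler than ompi_comm_set; decrementing a counter stands in for driving real progress.

```c
#include <stddef.h>

/* Hypothetical request: counts the remaining steps of a pending operation. */
typedef struct {
    int steps_left;
} demo_request_t;

/* Hypothetical non-blocking setter. It completes immediately when no
 * communication is needed (*req stays NULL); otherwise it hands back a
 * request that must be waited on. */
static int demo_set_nb(int *target, int value, int needs_comm,
                       demo_request_t *storage, demo_request_t **req)
{
    *target = value;
    *req = NULL;
    if (needs_comm) {
        storage->steps_left = 3;
        *req = storage;
    }
    return 0;
}

static void demo_wait(demo_request_t *req)
{
    while (req->steps_left > 0) {
        req->steps_left--;   /* stand-in for driving progress */
    }
}

/* Blocking version: call the non-blocking one and wait only if it
 * actually produced a request. */
static int demo_set(int *target, int value, int needs_comm)
{
    demo_request_t storage = { 0 };
    demo_request_t *req = NULL;
    int rc = demo_set_nb(target, value, needs_comm, &storage, &req);

    if (0 == rc && NULL != req) {
        demo_wait(req);
    }
    return rc;
}
```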
cmr=v1.7.4:reviewer=jsquyres:ticket=trac:3796
This commit was SVN r29334.
The following Trac tickets were found above:
Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796
The algorithms are intended for MPI-3.0 compliance and are not
optimized. We should aim to add better algorithms in the future through
cheetah.
MPI_Iallreduce and MPI_Igatherv on intercommunicators are required for
MPI_Comm_idup support.
cmr=v1.7.4:reviewer=brbarret:ticket=trac:2715
This commit was SVN r29333.
The following Trac tickets were found above:
Ticket 2715 --> https://svn.open-mpi.org/trac/ompi/ticket/2715
1. Change in the RTE API implementation: comm_world is now used to do p2p.
This means we do not have to worry about other comms being destroyed.
2. Added a notification mechanism through which the runtime can tell libhcoll that the RTE API can no longer be used:
pass a pointer to a flag, and its size, to libhcoll.
The flag changes when the RTE is no longer available.
Currently this flag is just the ompi_mpi_finalized global bool value.
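A minimal sketch of the flag-based notification in item 2. The demo_lib_* names are hypothetical and are not the libhcoll API; the sketch assumes that "the flag changes" means it becomes non-zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical library-side state: it remembers where the runtime's flag
 * lives and how many bytes it occupies. */
typedef struct {
    const void *flag;
    size_t      flag_size;
} demo_lib_t;

static void demo_lib_set_rte_flag(demo_lib_t *lib, const void *flag,
                                  size_t flag_size)
{
    lib->flag = flag;
    lib->flag_size = flag_size;
}

/* Assumption: the RTE is usable while every byte of the flag is zero. */
static bool demo_lib_rte_usable(const demo_lib_t *lib)
{
    const unsigned char *p = (const unsigned char *) lib->flag;

    for (size_t i = 0; i < lib->flag_size; ++i) {
        if (0 != p[i]) {
            return false;
        }
    }
    return true;
}
```

Passing the size along with the pointer lets the library check the flag without knowing its exact C type.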
cmr=v1.7.3:reviewer=jladd
This commit was SVN r29331.
- adapted Open MPI version check (the "const" stuff will come with version 1.7.4, not from 1.9.0)
- removed the const keyword from the deprecated functions MPI_Type_hindexed and MPI_Type_struct
TODO: Since MPICH v3.x adds the const keyword to those functions, we need a configure test to figure out whether adding const is required. (only relevant for the stand-alone version of VampirTrace)
This commit was SVN r29317.
The following SVN revision numbers were found above:
r29314 --> open-mpi/ompi@29b22f350e
Use the same trick for send/recvtypes as we did in r29298 for
[I]Alltoallw. Also, add all prototypes for all the back-end
ompi_neighbor_*_f functions.
Refs trac:3802
This commit was SVN r29299.
The following SVN revision numbers were found above:
r29298 --> open-mpi/ompi@0acebc0a1f
The following Trac tickets were found above:
Ticket 3802 --> https://svn.open-mpi.org/trac/ompi/ticket/3802
Use proper array types for the sendtypes and recvtypes params in the
definition of these subroutines (since they're prototyped to be array
parameters in the module). See the comment in the code for a more
complete explanation.
Note that after talking to Craig Rasmussen, he says that this is
fairly common practice in the Fortran community (i.e., pass a scalar
that is part of an array to effectively pass an offset into that
array, since Fortran passes by reference). So this might be a hack,
but it's a commonly-accepted hack.
Reviewed by Craig Rasmussen.
cmr=v1.7.3:subject=Fix [I]Alltoallw mpi_f08 wrapper functions
This commit was SVN r29298.
* Always enable "const" in the wrapper functions, even though Open MPI
doesn't advertise itself as MPI-3.0 yet
* Remove CONST from MPI_Type_hindexed, MPI_Type_struct (because
they're deprecated functions, and never had const added to them)
This commit was SVN r29280.
collective to the mca_coll_base_comm_coll_t structure increased the size
of the ompi_communicator_t over the limit of the predefined padding
(PREDEFINED_COMMUNICATOR_PAD).
This fix is a temporary fix to allow the trunk to compile. Unfortunately
it breaks the compatibility with all other versions of Open MPI. Please
read the comment in this header file for a more complete explanation.
This commit was SVN r29277.
The following SVN revision numbers were found above:
r29265 --> open-mpi/ompi@c5596548b2
Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create.
cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data
This commit was SVN r29274.
arrays.
The MPI 3.0 standard added const to all in buffers in the C bindings. This
commit adds the const keyword and in most cases casts const away. We
should eventually go through and update the various interfaces (coll, pml,
io, etc) to take the const keyword. The group, comm, win, and datatype
interfaces have been updated with const.
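The "add const at the binding, cast it away internally" approach looks roughly like this. Both functions are hypothetical stand-ins, not real Open MPI interfaces.

```c
#include <stddef.h>

/* Hypothetical internal interface that has not been const-ified yet.
 * It only reads the buffer, but its prototype does not say so. */
static int demo_internal_sum(void *buf, size_t count)
{
    const int *vals = (const int *) buf;
    int total = 0;

    for (size_t i = 0; i < count; ++i) {
        total += vals[i];
    }
    return total;
}

/* Public entry point: the "in" buffer is const, as in the MPI-3.0 C
 * bindings. The cast is safe only because the callee never writes. */
static int demo_public_sum(const void *sendbuf, size_t count)
{
    return demo_internal_sum((void *) sendbuf, count);
}
```

Once the internal interface takes a const pointer as well, the cast can simply be dropped.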
cmr=v1.7.4:ticket=trac:3785:reviewer=jsquyres
This commit was SVN r29266.
The following Trac tickets were found above:
Ticket 3785 --> https://svn.open-mpi.org/trac/ompi/ticket/3785
Blocking versions are simple linear algorithms implemented in coll/basic. Non-
blocking versions are from libnbc 1.1.1. All algorithms have been tested with
simple test cases.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29265.
Prevent frag from being freed out from under us in the case
the PML callback routine calls usnic_free(). We accomplish this
by delaying decrement of sf_bytes_to_ack until after the callback is
performed, since sf_bytes_to_ack == 0 is the condition for freeing the frag.
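The ordering fix can be sketched with a simplified fragment. All demo_* names are hypothetical, and a boolean "freed" marker stands in for actually returning the fragment to a free list.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct demo_frag {
    size_t bytes_to_ack;    /* plays the role of sf_bytes_to_ack */
    bool   owner_released;  /* set once the upper layer calls free */
    bool   freed;           /* stand-in for returning to a free list */
} demo_frag_t;

/* Stand-in for the free routine: the frag is really freed only when
 * nothing is left to ACK. */
static void demo_free(demo_frag_t *frag)
{
    frag->owner_released = true;
    if (0 == frag->bytes_to_ack) {
        frag->freed = true;
    }
}

typedef void (*demo_cb_t)(demo_frag_t *frag);

/* Handle an ACK. Returns true if the frag was still alive right after
 * the callback, i.e. it was not freed out from under us. */
static bool demo_handle_ack(demo_frag_t *frag, size_t acked, demo_cb_t cb)
{
    bool alive_after_cb;

    cb(frag);                        /* may call demo_free() */
    alive_after_cb = !frag->freed;
    frag->bytes_to_ack -= acked;     /* decrement only after the callback */
    if (0 == frag->bytes_to_ack && frag->owner_released) {
        frag->freed = true;          /* now it is safe to release */
    }
    return alive_after_cb;
}
```

Decrementing before the callback would let demo_free() see a zero count and release the frag while demo_handle_ack() is still using it.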
Fixes Cisco bug CSCuj45094.
Authored-by: Reese Faucette <rfaucett@cisco.com>
cmr=v1.7.3
This commit was SVN r29264.
Includes all MPI functions supported by Open MPI, including MPI-3
functions (as of about 2 weeks ago). Many changes compared to the
prior generation of Java bindings; not much is left from the prior
generation, actually. The changes include (but are not limited to):
* Add support for more than just a subset of MPI-1 functions
* Use typical Java case for symbol names
* Support Java Direct buffers (giving darn-near "native C"
performance)
* Support "type struct" better than the prior generation
* Make more of an effort for the Java bindings to be a thin layer
over the back-end C bindings
* ...and more
A proper README with more information about what is supported, how to
use these bindings, etc. will be committed shortly.
This commit was SVN r29263.
This commit changes the underlying opal complex datatypes to match the
C99 types: float _Complex, double _Complex, and long double _Complex. The
fortran and C++ types now are aliases to these basic types instead of
structure types. The operators in ompi/mca/op/base now work on only the
C99 types and the fortran types use these operators if the fortran type
matches a C complex type (this should almost always be the case.)
C99 is now in use in both the datatype and operator code, which should make
the code both cleaner and much less fragile.
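A minimal sketch of an operator written once against a C99 complex type and reused through an alias. The function and typedef names are hypothetical and are not the ones in ompi/mca/op/base.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical alias: a Fortran COMPLEX whose layout matches
 * float _Complex needs no operator of its own. */
typedef float _Complex demo_fortran_complex_t;

/* Sum reduction written once, against the C99 type. No struct with
 * separate real and imag members is needed. */
static void demo_sum_c_float_complex(const float _Complex *in,
                                     float _Complex *inout, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        inout[i] += in[i];
    }
}
```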
This commit was SVN r29193.
of MPI_Alltoall.
- add support for MPI_IN_PLACE in the self collective component.
- fix the extent usage in the tuned collective component.
- correctly use the peer counts instead of local.
Thanks to Fujitsu for the patch.
This commit was SVN r29187.
MSGDEBUG2 now means "print a one-liner for all PML calls into BTL, and
also when BTL calls PML with a recv completion (not send completions)"
MSGDEBUG1 means print more internal gory detail
MSGDEBUG is gone, replaced by MSGDEBUG1
In the process also found that PUT_DEST style fragments could
potentially be leaked in usnic_free() since send_fragment tests were
being applied to see if it was eligible to be freed.
This commit was SVN r29185.
changes required to support MPI_Bsend(). Introduces concept of
attaching a buffer to a large segment that the PML can scribble into and
we will send from. The reason we don't use a pinned buffer and send
directly from that is that usnic_verbs does not (yet) support num_sge>1
for regular sends. This means the data gets copied twice, but that is
unavoidable.
changed the logic in handle_large_send to be more sensible
Incorporated David's review comments
This commit was SVN r29184.
Do not assume that the "size" passed to alloc_send() will be the same as
the size of the message the resulting fragment will hold when
usnic_send() is called. This means usnic_send()/usnic_put() can never
trust any pre-computed size values, and are only allowed to look at the
lengths and pointers of the elements in the desc SG list.
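Computing the message length from the scatter/gather list at send time can be sketched as follows. The demo_* structures are hypothetical simplifications of a BTL descriptor.

```c
#include <stddef.h>

/* Hypothetical scatter/gather element and descriptor. */
typedef struct {
    void  *addr;
    size_t len;
} demo_segment_t;

typedef struct {
    demo_segment_t *segs;
    size_t          seg_cnt;
} demo_desc_t;

/* Derive the payload length from the SG list itself instead of trusting
 * the size that was passed to the earlier alloc call. */
static size_t demo_payload_len(const demo_desc_t *desc)
{
    size_t total = 0;

    for (size_t i = 0; i < desc->seg_cnt; ++i) {
        total += desc->segs[i].len;
    }
    return total;
}
```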
This commit was SVN r29183.
- tag needs to be sent in *our* header, not the PML header
- usnic_alloc() should return smaller value if too much data requested
- be careful about callbacks vs removing items from lists
(we need to remove from our lists *before* the callback)
- improve send callback handling
- add some more MSGDEBUG2 logging and cleanup
This commit was SVN r29181.
The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers.
For example, consider the case where:
1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate
2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B
3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail.
This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs.
cmr:v1.7.3:reviewer=jsquyres
This commit was SVN r29166.
The following Trac tickets were found above:
Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904
The FREE_LIST_*_MT stuff was introduced on the SVN trunk in r28722
(2013-07-04), but so far, has not been merged into the v1.7 branch yet
(2013-09-06). So put it in its own #ifdef, rather than defining it
based on OMPI_MAJOR_VERSION/OMPI_MINOR_VERSION.
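A sketch of keying off the feature itself rather than a version number. The DEMO_* macro names are hypothetical; which symbol the real #ifdef tests is not stated in this log.

```c
/* Hypothetical always-available variant. */
#define DEMO_FREE_LIST_GET(x) ((x) + 1)

/* Test for the feature macro directly. A version comparison would give
 * the wrong answer on a branch whose version number is newer but which
 * has not received the feature yet. */
#ifdef DEMO_FREE_LIST_GET_MT
#define DEMO_GET(x) DEMO_FREE_LIST_GET_MT(x)
#else
#define DEMO_GET(x) DEMO_FREE_LIST_GET(x)
#endif

static int demo_get(int x)
{
    return DEMO_GET(x);
}
```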
This commit was SVN r29148.
The following SVN revision numbers were found above:
r28722 --> open-mpi/ompi@c9e5ab9ed1
The Cisco-maintained v1.6 port of the usnic BTL has diverged from the
upstream trunk and v1.7 branches. This commit adjusts the trunk to more
closely match the v1.6 branch to simplify future merging and
cherry-picking.
The usnic MCA parameters also need work on this side.
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29138.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
The fix for the HPL SEGV was incorrect because it assumed the
prepare_src() routine was always allowed to return "bytes processed"
less than the requested "bytes to send". It turns out this is only true
if the convertor is what limits the size; we are not allowed to limit
the data sent for our own reasons, or else we break logic in the upper
layers.
This means we need to learn the number of bytes out of the size
requested the convertor will give us, no matter how big the size is.
Unfortunately, this is a destructive test, and (currently) the only way to
learn that number is to actually have the convertor copy the data out into
buffers.
This change implements this, copying the entire data out into a chain of
send segments which are attached to the large send fragment. Now we can
always return the proper size value to the PML.
Fixes Cisco bug CSCuj08024
Authored-by: Reese Faucette <rfaucett@cisco.com>
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29137.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
Authored-by: Reese Faucette <rfaucett@cisco.com>
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29136.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29135.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29134.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
- round segment buffer allocation to cache-line
- split some routines into an inline fast section and a called
slower section
- introduce receive fastpath in component_progress that:
o returns immediately if there is a packet available on priority
queue and fastpath is enabled
o disables fastpath for 1 time after use to provide fairness to
other processing
o defers receive buffer posting
o defers bookkeeping for receive until next call
to usnic_component_progress
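The receive fastpath with its one-shot disable can be sketched as a small state machine. The demo_* names and fields are hypothetical; counters stand in for real queues and buffer posting.

```c
#include <stdbool.h>

typedef struct {
    int  priority_pkts;  /* packets waiting on the priority queue */
    int  deferred_work;  /* buffer reposts and bookkeeping still owed */
    bool fastpath_ok;    /* false right after the fastpath was used */
    int  slow_passes;    /* how often the full path ran */
} demo_module_t;

/* Returns the number of packets handled by this call. */
static int demo_progress(demo_module_t *m)
{
    if (m->fastpath_ok && m->priority_pkts > 0) {
        m->priority_pkts--;
        m->deferred_work++;       /* defer posting and bookkeeping */
        m->fastpath_ok = false;   /* skip the fastpath once, for fairness */
        return 1;
    }

    /* Slow path: catch up on deferred work, then do full processing. */
    m->fastpath_ok = true;
    m->slow_passes++;
    m->deferred_work = 0;
    if (m->priority_pkts > 0) {
        m->priority_pkts--;
        return 1;
    }
    return 0;
}
```

With two packets queued, the first call takes the fastpath and the second is forced through the slow path, which also clears the deferred work.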
Authored-by: Reese Faucette <rfaucett@cisco.com>
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29133.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760