Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29485.
- some free lists simply were not being OBJ_DESTRUCTed, so they never
freed their internal memory
- channel->recv_segs.ctx was being assigned in a way that got clobbered
by ompi_free_list_init_new, so the cleanup code that relied on it
being set never ran
- numerous other ".ctx" assignments were similarly ineffectual and were
not being consumed, so I deleted them
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29484.
This new routine can be called in exceptional situations, either
conditionally in BTL code or from a debugger, to help with debugging in
cases where MSGDEBUG1/2 or stats logging are impractical but more detail
is needed.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29483.
Pull the bulk of the functionality out into a new routine,
ompi_btl_usnic_print_stats, which can be used in other debugging
contexts. This also lets us eliminate the module->final_stats state
tracking.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29482.
Fixes:
- Segmentation fault when using watermark variables.
- Segmentation fault when using a handle bound to a no longer valid
performance variable.
- Incorrect return codes from MPI_T_pvar_* functions.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29481.
Thanks to Charles Gerlach for identifying the issue.
Oddly, this issue exists in trunk and v1.7, but ''not'' in the v1.6
tree (!).
cmr=v1.7.4:reviewer=hjelmn
This commit was SVN r29463.
Follow the convention established by the ompi/mca/common/sm tree and
prefix both the "install" and "no install" versions of the build with
"lib" so that Automake doesn't complain. Differentiate the two by
adding a "_noinst" suffix to the "no install" version.
This commit was SVN r29462.
- removed potential double-'/' in CUPTIDIR which makes trouble with rpmbuild's debugedit program (fixes trac:3854)
This commit was SVN r29461.
The following Trac tickets were found above:
Ticket 3854 --> https://svn.open-mpi.org/trac/ompi/ticket/3854
This change contains a non-mandatory modification
of the MPI-RTE interface. Anyone wishing to support
coprocessors such as the Xeon Phi may wish to add
the required definition and underlying support
****************************************************************
Add locality support for coprocessors such as the Intel Xeon Phi.
Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.
So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:
1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board
2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions
3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.
4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.
5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.
6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.
cmr:v1.7.4:reviewer=hjelmn
This commit was SVN r29435.
Reworked ompi_info tool to be close with orte_info implementation.
ompi_info_register_types(), ompi_info_close_components() and
ompi_info_show_ompi_version() are moved to runtime/ompi_info_support.c.
Added runtime/oshmem_info_support layer that exports following api to be
used into oshmem_info tool as
oshmem_info_register_types()
oshmem_info_register_framework_params()
oshmem_info_close_components()
oshmem_info_show_oshmem_version()
These functions call ompi_info_support related interfaces as long as
Oshmem supports Open MPI/SHMEM combination.
Now orte_info/ompi_info/oshmem_info have identical implementation approach.
Possible improvement:
OSHMEM processing of --config option is the same as OMPI`s (code is duplicated).
Probably list of info_support interfaces can be extended by xxx_info_do_config().
developed by Igor, reviewed by miked
This commit was SVN r29429.
* Use the right length for memset/strncpy
* Return the set return value (vs. unconditionally returning
OMPI_SUCCESS)
cmr=v1.7.4:reviewer=dgoodell:subject=Fix a pair of minor errors in the affinity MPI extension
This commit was SVN r29427.
The following common shared libraries did not have versioning:
* ompi/common/ofacm
* ompi/common/verbs
* ompi/common/ugni
Additionally, we still had shared library versions in VERSION for the
following libraries, which no longer exist:
* ompi/common/portals
* opal/common/hwloc
This commit was SVN r29421.
Also, removed an MPI_Aint snuck through (which is a C type, not a Fortran
type). Oddly, the Intel compiler complained about neither of these
issues. :-\
This commit was SVN r29411.
Fix two problems that surfaced when using direct launch under SLURM:
1. locally store our own data because some BTLs want to retrieve
it during add_procs rather than use what they have internally
2. cleanup MPI_Abort so it correctly passes the error status all
the way down to the actual exit. When someone implemented the
"abort_peers" API, they left out the error status. So we lost
it at that point and *always* exited with a status of 1. This
forces a change to the API to include the status.
cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch
This commit was SVN r29405.
Since this header is included in .F90 files (which are preprocessed,
vs. .f90 files, which are *not* preprocessed), we don't want to
accidentally start C-style comments, which are recognized by some
Fortran compiler preprocessors (e.g., Absoft).
This commit was SVN r29394.
So just move the comment to the prior line, and it's all good. This
is obviosuly not *necessary*, but it helps cut down on warning noise.
This commit was SVN r29393.
BIND(C) doesn't let us have LOGICAL parameters, so we have to be
creative in how we invoke back-end ompi_*_f() C functions.
Additionally, the mpi_f08 type for MPI_Status presented some
difficulties, too.
See the large comment in
ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h that explains
this in much more detail.
This commit was SVN r29384.
There is no "MPI_Aint" in the Fortran interface. Surprisingly, the
Intel compiler didn't choke on this, but the Absoft compiler did.
This commit was SVN r29371.
So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it.
cmr:v1.7.4:reviewer=jsquyres
This commit was SVN r29354.