1
1
Граф коммитов

6703 Коммитов

Автор SHA1 Сообщение Дата
Dave Goodell
e9dbb66e58 mpool/rdma: fix memory leak at module finalize
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29487.
2013-10-23 15:51:55 +00:00
Dave Goodell
647e5a6fd2 rcache/vma: fix module finalization memory leaks
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29486.
2013-10-23 15:51:44 +00:00
Dave Goodell
d969cfa513 usnic: correctly clean up verbs resources
Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29485.
2013-10-23 15:51:33 +00:00
Dave Goodell
a6ed232a10 usnic: fix several memory leaks
- some free lists simply were not being OBJ_DESTRUCTed, so they never
  freed their internal memory

- channel->recv_segs.ctx was being assigned in a way that got clobbered
  by ompi_free_list_init_new, so the cleanup code that relied on it
  being set never ran

- numerous other ".ctx" assignments were similarly ineffectual and were
  not being consumed, so I deleted them

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29484.
2013-10-23 15:51:22 +00:00
Dave Goodell
c9b2343982 usnic: add ompi_btl_usnic_component_debug helper
This new routine can be called in exceptional situations, either
conditionally in BTL code or from a debugger, to help with debugging in
cases where MSGDEBUG1/2 or stats logging are impractical but more detail
is needed.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29483.
2013-10-23 15:51:11 +00:00
Dave Goodell
d0b7d125b2 usnic: refactor usnic_stats_callback
Pull the bulk of the functionality out into a new routine,
ompi_btl_usnic_print_stats, which can be used in other debugging
contexts.  This also lets us eliminate the module->final_stats state
tracking.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29482.
2013-10-23 15:50:57 +00:00
Nathan Hjelm
d34a4300b8 Fix various bugs in mca_base_pvar.
Fixes:

 - Segmentation fault when using watermark variables.

 - Segmentation fault when using a handle bound to a no longer valid
   performance variable.

 - Incorrect return codes from MPI_T_pvar_* functions.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29481.
2013-10-23 15:47:15 +00:00
Jeff Squyres
0fb8edd720 Trivial comment change
This commit was SVN r29480.
2013-10-23 10:15:18 +00:00
Mike Dubman
d6ead2a3a5 Add support for routable ROCE where different subnet_id is a valid to proceed with MPI routing.
(can happen in the same LAN)
developed by vasily, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29479.
2013-10-23 06:08:54 +00:00
Ralph Castain
16b1ad052f Silence compiler warning
This commit was SVN r29478.
2013-10-23 04:13:51 +00:00
Nathan Hjelm
5bf6555604 Fix locality when in the case where the OMPI_RTE_HOST_ID is not found.
cmr=v1.7.4:ticket=3847

This commit was SVN r29475.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-22 19:07:03 +00:00
Rolf vandeVaart
3c916d55c9 Fix two issues pointed out in review of ticket #3870.
This commit was SVN r29473.
2013-10-22 17:28:12 +00:00
Mike Dubman
d27cffedb9 expand tabs to 4 spaces
cd ompi/mca/coll/fca
for i in *.[ch]; do expand -t 4 $i > koko && mv koko $i; done
Refs: #3799

This commit was SVN r29472.
2013-10-22 17:05:55 +00:00
Jeff Squyres
6714890244 paffinity.h is gone and won't be coming back.
This commit was SVN r29467.
2013-10-22 15:59:00 +00:00
Nathan Hjelm
7bee047e5d Fix rentry check in communicator request progress.
cmr=v1.7.4:ticket=3796

This commit was SVN r29465.

The following Trac tickets were found above:
  Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796
2013-10-22 15:33:39 +00:00
Nathan Hjelm
280a89448f Make btl/vader valgrind safe.
cmr=v1.7.4:reviewer=samuel

This commit was SVN r29464.
2013-10-22 15:33:32 +00:00
Jeff Squyres
506d0e96f4 Fix the IN_PLACE detection for Fortran SCATTER and SCATTERV.
Thanks to Charles Gerlach for identifying the issue.

Oddly, this issue exists in trunk and v1.7, but ''not'' in the v1.6
tree (!).

cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r29463.
2013-10-22 14:55:17 +00:00
Jeff Squyres
09fae6e62b Prefix DSO filenames with "lib" so that Automake doesn't complain.
Follow the convention established by the ompi/mca/common/sm tree and
prefix both the "install" and "no install" versions of the build with
"lib" so that Automake doesn't complain.  Differentiate the two by
adding a "_noinst" suffix to the "no install" version.

This commit was SVN r29462.
2013-10-22 13:16:33 +00:00
Matthias Jurenz
582d5337ce Changes to VT's configure:
- removed potential double-'/' in CUPTIDIR which makes trouble with rpmbuild's debugedit program (fixes trac:3854)

This commit was SVN r29461.

The following Trac tickets were found above:
  Ticket 3854 --> https://svn.open-mpi.org/trac/ompi/ticket/3854
2013-10-22 09:02:11 +00:00
Rolf vandeVaart
0cd1e8dfd9 Add runtime support to turn off CUDA IPC support.
This commit was SVN r29444.
2013-10-16 16:48:18 +00:00
Rolf vandeVaart
9f83405c78 Fix one more corner case initialization issue.
This commit was SVN r29443.
2013-10-16 16:39:19 +00:00
Ralph Castain
bd0b13221b Cleanup ompi_info change to silence compiler warning
This commit was SVN r29436.
2013-10-14 16:57:50 +00:00
Ralph Castain
24c811805f ****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.

4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
Mike Dubman
2141e9e6b4 tools: Add oshmem_info utility
Reworked ompi_info tool to be close with orte_info implementation.
ompi_info_register_types(), ompi_info_close_components() and
ompi_info_show_ompi_version() are moved to runtime/ompi_info_support.c.

Added runtime/oshmem_info_support layer that exports following api to be
used into oshmem_info tool as
oshmem_info_register_types()
oshmem_info_register_framework_params()
oshmem_info_close_components()
oshmem_info_show_oshmem_version()
These functions call ompi_info_support related interfaces as long as
Oshmem supports Open MPI/SHMEM combination.

Now orte_info/ompi_info/oshmem_info have identical implementation approach.

Possible improvement:
OSHMEM processing of --config option is the same as OMPI`s (code is duplicated).
Probably list of info_support interfaces can be extended by xxx_info_do_config().
developed by Igor, reviewed by miked

This commit was SVN r29429.
2013-10-12 19:03:32 +00:00
Mike Dubman
5a7dff2d15 fix icc warning
fixed by Dinar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29428.
2013-10-12 18:04:28 +00:00
Jeff Squyres
eabbcd1157 Fix a pair of minor errors in the affinity MPI extension.
* Use the right length for memset/strncpy
* Return the set return value (vs. unconditionally returning
  OMPI_SUCCESS)

cmr=v1.7.4:reviewer=dgoodell:subject=Fix a pair of minor errors in the affinity MPI extension

This commit was SVN r29427.
2013-10-11 21:17:38 +00:00
Jeff Squyres
b5e2ae86ad Remove all of our "to-do" items from the README.txt.
This commit was SVN r29424.
2013-10-11 16:43:56 +00:00
Rolf vandeVaart
fbf143f3b4 Move another function that was missed in r29347.
This commit was SVN r29422.

The following SVN revision numbers were found above:
  r29347 --> open-mpi/ompi@ce61985503
2013-10-10 14:48:56 +00:00
Jeff Squyres
d9be19f011 Added shared library versions to those who were missing it.
The following common shared libraries did not have versioning:

 * ompi/common/ofacm
 * ompi/common/verbs
 * ompi/common/ugni

Additionally, we still had shared library versions in VERSION for the
following libraries, which no longer exist:

 * ompi/common/portals
 * opal/common/hwloc

This commit was SVN r29421.
2013-10-10 13:25:57 +00:00
Nathan Hjelm
e11233cb65 Remove unnecessary member from the comm idup context.
Refs trac:3796

This commit was SVN r29419.

The following Trac tickets were found above:
  Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796
2013-10-09 22:00:17 +00:00
Jeff Squyres
2d7d7ab731 Absoft caught that bind(c) functions can't have OPTIONAL dummy arguments.
Also, removed an MPI_Aint snuck through (which is a C type, not a Fortran
type).  Oddly, the Intel compiler complained about neither of these
issues.  :-\

This commit was SVN r29411.
2013-10-09 14:29:34 +00:00
Ralph Castain
9902748108 ***** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *****
Fix two problems that surfaced when using direct launch under SLURM:

1. locally store our own data because some BTLs want to retrieve 
   it during add_procs rather than use what they have internally

2. cleanup MPI_Abort so it correctly passes the error status all
   the way down to the actual exit. When someone implemented the
   "abort_peers" API, they left out the error status. So we lost
   it at that point and *always* exited with a status of 1. This 
   forces a change to the API to include the status.

cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch

This commit was SVN r29405.
2013-10-08 18:37:59 +00:00
Jeff Squyres
0d093176e1 Add missing .so version numbers for libmpi_usempif08.so.
cmr=v1.7.3:reviewer=rhc:subject=Add missing libmpi_usempif08 shared lib

This commit was SVN r29401.
2013-10-08 16:20:12 +00:00
Jeff Squyres
66dadbe1e7 Per RFC, remove the udapl BTL.
This commit was SVN r29400.
2013-10-08 15:18:59 +00:00
Jeff Squyres
a3606c4029 Add weak symbol #defines.
Fixes compile errors if you compile --without-weak-symbols.

This commit was SVN r29396.
2013-10-08 14:54:40 +00:00
Jeff Squyres
ad4abdbf53 Avoid using /* in the comment (which starts a C comment)
Since this header is included in .F90 files (which are preprocessed,
vs. .f90 files, which are *not* preprocessed), we don't want to
accidentally start C-style comments, which are recognized by some
Fortran compiler preprocessors (e.g., Absoft).

This commit was SVN r29394.
2013-10-08 14:22:55 +00:00
Jeff Squyres
ceecc04bfe Some Fortran compilers warn about ! comments after #endif.
So just move the comment to the prior line, and it's all good.  This
is obviosuly not *necessary*, but it helps cut down on warning noise.

This commit was SVN r29393.
2013-10-08 13:47:44 +00:00
Jeff Squyres
9f9fb5ce38 Add weak symbols for the MPI-3.1 MPI_Alloc_mem_cptr fortran subroutine.
This subroutine was added in an errata to MPI-3.0; here's the MPI
Forum ticket about it:
    
    https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/390

This commit was SVN r29385.
2013-10-04 22:43:40 +00:00
Jeff Squyres
c08f97b030 Fix problems with calling back-end BIND(C) ompi_*_f() functions.
BIND(C) doesn't let us have LOGICAL parameters, so we have to be
creative in how we invoke back-end ompi_*_f() C functions.
Additionally, the mpi_f08 type for MPI_Status presented some
difficulties, too.
    
See the large comment in
ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h that explains
this in much more detail.

This commit was SVN r29384.
2013-10-04 22:43:07 +00:00
Jeff Squyres
f23d3bca64 Remove BIND(C) from all mpi_f08 interfaces.
This commit was SVN r29383.
2013-10-04 22:41:59 +00:00
Jeff Squyres
e81e3ccee0 Add missing implementation of MPI_FREE_MEM in the TKR mpi module.
This commit was SVN r29382.
2013-10-04 22:38:57 +00:00
Jeff Squyres
886e2cbf0f Remove and eliminate this extra redundant phrase.
This commit was SVN r29381.
2013-10-04 22:12:04 +00:00
Jeff Squyres
4899c531d3 Add PMPI versions for all ignore-TKR Fortran mpi module interfaces.
This commit was SVN r29380.
2013-10-04 21:58:56 +00:00
Rolf vandeVaart
3bd02fbaf5 Add one more verbose debug output that prints when we are out of memory.
This commit was SVN r29378.
2013-10-04 18:56:06 +00:00
Nathan Hjelm
b025babfec Correctly set the state for communicator requests. Refs trac:3796
This commit was SVN r29377.

The following Trac tickets were found above:
  Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796
2013-10-04 17:34:54 +00:00
Nathan Hjelm
ba4b0f9235 Remove meaningless checks. Refs trac:3828
This commit was SVN r29375.

The following Trac tickets were found above:
  Ticket 3828 --> https://svn.open-mpi.org/trac/ompi/ticket/3828
2013-10-04 17:24:59 +00:00
Nathan Hjelm
70878764d7 Silence compiler warnings on 32-bit platforms. Refs trac:3828
This commit was SVN r29374.

The following Trac tickets were found above:
  Ticket 3828 --> https://svn.open-mpi.org/trac/ompi/ticket/3828
2013-10-04 15:53:14 +00:00
Jeff Squyres
be191619a9 Remove erroneous use of "MPI_Aint".
There is no "MPI_Aint" in the Fortran interface.  Surprisingly, the
Intel compiler didn't choke on this, but the Absoft compiler did.

This commit was SVN r29371.
2013-10-04 15:27:29 +00:00
Rolf vandeVaart
66725f6973 Enable some CUDA-aware support on tcp btl. Only when configured in.
This commit was SVN r29364.
2013-10-04 12:50:16 +00:00
Ralph Castain
f4f2287958 Singletons currently start out by spawning an HNP - this is required solely in the cases where the singleton subsequently calls MPI_Comm_spawn or publishes port info without support from an external orte-server. In all other cases, the HNP is of no value and can actually be a detriment by creating additional overhead on the node. This is particularly concerning for async operations where processes may begin as singletons and then dynamically wireup to perform pt2pt communications.
So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it.

cmr:v1.7.4:reviewer=jsquyres

This commit was SVN r29354.
2013-10-04 02:58:26 +00:00