1
1

6715 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
6569019b06 Move all usNIC stats to _stats.c|h and export them as MPI_T pvars.
This commit moves all the module stats into their own struct so that
the stats only need to appear as a single line in the module_t
definition, and then moves all the logic for reporting the stats into
btl_usnic_stats.c|h.

Further, the stats are now exported as MPI_T_BIND_NO_OBJECT entities
(i.e., not bound to any particular MPI handle), and are marked as
READONLY and CONTINUOUS.  They currently all default to verbose level
5 ("Application tuner / detailed", according to
https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels).

Most of the statistics are counters, but a small number are high
watermark values.  Due to how counters are reported via MPI_T, none of
the counters are exported through MPI_T if the MCA param
btl_usnic_stats_relative=1 (i.e., the module resets the stats back to
zero at a given frequency).

When MPI_T_pvar_handle_alloc() is invoked on any of these pvars, it
will return a count that is equal to the number of active usnic BTL
modules.  The values returned for any given pvar (e.g.,
num_total_sends) are an array containing one value for each active
usnic BTL module.  The ordering of values in the array is both
consistent across all usnic pvars and stable throughout a single job:
array slot 0 corresponds to module X, array slot 1 corresponds to
module Y, etc.

Mapping which array slot corresponds to which underlying Linux usnic_X
device works as follows:

 * The btl_usnic_devices MPI_T state pvar is associated with a
   btl_usnic_device MPI_T enum, and be obtained via
   MPI_T_pvar_get_info().
 * If all usNIC pvars are of length N, the values [0,N) in the
   btl_usnic_device enum are associated with strings of the
   corresponding underlying Linux device.

For exampe, to look up which Linux device is reported in all usNIC
pvars' array slot 1, look up the int value 1 in the btl_usnic_devices
enum.  Its corresponding string value is underlying Linux device name
(e.g., "usnic_1").

cmr=v1.7.4:subject="usnic BTL MPI_T pvars"

This commit was SVN r29545.
2013-10-28 22:23:08 +00:00
Nathan Hjelm
404cceb9c4 Always check the return of [mc]alloc and fix a warning introduced by
r29479.

This fixes some issues reported awhile ago in the openib btl. There
are a couple more unchecked mallocs but they are a bit more difficult
to fix since they are in void functions (btl_openib_endpoint.c).

Refs trac:2401.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29543.

The following SVN revision numbers were found above:
  r29479 --> open-mpi/ompi@d6ead2a3a5

The following Trac tickets were found above:
  Ticket 2401 --> https://svn.open-mpi.org/trac/ompi/ticket/2401
2013-10-28 20:04:49 +00:00
Nathan Hjelm
b202bb0d63 Fix the recursive halfing algorithms for reduce scatter in both basic
and tuned to correctly handle 0 recvcounts.

Tested with the reproducer from #1550.

Refs trac:1559

This commit was SVN r29542.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-28 19:06:38 +00:00
Ralph Castain
01c9973a29 Don't include AM directives in continued lines
This commit was SVN r29537.
2013-10-27 05:44:55 +00:00
Ralph Castain
ed3bbb977e Cleanup wrapper makefile when java bindings not enabled
This commit was SVN r29532.
2013-10-27 04:35:43 +00:00
Ralph Castain
3ec27b00ae Cleanup the Java integration - don't install the mpijavac compiler if the user didn't ask for Java bindings
This commit was SVN r29526.
2013-10-26 16:18:18 +00:00
Rolf vandeVaart
fa5d20a5ec Add optimization that can be used when CUDA 6.0 comes out. Use new pointer attribute.
This commit was SVN r29514.
2013-10-24 21:17:58 +00:00
Rolf vandeVaart
628a109a74 Make casting very clear.
This commit was SVN r29511.
2013-10-24 14:40:01 +00:00
Rolf vandeVaart
5687e4387d Fix compiler warning.
This commit was SVN r29510.
2013-10-24 13:11:12 +00:00
Nathan Hjelm
26f3a029d3 Fix scif configury.
cmr=v1.7.4:ticket=3862

This commit was SVN r29493.

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-23 17:04:20 +00:00
Nathan Hjelm
6186b5ed9d Remove extra file that made its way into r29490.
cmr=v1.7.4:ticket=3862

This commit was SVN r29491.

The following SVN revision numbers were found above:
  r29490 --> open-mpi/ompi@cde3b05ed3

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-23 16:17:51 +00:00
Nathan Hjelm
cde3b05ed3 Add support for the Intel scif interface.
Depends on #3847.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29490.
2013-10-23 15:59:14 +00:00
Dave Goodell
e9dbb66e58 mpool/rdma: fix memory leak at module finalize
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29487.
2013-10-23 15:51:55 +00:00
Dave Goodell
647e5a6fd2 rcache/vma: fix module finalization memory leaks
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29486.
2013-10-23 15:51:44 +00:00
Dave Goodell
d969cfa513 usnic: correctly clean up verbs resources
Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29485.
2013-10-23 15:51:33 +00:00
Dave Goodell
a6ed232a10 usnic: fix several memory leaks
- some free lists simply were not being OBJ_DESTRUCTed, so they never
  freed their internal memory

- channel->recv_segs.ctx was being assigned in a way that got clobbered
  by ompi_free_list_init_new, so the cleanup code that relied on it
  being set never ran

- numerous other ".ctx" assignments were similarly ineffectual and were
  not being consumed, so I deleted them

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29484.
2013-10-23 15:51:22 +00:00
Dave Goodell
c9b2343982 usnic: add ompi_btl_usnic_component_debug helper
This new routine can be called in exceptional situations, either
conditionally in BTL code or from a debugger, to help with debugging in
cases where MSGDEBUG1/2 or stats logging are impractical but more detail
is needed.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29483.
2013-10-23 15:51:11 +00:00
Dave Goodell
d0b7d125b2 usnic: refactor usnic_stats_callback
Pull the bulk of the functionality out into a new routine,
ompi_btl_usnic_print_stats, which can be used in other debugging
contexts.  This also lets us eliminate the module->final_stats state
tracking.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29482.
2013-10-23 15:50:57 +00:00
Nathan Hjelm
d34a4300b8 Fix various bugs in mca_base_pvar.
Fixes:

 - Segmentation fault when using watermark variables.

 - Segmentation fault when using a handle bound to a no longer valid
   performance variable.

 - Incorrect return codes from MPI_T_pvar_* functions.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29481.
2013-10-23 15:47:15 +00:00
Jeff Squyres
0fb8edd720 Trivial comment change
This commit was SVN r29480.
2013-10-23 10:15:18 +00:00
Mike Dubman
d6ead2a3a5 Add support for routable ROCE where different subnet_id is a valid to proceed with MPI routing.
(can happen in the same LAN)
developed by vasily, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29479.
2013-10-23 06:08:54 +00:00
Ralph Castain
16b1ad052f Silence compiler warning
This commit was SVN r29478.
2013-10-23 04:13:51 +00:00
Nathan Hjelm
5bf6555604 Fix locality when in the case where the OMPI_RTE_HOST_ID is not found.
cmr=v1.7.4:ticket=3847

This commit was SVN r29475.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-22 19:07:03 +00:00
Rolf vandeVaart
3c916d55c9 Fix two issues pointed out in review of ticket #3870.
This commit was SVN r29473.
2013-10-22 17:28:12 +00:00
Mike Dubman
d27cffedb9 expand tabs to 4 spaces
cd ompi/mca/coll/fca
for i in *.[ch]; do expand -t 4 $i > koko && mv koko $i; done
Refs: #3799

This commit was SVN r29472.
2013-10-22 17:05:55 +00:00
Jeff Squyres
6714890244 paffinity.h is gone and won't be coming back.
This commit was SVN r29467.
2013-10-22 15:59:00 +00:00
Nathan Hjelm
7bee047e5d Fix rentry check in communicator request progress.
cmr=v1.7.4:ticket=3796

This commit was SVN r29465.

The following Trac tickets were found above:
  Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796
2013-10-22 15:33:39 +00:00
Nathan Hjelm
280a89448f Make btl/vader valgrind safe.
cmr=v1.7.4:reviewer=samuel

This commit was SVN r29464.
2013-10-22 15:33:32 +00:00
Jeff Squyres
506d0e96f4 Fix the IN_PLACE detection for Fortran SCATTER and SCATTERV.
Thanks to Charles Gerlach for identifying the issue.

Oddly, this issue exists in trunk and v1.7, but ''not'' in the v1.6
tree (!).

cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r29463.
2013-10-22 14:55:17 +00:00
Jeff Squyres
09fae6e62b Prefix DSO filenames with "lib" so that Automake doesn't complain.
Follow the convention established by the ompi/mca/common/sm tree and
prefix both the "install" and "no install" versions of the build with
"lib" so that Automake doesn't complain.  Differentiate the two by
adding a "_noinst" suffix to the "no install" version.

This commit was SVN r29462.
2013-10-22 13:16:33 +00:00
Matthias Jurenz
582d5337ce Changes to VT's configure:
- removed potential double-'/' in CUPTIDIR which makes trouble with rpmbuild's debugedit program (fixes trac:3854)

This commit was SVN r29461.

The following Trac tickets were found above:
  Ticket 3854 --> https://svn.open-mpi.org/trac/ompi/ticket/3854
2013-10-22 09:02:11 +00:00
Rolf vandeVaart
0cd1e8dfd9 Add runtime support to turn off CUDA IPC support.
This commit was SVN r29444.
2013-10-16 16:48:18 +00:00
Rolf vandeVaart
9f83405c78 Fix one more corner case initialization issue.
This commit was SVN r29443.
2013-10-16 16:39:19 +00:00
Ralph Castain
bd0b13221b Cleanup ompi_info change to silence compiler warning
This commit was SVN r29436.
2013-10-14 16:57:50 +00:00
Ralph Castain
24c811805f ****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.

4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
Mike Dubman
2141e9e6b4 tools: Add oshmem_info utility
Reworked ompi_info tool to be close with orte_info implementation.
ompi_info_register_types(), ompi_info_close_components() and
ompi_info_show_ompi_version() are moved to runtime/ompi_info_support.c.

Added runtime/oshmem_info_support layer that exports following api to be
used into oshmem_info tool as
oshmem_info_register_types()
oshmem_info_register_framework_params()
oshmem_info_close_components()
oshmem_info_show_oshmem_version()
These functions call ompi_info_support related interfaces as long as
Oshmem supports Open MPI/SHMEM combination.

Now orte_info/ompi_info/oshmem_info have identical implementation approach.

Possible improvement:
OSHMEM processing of --config option is the same as OMPI`s (code is duplicated).
Probably list of info_support interfaces can be extended by xxx_info_do_config().
developed by Igor, reviewed by miked

This commit was SVN r29429.
2013-10-12 19:03:32 +00:00
Mike Dubman
5a7dff2d15 fix icc warning
fixed by Dinar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29428.
2013-10-12 18:04:28 +00:00
Jeff Squyres
eabbcd1157 Fix a pair of minor errors in the affinity MPI extension.
* Use the right length for memset/strncpy
* Return the set return value (vs. unconditionally returning
  OMPI_SUCCESS)

cmr=v1.7.4:reviewer=dgoodell:subject=Fix a pair of minor errors in the affinity MPI extension

This commit was SVN r29427.
2013-10-11 21:17:38 +00:00
Jeff Squyres
b5e2ae86ad Remove all of our "to-do" items from the README.txt.
This commit was SVN r29424.
2013-10-11 16:43:56 +00:00
Rolf vandeVaart
fbf143f3b4 Move another function that was missed in r29347.
This commit was SVN r29422.

The following SVN revision numbers were found above:
  r29347 --> open-mpi/ompi@ce61985503
2013-10-10 14:48:56 +00:00
Jeff Squyres
d9be19f011 Added shared library versions to those who were missing it.
The following common shared libraries did not have versioning:

 * ompi/common/ofacm
 * ompi/common/verbs
 * ompi/common/ugni

Additionally, we still had shared library versions in VERSION for the
following libraries, which no longer exist:

 * ompi/common/portals
 * opal/common/hwloc

This commit was SVN r29421.
2013-10-10 13:25:57 +00:00
Nathan Hjelm
e11233cb65 Remove unnecessary member from the comm idup context.
Refs trac:3796

This commit was SVN r29419.

The following Trac tickets were found above:
  Ticket 3796 --> https://svn.open-mpi.org/trac/ompi/ticket/3796
2013-10-09 22:00:17 +00:00
Jeff Squyres
2d7d7ab731 Absoft caught that bind(c) functions can't have OPTIONAL dummy arguments.
Also, removed an MPI_Aint snuck through (which is a C type, not a Fortran
type).  Oddly, the Intel compiler complained about neither of these
issues.  :-\

This commit was SVN r29411.
2013-10-09 14:29:34 +00:00
Ralph Castain
9902748108 ***** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *****
Fix two problems that surfaced when using direct launch under SLURM:

1. locally store our own data because some BTLs want to retrieve 
   it during add_procs rather than use what they have internally

2. cleanup MPI_Abort so it correctly passes the error status all
   the way down to the actual exit. When someone implemented the
   "abort_peers" API, they left out the error status. So we lost
   it at that point and *always* exited with a status of 1. This 
   forces a change to the API to include the status.

cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch

This commit was SVN r29405.
2013-10-08 18:37:59 +00:00
Jeff Squyres
0d093176e1 Add missing .so version numbers for libmpi_usempif08.so.
cmr=v1.7.3:reviewer=rhc:subject=Add missing libmpi_usempif08 shared lib

This commit was SVN r29401.
2013-10-08 16:20:12 +00:00
Jeff Squyres
66dadbe1e7 Per RFC, remove the udapl BTL.
This commit was SVN r29400.
2013-10-08 15:18:59 +00:00
Jeff Squyres
a3606c4029 Add weak symbol #defines.
Fixes compile errors if you compile --without-weak-symbols.

This commit was SVN r29396.
2013-10-08 14:54:40 +00:00
Jeff Squyres
ad4abdbf53 Avoid using /* in the comment (which starts a C comment)
Since this header is included in .F90 files (which are preprocessed,
vs. .f90 files, which are *not* preprocessed), we don't want to
accidentally start C-style comments, which are recognized by some
Fortran compiler preprocessors (e.g., Absoft).

This commit was SVN r29394.
2013-10-08 14:22:55 +00:00
Jeff Squyres
ceecc04bfe Some Fortran compilers warn about ! comments after #endif.
So just move the comment to the prior line, and it's all good.  This
is obviosuly not *necessary*, but it helps cut down on warning noise.

This commit was SVN r29393.
2013-10-08 13:47:44 +00:00
Jeff Squyres
9f9fb5ce38 Add weak symbols for the MPI-3.1 MPI_Alloc_mem_cptr fortran subroutine.
This subroutine was added in an errata to MPI-3.0; here's the MPI
Forum ticket about it:
    
    https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/390

This commit was SVN r29385.
2013-10-04 22:43:40 +00:00