
18839 Commits

Author SHA1 Message Date
Jeff Squyres
6569019b06 Move all usNIC stats to _stats.c|h and export them as MPI_T pvars.
This commit moves all the module stats into their own struct so that
the stats only need to appear as a single line in the module_t
definition, and then moves all the logic for reporting the stats into
btl_usnic_stats.c|h.

Further, the stats are now exported as MPI_T_BIND_NO_OBJECT entities
(i.e., not bound to any particular MPI handle), and are marked as
READONLY and CONTINUOUS.  They currently all default to verbose level
5 ("Application tuner / detailed", according to
https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels).

Most of the statistics are counters, but a small number are high
watermark values.  Due to how counters are reported via MPI_T, none of
the counters are exported through MPI_T if the MCA param
btl_usnic_stats_relative=1 (i.e., the module resets the stats back to
zero at a given frequency).

When MPI_T_pvar_handle_alloc() is invoked on any of these pvars, it
will return a count that is equal to the number of active usnic BTL
modules.  The values returned for any given pvar (e.g.,
num_total_sends) are an array containing one value for each active
usnic BTL module.  The ordering of values in the array is both
consistent across all usnic pvars and stable throughout a single job:
array slot 0 corresponds to module X, array slot 1 corresponds to
module Y, etc.

Mapping which array slot corresponds to which underlying Linux usnic_X
device works as follows:

 * The btl_usnic_devices MPI_T state pvar is associated with a
   btl_usnic_device MPI_T enum, which can be obtained via
   MPI_T_pvar_get_info().
 * If all usNIC pvars are of length N, the values [0,N) in the
   btl_usnic_device enum are associated with strings of the
   corresponding underlying Linux device.

For example, to look up which Linux device is reported in all usNIC
pvars' array slot 1, look up the int value 1 in the btl_usnic_devices
enum.  Its corresponding string value is the underlying Linux device
name (e.g., "usnic_1").
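
The slot-to-device lookup described above can be sketched in plain C. This
is a standalone model only: it makes no MPI_T calls, and the type and
function names (example_enum_item_t, example_slot_to_device) are
hypothetical. In a real tool the (value, name) pairs would presumably be
read from the enum with MPI_T_enum_get_item().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one (value, name) item of the device enum. */
typedef struct {
    int value;          /* enum value; by the rule above it equals the array slot */
    const char *name;   /* underlying Linux device name, e.g. "usnic_1" */
} example_enum_item_t;

/* Return the device name whose enum value equals `slot`, or NULL if no
 * enum item carries that value (i.e. the slot is out of range). */
const char *example_slot_to_device(const example_enum_item_t *items,
                                   int num_items, int slot)
{
    assert(items != NULL);
    for (int i = 0; i < num_items; ++i) {
        if (items[i].value == slot) {
            return items[i].name;
        }
    }
    return NULL;
}
```

With a two-item table for usnic_0 and usnic_1, looking up slot 1 yields
the usnic_1 entry, and slot 2 yields NULL.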

cmr=v1.7.4:subject="usnic BTL MPI_T pvars"

This commit was SVN r29545.
2013-10-28 22:23:08 +00:00
Nathan Hjelm
404cceb9c4 Always check the return of [mc]alloc and fix a warning introduced by
r29479.

This fixes some issues reported a while ago in the openib btl. There
are a couple more unchecked mallocs but they are a bit more difficult
to fix since they are in void functions (btl_openib_endpoint.c).
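
As an illustration of the pattern (not the actual openib code), a minimal
sketch of checked allocation might look like the following. The helper
names dup_buffer and alloc_counters are hypothetical. Both return a status
to the caller, which is exactly what a void function cannot do without a
signature change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a buffer, returning NULL on allocation
 * failure instead of writing through a NULL pointer. */
void *dup_buffer(const void *src, size_t len)
{
    void *copy = malloc(len ? len : 1);   /* malloc(0) may legally return NULL */
    if (NULL == copy) {
        return NULL;                      /* propagate the failure to the caller */
    }
    if (len > 0) {
        memcpy(copy, src, len);
    }
    return copy;
}

/* Hypothetical helper: allocate a zeroed counter array. Returns 0 on
 * success and -1 on failure; calloc also returns NULL when nmemb * size
 * would overflow. */
int alloc_counters(size_t nmemb, uint64_t **out)
{
    assert(out != NULL);
    uint64_t *counters = calloc(nmemb, sizeof(*counters));
    if (NULL == counters) {
        *out = NULL;
        return -1;                        /* caller maps this to its own error code */
    }
    *out = counters;
    return 0;
}
```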

Refs trac:2401.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29543.

The following SVN revision numbers were found above:
  r29479 --> open-mpi/ompi@d6ead2a3a5

The following Trac tickets were found above:
  Ticket 2401 --> https://svn.open-mpi.org/trac/ompi/ticket/2401
2013-10-28 20:04:49 +00:00
Nathan Hjelm
b202bb0d63 Fix the recursive halving algorithms for reduce scatter in both basic
and tuned to correctly handle 0 recvcounts.

Tested with the reproducer from #1550.

Refs trac:1559

This commit was SVN r29542.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-28 19:06:38 +00:00
Mike Dubman
b5c95e8eb6 yoda spml will disqualify itself if bml/btls are not started by ompi
Starting bml/btls in yoda is pointless because btls require a modex()
exchange, and modex() is only done during mpi_init().

Refs trac:3763

This commit was SVN r29541.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-10-28 18:55:50 +00:00
Nathan Hjelm
68dac45a37 Add platform files for LANL MIC nodes.
No review needed.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29540.
2013-10-28 16:46:04 +00:00
Ralph Castain
fb0940a9d9 Add a couple of useful tests
This commit was SVN r29539.
2013-10-28 13:24:16 +00:00
Ralph Castain
74c6b12d67 Don't force "treat warnings as errors" in the OpenSHMEM branch as this prevents building of tarballs on machines with compilers that generate warnings when the system is built with "tarball" settings. Instead, make this a configuration option so that OSHMEM developers can set it when they work on the code, but others don't have to use such restrictions.
Refs trac:3763

This commit was SVN r29538.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-10-27 15:35:14 +00:00
Ralph Castain
01c9973a29 Don't include AM directives in continued lines
This commit was SVN r29537.
2013-10-27 05:44:55 +00:00
Ralph Castain
ed3bbb977e Cleanup wrapper makefile when java bindings not enabled
This commit was SVN r29532.
2013-10-27 04:35:43 +00:00
Ralph Castain
588e7ce974 Cleanup the Java setup m4's - orte doesn't require java, and we do all compiler checks in opal
This commit was SVN r29530.
2013-10-26 19:43:32 +00:00
Ralph Castain
25385590e6 Silence warning
This commit was SVN r29528.
2013-10-26 19:41:35 +00:00
Ralph Castain
3ec27b00ae Cleanup the Java integration - don't install the mpijavac compiler if the user didn't ask for Java bindings
This commit was SVN r29526.
2013-10-26 16:18:18 +00:00
Ralph Castain
75c306994e Add some debug
This commit was SVN r29523.
2013-10-26 02:26:21 +00:00
Ralph Castain
8c5c7d0db4 Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface.
Refs trac:3696

This commit was SVN r29522.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-10-26 00:47:14 +00:00
Joshua Ladd
2f22329ea4 This commit (hopefully) fully addresses the enabling of oshmem fortran bindings given its dependency on the building of fortran bindings in OMPI. This commit closes trac:3851.
This commit was SVN r29521.

The following Trac tickets were found above:
  Ticket 3851 --> https://svn.open-mpi.org/trac/ompi/ticket/3851
2013-10-25 15:42:10 +00:00
Joshua Ladd
6b4bfcf4d7 This commit fixes pointer arithmetic done with void * pointers in memheap. This commit closes trac:3844
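
For context, arithmetic on void * is a GNU extension; ISO C requires a
pointer to a complete object type. A minimal sketch of the usual portable
fix, casting to char * first, is shown below. The helper names ptr_add and
ptr_diff are hypothetical and are not taken from the memheap code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance a generic pointer by `bytes`.
 * sizeof(char) is 1 by definition, so char * arithmetic counts bytes. */
void *ptr_add(void *base, size_t bytes)
{
    assert(base != NULL);
    return (char *)base + bytes;
}

/* Hypothetical helper: byte distance between two pointers that point
 * into the same allocation. */
ptrdiff_t ptr_diff(const void *hi, const void *lo)
{
    return (const char *)hi - (const char *)lo;
}
```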
This commit was SVN r29520.

The following Trac tickets were found above:
  Ticket 3844 --> https://svn.open-mpi.org/trac/ompi/ticket/3844
2013-10-25 15:27:50 +00:00
Mike Dubman
9cea216c0e fix autogen warn, fixed by Roman, reviewed by miked
This commit was SVN r29519.
2013-10-25 14:39:37 +00:00
Mike Dubman
5cc1f3803b shmem-fortran fix, naming conventions for shmem option, examples
This commit was SVN r29518.
2013-10-25 05:25:41 +00:00
Nathan Hjelm
a4ae1705dd completion: only call sort when using comm to remove items from a list
This commit was SVN r29517.
2013-10-24 23:31:43 +00:00
Nathan Hjelm
c8844d1514 Remove some code left over from debugging the completion script
This commit was SVN r29516.
2013-10-24 23:03:54 +00:00
Nathan Hjelm
ac23c02bc2 Some cleanup of the bash completion code
This commit was SVN r29515.
2013-10-24 22:47:13 +00:00
Rolf vandeVaart
fa5d20a5ec Add optimization that can be used when CUDA 6.0 comes out. Use new pointer attribute.
This commit was SVN r29514.
2013-10-24 21:17:58 +00:00
Nathan Hjelm
d152ebff9e Add a bash completion script for orterun/mpirun.
Features of v 1.0:

 - Completion of all switches.
 - Completion of mca variable names.
 - Completion of mca variable values for enumerated variables and component
   selection variables.
 - Completion of --bind-to and --map-by.

This commit was SVN r29513.
2013-10-24 19:35:38 +00:00
Nathan Hjelm
f7428fb6a9 Small fixes for the MCA variable interface.
- Make a copy of enumerator data for default enumerators. This will allow
   the caller to free their data once the enumerator has been created. This
   is a change from just referencing the values array.

 - Make mca_base_pvar_notify check if the pvar is valid before calling the
   notify callback. This fixes a segmentation fault when destroying handles
   after MPI_Finalize().
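
The first fix can be illustrated with a small standalone sketch. The types
and functions here (example_enum_value_t, example_enum_t,
example_enum_create, example_enum_destroy) are hypothetical and only
loosely modeled on a (value, string) enumerator table; they are not the
MCA variable interface itself. The point is that the enumerator owns
private copies, so the caller's table can be freed or modified afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical (value, string) pair supplied by the caller. */
typedef struct {
    int value;
    const char *string;
} example_enum_value_t;

/* Hypothetical enumerator that owns its table and strings. */
typedef struct {
    int count;                      /* number of fully copied entries */
    example_enum_value_t *values;   /* private copy, freed on destroy */
} example_enum_t;

/* Release the enumerator together with every string it copied. */
void example_enum_destroy(example_enum_t *e)
{
    if (NULL == e) {
        return;
    }
    for (int i = 0; i < e->count; ++i) {
        free((void *)e->values[i].string);
    }
    free(e->values);
    free(e);
}

/* Deep-copy the caller's table. Returns NULL on allocation failure. */
example_enum_t *example_enum_create(const example_enum_value_t *values,
                                    int count)
{
    assert(values != NULL && count > 0);
    example_enum_t *e = calloc(1, sizeof(*e));
    if (NULL == e) {
        return NULL;
    }
    e->values = calloc((size_t)count, sizeof(*e->values));
    if (NULL == e->values) {
        free(e);
        return NULL;
    }
    for (int i = 0; i < count; ++i) {
        size_t len = strlen(values[i].string) + 1;
        char *copy = malloc(len);
        if (NULL == copy) {
            example_enum_destroy(e);   /* frees only the entries copied so far */
            return NULL;
        }
        memcpy(copy, values[i].string, len);
        e->values[i].value = values[i].value;
        e->values[i].string = copy;
        e->count = i + 1;
    }
    return e;
}
```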

cmr=v1.7.4:ticket=trac:3861

This commit was SVN r29512.

The following Trac tickets were found above:
  Ticket 3861 --> https://svn.open-mpi.org/trac/ompi/ticket/3861
2013-10-24 19:27:06 +00:00
Rolf vandeVaart
628a109a74 Make casting very clear.
This commit was SVN r29511.
2013-10-24 14:40:01 +00:00
Rolf vandeVaart
5687e4387d Fix compiler warning.
This commit was SVN r29510.
2013-10-24 13:11:12 +00:00
Ralph Castain
604970a1a2 Initialize orte_coprocessors hash table to NULL. Delay coprocessor detection on HNP until after node topology final definition in case rmaps changes it. Minor spacing change.
Refs trac:3847

This commit was SVN r29504.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-24 00:08:47 +00:00
Ralph Castain
f2c9df9056 Add ignores
Refs trac:3862

This commit was SVN r29503.

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-24 00:06:57 +00:00
Ralph Castain
f5920e9312 Revert r29489. This function only executes in the HNP. In orte/mca/ess/hnp/ess_hnp_module.c, we already check for local coprocessors and add them to the hash table if found. Thus, r29489 simply overwrote what was already present.
The data for each remote daemon is added later in the daemon callback function. Only the HNP retains info in the hash table.

If it is desirable to have each daemon retain its own coprocessor info, then this must be done in orte/mca/ess/base/ess_base_std_orted.c.

This commit was SVN r29497.

The following SVN revision numbers were found above:
  r29489 --> open-mpi/ompi@2e2794fa15
2013-10-23 22:35:24 +00:00
Jeff Squyres
f45144aed0 Add a little more to the docs for mca_base_var_enum_create().
This commit was SVN r29496.
2013-10-23 22:11:19 +00:00
Jeff Squyres
c982a6acb0 Bill D'Amico contributed for Cisco, so we can list him here.
This commit was SVN r29495.
2013-10-23 20:52:47 +00:00
Dave Goodell
d72c272796 add a .mailmap file for the github mirror
This file exists to help map usernames to proper names and email
addresses in the Open MPI github mirror of the canonical SVN repository.
The github mirror can be found here:

  https://github.com/open-mpi/ompi-svn-mirror

I've seeded the file with the names of Cisco contributors.  In order to
avoid exposing anyone's email address without their permission, we are
using an opt-in model for adding real email addresses.

This commit was SVN r29494.
2013-10-23 20:39:10 +00:00
Nathan Hjelm
26f3a029d3 Fix scif configury.
cmr=v1.7.4:ticket=3862

This commit was SVN r29493.

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-23 17:04:20 +00:00
Dave Goodell
620f40b6c7 fix compile error introduced by r29488
Apologies for the breakage, I did my test build in the wrong window...

No reviewer.

cmr=v1.7.4:ticket=3865

This commit was SVN r29492.

The following SVN revision numbers were found above:
  r29488 --> open-mpi/ompi@25dd719d4d

The following Trac tickets were found above:
  Ticket 3865 --> https://svn.open-mpi.org/trac/ompi/ticket/3865
2013-10-23 16:36:52 +00:00
Nathan Hjelm
6186b5ed9d Remove extra file that made its way into r29490.
cmr=v1.7.4:ticket=3862

This commit was SVN r29491.

The following SVN revision numbers were found above:
  r29490 --> open-mpi/ompi@cde3b05ed3

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-23 16:17:51 +00:00
Nathan Hjelm
cde3b05ed3 Add support for the Intel scif interface.
Depends on #3847.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29490.
2013-10-23 15:59:14 +00:00
Nathan Hjelm
2e2794fa15 Fix coprocessor detection by always adding the local daemon's co-processors
to the hash table.

Tested and working on a system with 2 Xeon Phi co-processors.

cmr=v1.7.4:ticket=3847:reviewer=ompi-rm1.7

This commit was SVN r29489.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-23 15:56:23 +00:00
Dave Goodell
25dd719d4d opal: support __attribute__((__noinline__))
First cut does not attempt any "cross-check".  As we discover compilers
which complain about __noinline__, we will add specific cross checks to
handle those cases.
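
A rough sketch of how such an attribute is typically wrapped is shown
below. The macro name EXAMPLE_NOINLINE and the function are hypothetical,
and the preprocessor test stands in for whatever configure-time check the
commit actually adds.

```c
#include <assert.h>

/* Hypothetical wrapper macro: use the attribute where the compiler is
 * known to accept it, and expand to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_NOINLINE __attribute__((__noinline__))
#else
#  define EXAMPLE_NOINLINE
#endif

/* Kept out of line so it remains a distinct frame that can be called
 * from a debugger or seen in a profile. */
EXAMPLE_NOINLINE int example_debug_hook(int value)
{
    assert(value >= 0);
    return value + 1;
}
```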

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29488.
2013-10-23 15:52:05 +00:00
Dave Goodell
e9dbb66e58 mpool/rdma: fix memory leak at module finalize
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29487.
2013-10-23 15:51:55 +00:00
Dave Goodell
647e5a6fd2 rcache/vma: fix module finalization memory leaks
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29486.
2013-10-23 15:51:44 +00:00
Dave Goodell
d969cfa513 usnic: correctly clean up verbs resources
Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29485.
2013-10-23 15:51:33 +00:00
Dave Goodell
a6ed232a10 usnic: fix several memory leaks
- some free lists simply were not being OBJ_DESTRUCTed, so they never
  freed their internal memory

- channel->recv_segs.ctx was being assigned in a way that got clobbered
  by ompi_free_list_init_new, so the cleanup code that relied on it
  being set never ran

- numerous other ".ctx" assignments were similarly ineffectual and were
  not being consumed, so I deleted them

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29484.
2013-10-23 15:51:22 +00:00
Dave Goodell
c9b2343982 usnic: add ompi_btl_usnic_component_debug helper
This new routine can be called in exceptional situations, either
conditionally in BTL code or from a debugger, to help with debugging in
cases where MSGDEBUG1/2 or stats logging are impractical but more detail
is needed.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29483.
2013-10-23 15:51:11 +00:00
Dave Goodell
d0b7d125b2 usnic: refactor usnic_stats_callback
Pull the bulk of the functionality out into a new routine,
ompi_btl_usnic_print_stats, which can be used in other debugging
contexts.  This also lets us eliminate the module->final_stats state
tracking.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29482.
2013-10-23 15:50:57 +00:00
Nathan Hjelm
d34a4300b8 Fix various bugs in mca_base_pvar.
Fixes:

 - Segmentation fault when using watermark variables.

 - Segmentation fault when using a handle bound to a no longer valid
   performance variable.

 - Incorrect return codes from MPI_T_pvar_* functions.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29481.
2013-10-23 15:47:15 +00:00
Jeff Squyres
0fb8edd720 Trivial comment change
This commit was SVN r29480.
2013-10-23 10:15:18 +00:00
Mike Dubman
d6ead2a3a5 Add support for routable RoCE, where a different subnet_id is valid for proceeding with MPI routing.
(This can happen within the same LAN.)
developed by vasily, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29479.
2013-10-23 06:08:54 +00:00
Ralph Castain
16b1ad052f Silence compiler warning
This commit was SVN r29478.
2013-10-23 04:13:51 +00:00
Ralph Castain
7c86a843c8 Silence compiler warning
This commit was SVN r29477.
2013-10-23 04:13:36 +00:00
Ralph Castain
960a255e7f Do some cleanup of the --without-hwloc build - no need to work on coprocessors since we can't detect them anyway, cleanup some unused variables in the ppr mapper
This commit was SVN r29476.
2013-10-23 01:45:21 +00:00