1
1
Граф коммитов

4441 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
5a940f5ee7 Arrgh -- remove debugging printf.
This commit was SVN r29657.
2013-11-11 22:44:28 +00:00
Jeff Squyres
e20217eccc Expand the "btl_usnic" MPI_T enumeration to have strings of the form:
<usnic device name>,<eth device>,<ip address>/<CIDR prefix>

For example:

   usnic_0,eth4,10.1.0.15/16

This is just handy for mapping the usnic_X device back to the IP
network to which it corresponds.

This commit was SVN r29656.
2013-11-11 22:25:30 +00:00
Nathan Hjelm
6a331275d8 Set transfers as active before starting them.
cmr=v1.7.4:ticket=trac:3898

This commit was SVN r29654.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 21:50:54 +00:00
Nathan Hjelm
3d3c29ae96 btl/scif: do not return resource busy if we started a connection attempt.
Resolves a hang when using scif for shared memory transfers. This is a
simple change and doesn't require a review.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29653.
2013-11-11 19:36:34 +00:00
Nathan Hjelm
b5ce72cc15 Set the modex as active before starting it. This resolves a hang in
MPI_Init() on comm-spawned processes.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29652.
2013-11-11 19:33:32 +00:00
Rolf vandeVaart
a6df7bc33a Fix issues reported in ticket #3877. Also added additional comments.
This commit was SVN r29641.
2013-11-07 20:44:47 +00:00
Rolf vandeVaart
2cf7c40ee5 Minor adjustments to error messages due to review of #3880.
This commit was SVN r29640.
2013-11-07 20:21:21 +00:00
Rolf vandeVaart
3290cde630 Various minor changes to bring smcuda up to date with sm.
This commit was SVN r29639.
2013-11-07 19:45:56 +00:00
Dave Goodell
82db913490 usnic: fix module_recv_buffers perf regression
Cisco v1.6 git commit 913ec6c and upstream trunk r29593 (segfault fix)
introduced a performance regression by inadvertently disabling the
`module_recv_buffers` functionality.  With those changes in place, the
`btl_usnic_recv.c` logic would end up mallocing a buffer that should
have otherwise come from a `module_recv_buffers` pool.  It also resulted
in a small, bounded memory leak (128 buffers at each power-of-two size
interval).

The new version just places the buffer after the free list item with a
flexible array member.  I bumped the pool to allocate all 128 elements
up front because the deferred allocation was modestly impacting IMB
Sendrecv performance at a few sizes.

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29631.

The following SVN revision numbers were found above:
  r29593 --> open-mpi/ompi@1ed9b8ff43
2013-11-07 01:27:31 +00:00
Vishwanath Venkatesan
d37a5faa20 Need not do aggregator selection for one process case
So adding a check for this corner case!

This commit was SVN r29622.
2013-11-06 21:05:26 +00:00
Brian Barrett
cf8de1ef0f Minor indent cleanup in init_query()
Only use Portals on communicators with more than one rank
Fix computation of number of children when using the hypercube tree

This commit was SVN r29616.
2013-11-06 15:21:09 +00:00
Jeff Squyres
e28261898d Per discussion on the devel list, rename the btl_usnic_devices MPI_T
state pvar to be btl_usnic (i.e., the best suggestion so far).

See http://www.open-mpi.org/community/lists/devel/2013/11/13188.php
for more detail.

This commit was SVN r29614.
2013-11-06 06:19:03 +00:00
Rolf vandeVaart
e46c0bb952 Fix one more space for consistent defines.
This commit was SVN r29607.
2013-11-05 15:31:49 +00:00
Rolf vandeVaart
64b3a24fec Fix CUDA-aware compile issues.
This commit was SVN r29606.
2013-11-05 14:46:58 +00:00
Rolf vandeVaart
e57795f097 Revert r29594. That was just plain wrong. Sorry about workday configure change.
This commit was SVN r29605.

The following SVN revision numbers were found above:
  r29594 --> open-mpi/ompi@ed7ddcd9c7
2013-11-05 14:45:56 +00:00
Rolf vandeVaart
ed7ddcd9c7 Fix CUDA-aware compile error introduces with r29581.
This commit was SVN r29594.

The following SVN revision numbers were found above:
  r29581 --> open-mpi/ompi@ee7510b025
2013-11-05 00:08:33 +00:00
Dave Goodell
1ed9b8ff43 usnic: fix segfault at finalize time
Without this commit, if you run IMB pingpong between two nodes with only
one usnic selected (e.g., via `--mca btl_usnic_if_include usnic_0`) then
the run will seem fine but will segfault at MPI_Finalize time.

This behavior has happened since Cisco v1.6 git commit ec7ddf8, upstream
trunk r29484, and upstream v1.7 r29507.

Root cause was that the free list element was being used as the recv
buffer instead of the data buffer associated with the element.  So the
reassembly code would stomp all over the free list element, which would
cause the destructor to explode when the free list attempted to clean up
all of its elements.  This surprisingly did not cause any other problems
until now.

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29593.

The following SVN revision numbers were found above:
  r29484 --> open-mpi/ompi@a6ed232a10
  r29507 --> open-mpi/ompi@790d269ce8
2013-11-04 22:52:14 +00:00
Dave Goodell
73a943492c usnic: pack via convertor on the fly
If we need to use a convertor, go back to stashing that convertor in the
frag and populating segments "on the fly" (in
ompi_btl_usnic_module_progress_sends).  Previously we would pack into a
chain of chunk segments at prepare_src time, unnecessarily consuming
additional memory.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29592.
2013-11-04 22:52:03 +00:00
Dave Goodell
71d0d73575 usnic: refactor callback invocation
This makes it a little easier to see what's happening with callbacks to
the PML.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29591.
2013-11-04 22:51:48 +00:00
Dave Goodell
4c791e21d2 usnic: add MSGDEBUG1_OUT/MSGDEBUG2_OUT macros
This includes suppressing picky-mode warnings about __VA_ARGS__, which
we know are supported by any compilers we care about.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29590.
2013-11-04 22:51:35 +00:00
Dave Goodell
825686a205 usnic: certain send frag members are immutable
Ensure that they never are touched by checking in their destructors.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29589.
2013-11-04 22:51:24 +00:00
Nathan Hjelm
c71125acfd Using MPI_* functions in iallreduce can cause comm-spawned processes to
crash. Update libnbc's iallreduce function to use ompi_* functions
instead.

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r29582.
2013-11-01 16:45:54 +00:00
Rolf vandeVaart
ee7510b025 Remove redundant macro. This was from reviewed of earlier ticket.
Fixes trac:3878.  Reviewed by jsquyres.

This commit was SVN r29581.

The following Trac tickets were found above:
  Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878
2013-11-01 12:19:40 +00:00
Rolf vandeVaart
99f9fdee01 Fix corner case involving threads and CUDA-aware support.
This commit was SVN r29579.
2013-10-31 20:53:46 +00:00
Nathan Hjelm
fd25b7af01 Fix common ugni Makefile.am for non-DSO builds.
This commit was SVN r29571.
2013-10-30 19:37:14 +00:00
Nathan Hjelm
a31e617d17 Remove outdated comments in coll_basic_reduce_scatter.c.
Refs trac:1559

This commit was SVN r29566.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-30 16:20:20 +00:00
Mike Dubman
b0e64427a9 ompi/mca/btl/openib: Fix memory leak and accessing free'd memory issues
Let imagine that we have two btls in btl_openib_component_init() both points to the same openib_btl->device and as a result have the same openib_btl->device->endpoints array.

Finalization phase calls twice mca_btl_openib_finalize()->mca_btl_openib_finalize_resources().
mca_btl_openib_finalize_resources() frees endpoint related btl. But the second call of mca_btl_openib_finalize_resources() checks endpoint that is released by previus call.

fixed by Igor, reviewed by miked/vasily
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29563.
2013-10-30 11:47:49 +00:00
Nathan Hjelm
167d5613db Do not do arithmetic with void * in basic neighborhood alltoall[vw].
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29558.
2013-10-29 20:02:13 +00:00
Jeff Squyres
6569019b06 Move all usNIC stats to _stats.c|h and export them as MPI_T pvars.
This commit moves all the module stats into their own struct so that
the stats only need to appear as a single line in the module_t
definition, and then moves all the logic for reporting the stats into
btl_usnic_stats.c|h.

Further, the stats are now exported as MPI_T_BIND_NO_OBJECT entities
(i.e., not bound to any particular MPI handle), and are marked as
READONLY and CONTINUOUS.  They currently all default to verbose level
5 ("Application tuner / detailed", according to
https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels).

Most of the statistics are counters, but a small number are high
watermark values.  Due to how counters are reported via MPI_T, none of
the counters are exported through MPI_T if the MCA param
btl_usnic_stats_relative=1 (i.e., the module resets the stats back to
zero at a given frequency).

When MPI_T_pvar_handle_alloc() is invoked on any of these pvars, it
will return a count that is equal to the number of active usnic BTL
modules.  The values returned for any given pvar (e.g.,
num_total_sends) are an array containing one value for each active
usnic BTL module.  The ordering of values in the array is both
consistent across all usnic pvars and stable throughout a single job:
array slot 0 corresponds to module X, array slot 1 corresponds to
module Y, etc.

Mapping which array slot corresponds to which underlying Linux usnic_X
device works as follows:

 * The btl_usnic_devices MPI_T state pvar is associated with a
   btl_usnic_device MPI_T enum, and be obtained via
   MPI_T_pvar_get_info().
 * If all usNIC pvars are of length N, the values [0,N) in the
   btl_usnic_device enum are associated with strings of the
   corresponding underlying Linux device.

For exampe, to look up which Linux device is reported in all usNIC
pvars' array slot 1, look up the int value 1 in the btl_usnic_devices
enum.  Its corresponding string value is underlying Linux device name
(e.g., "usnic_1").

cmr=v1.7.4:subject="usnic BTL MPI_T pvars"

This commit was SVN r29545.
2013-10-28 22:23:08 +00:00
Nathan Hjelm
404cceb9c4 Always check the return of [mc]alloc and fix a warning introduced by
r29479.

This fixes some issues reported awhile ago in the openib btl. There
are a couple more unchecked mallocs but they are a bit more difficult
to fix since they are in void functions (btl_openib_endpoint.c).

Refs trac:2401.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29543.

The following SVN revision numbers were found above:
  r29479 --> open-mpi/ompi@d6ead2a3a5

The following Trac tickets were found above:
  Ticket 2401 --> https://svn.open-mpi.org/trac/ompi/ticket/2401
2013-10-28 20:04:49 +00:00
Nathan Hjelm
b202bb0d63 Fix the recursive halfing algorithms for reduce scatter in both basic
and tuned to correctly handle 0 recvcounts.

Tested with the reproducer from #1550.

Refs trac:1559

This commit was SVN r29542.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-28 19:06:38 +00:00
Rolf vandeVaart
fa5d20a5ec Add optimization that can be used when CUDA 6.0 comes out. Use new pointer attribute.
This commit was SVN r29514.
2013-10-24 21:17:58 +00:00
Rolf vandeVaart
628a109a74 Make casting very clear.
This commit was SVN r29511.
2013-10-24 14:40:01 +00:00
Rolf vandeVaart
5687e4387d Fix compiler warning.
This commit was SVN r29510.
2013-10-24 13:11:12 +00:00
Nathan Hjelm
26f3a029d3 Fix scif configury.
cmr=v1.7.4:ticket=3862

This commit was SVN r29493.

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-23 17:04:20 +00:00
Nathan Hjelm
6186b5ed9d Remove extra file that made its way into r29490.
cmr=v1.7.4:ticket=3862

This commit was SVN r29491.

The following SVN revision numbers were found above:
  r29490 --> open-mpi/ompi@cde3b05ed3

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-23 16:17:51 +00:00
Nathan Hjelm
cde3b05ed3 Add support for the Intel scif interface.
Depends on #3847.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29490.
2013-10-23 15:59:14 +00:00
Dave Goodell
e9dbb66e58 mpool/rdma: fix memory leak at module finalize
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29487.
2013-10-23 15:51:55 +00:00
Dave Goodell
647e5a6fd2 rcache/vma: fix module finalization memory leaks
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29486.
2013-10-23 15:51:44 +00:00
Dave Goodell
d969cfa513 usnic: correctly clean up verbs resources
Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29485.
2013-10-23 15:51:33 +00:00
Dave Goodell
a6ed232a10 usnic: fix several memory leaks
- some free lists simply were not being OBJ_DESTRUCTed, so they never
  freed their internal memory

- channel->recv_segs.ctx was being assigned in a way that got clobbered
  by ompi_free_list_init_new, so the cleanup code that relied on it
  being set never ran

- numerous other ".ctx" assignments were similarly ineffectual and were
  not being consumed, so I deleted them

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29484.
2013-10-23 15:51:22 +00:00
Dave Goodell
c9b2343982 usnic: add ompi_btl_usnic_component_debug helper
This new routine can be called in exceptional situations, either
conditionally in BTL code or from a debugger, to help with debugging in
cases where MSGDEBUG1/2 or stats logging are impractical but more detail
is needed.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29483.
2013-10-23 15:51:11 +00:00
Dave Goodell
d0b7d125b2 usnic: refactor usnic_stats_callback
Pull the bulk of the functionality out into a new routine,
ompi_btl_usnic_print_stats, which can be used in other debugging
contexts.  This also lets us eliminate the module->final_stats state
tracking.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29482.
2013-10-23 15:50:57 +00:00
Jeff Squyres
0fb8edd720 Trivial comment change
This commit was SVN r29480.
2013-10-23 10:15:18 +00:00
Mike Dubman
d6ead2a3a5 Add support for routable ROCE where different subnet_id is a valid to proceed with MPI routing.
(can happen in the same LAN)
developed by vasily, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29479.
2013-10-23 06:08:54 +00:00
Rolf vandeVaart
3c916d55c9 Fix two issues pointed out in review of ticket #3870.
This commit was SVN r29473.
2013-10-22 17:28:12 +00:00
Mike Dubman
d27cffedb9 expand tabs to 4 spaces
cd ompi/mca/coll/fca
for i in *.[ch]; do expand -t 4 $i > koko && mv koko $i; done
Refs: #3799

This commit was SVN r29472.
2013-10-22 17:05:55 +00:00
Jeff Squyres
6714890244 paffinity.h is gone and won't be coming back.
This commit was SVN r29467.
2013-10-22 15:59:00 +00:00
Nathan Hjelm
280a89448f Make btl/vader valgrind safe.
cmr=v1.7.4:reviewer=samuel

This commit was SVN r29464.
2013-10-22 15:33:32 +00:00
Jeff Squyres
09fae6e62b Prefix DSO filenames with "lib" so that Automake doesn't complain.
Follow the convention established by the ompi/mca/common/sm tree and
prefix both the "install" and "no install" versions of the build with
"lib" so that Automake doesn't complain.  Differentiate the two by
adding a "_noinst" suffix to the "no install" version.

This commit was SVN r29462.
2013-10-22 13:16:33 +00:00