1
1
Граф коммитов

4563 Коммитов

Автор SHA1 Сообщение Дата
Yossi Etigin
7564e2c13f Fix a recursion in mxm send flow which happens when mpi starts a new send from the context of send completion callback.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30265.
2014-01-12 17:47:03 +00:00
Yossi Etigin
9504969f7d fix communicator double-free from pt2pt component, caused by r29938.
cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30264.

The following SVN revision numbers were found above:
  r29938 --> open-mpi/ompi@ecfb122c97
2014-01-12 17:38:14 +00:00
Ralph Castain
286ff6d552 For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun.
NOTE: launch performance will be absolutely awful if you do this with BTLs that aren't configured to modex_recv on first message!

Even with "modex on demand", we still have to do a barrier in place of the modex - we simply don't move any data around, which does reduce the time impact. The barrier is required to ensure that the other proc has in fact registered all its BTL info and therefore is prepared to hand over a complete data package. Otherwise, you may not get the info you need. In addition, the shared memory BTL can fail to properly rendezvous as it expects the barrier to be in place.

This behavior will *only* take effect under the following conditions:

1. launched via mpirun

2. #procs is greater than ompi_hostname_cutoff, which defaults to UINT32_MAX

3. mca param rte_orte_direct_modex is set to 1. At the moment, we are having problems getting this param to register properly, so only the first two conditions are in effect. Still, the bottom line is you have to *want* this behavior to get it.

The planned next evolution of this will be to make the direct modex be non-blocking - this will require two fixes:

1. if the remote proc doesn't have the required info, then let it delay its response until it does. This means we need a way for the MPI layer to tell the RTE "I am done entering modex data".

2. adjust the SM rendezvous logic to loop until the required file has been created

Creating a placeholder to bring this over to 1.7.5 when ready.

cmr=v1.7.5:reviewer=hjelmn:subject=Enable direct modex at scale

This commit was SVN r30259.
2014-01-11 17:36:06 +00:00
Jeff Squyres
34ae50a0ed Fix int <--> pointer casting by adding intermediate cast through (intptr_t)
Reviewed by Dave Goodell

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Add intptr_t casting in usnic btl

This commit was SVN r30243.
2014-01-10 20:42:53 +00:00
Nathan Hjelm
5259ab213f Fix one more error path in udreg. In this case we hit the maximum size
of the udreg cache and get a different error code back.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30242.
2014-01-10 19:27:32 +00:00
Ralph Castain
9566650458 Per Marco, don't define a "min" function if one is already defined to avoid conflict with cygwin reserved word
This commit was SVN r30241.
2014-01-10 18:03:25 +00:00
Ralph Castain
880943dc10 Per Marco, rename "interface" to "tcp_interface" to avoid cygwin reserved word
This commit was SVN r30240.
2014-01-10 18:02:22 +00:00
Ralph Castain
c7a94a57d7 Per Marco, rename ERROR tags to exit_ERROR to avoid cygwin reserved name issues.
Refs trac:4085

This commit was SVN r30239.

The following Trac tickets were found above:
  Ticket 4085 --> https://svn.open-mpi.org/trac/ompi/ticket/4085
2014-01-10 18:00:49 +00:00
Jeff Squyres
350d989c00 Fix OpenBSD warnings where <malloc.h> is available and usable, but not
intended to be used and emits a compile-time warning.

Thanks to Paul Hargrove for identifying the issue.

cmr=v1.7.4:reviewer=hjelmn:subject=remove/replace malloc.h

This commit was SVN r30231.
2014-01-10 17:20:49 +00:00
Jeff Squyres
53a3defde9 s/CACHE_LINE_SIZE/BASESMUMA_CACHE_LINE_SIZE/g to avoid a system macro
name clash on some BSDs.

cmr=v1.7.4:reviewer=pasha

This commit was SVN r30230.
2014-01-10 16:48:43 +00:00
Edgar Gabriel
217e61e345 add proper typcasts to intptr_t to avoid warnings on 32bit systems.
This commit was SVN r30229.
2014-01-10 16:19:04 +00:00
Jeff Squyres
212e07a1e9 Don't instantiate+init variables in a switch block.
Avoid compiler warning about (unnecessarily) initializing 2 variables
during instantiation at the top of a switch block (but outside of any
case statements): just declare the variables at the top of the outter
block.  They're already safely initialized, so don't worry about
initializing them in the instantiation.

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Don't instantiate+init variables in a switch block

This commit was SVN r30228.
2014-01-10 15:39:16 +00:00
Mike Dubman
110c99af4f sharing negative tag space between libNBC and HCOLL
fixed by devendar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30224.
2014-01-10 12:51:34 +00:00
Nathan Hjelm
52c231df3e ob1 does not check the return code of mpool_register. This can cause the
ob1 dummy registration to actually be used when using udreg. Fix this by
always setting reg to NULL when mpool/udreg's register function fails.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30214.
2014-01-10 00:46:16 +00:00
Jeff Squyres
115025b8dd Ensure that the usnic BTL is only built on 64 bit Linux platforms.
Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Ensure the usnic BTL only builds on 64 bit Linux

This commit was SVN r30199.
2014-01-09 22:17:01 +00:00
Brian Barrett
013e0ec771 * Add multi-device support to the Portals 4 btl.
* Remove use of the Portals 4 proc tag for the btl, as it's causing more
problems than its worth.

This commit was SVN r30191.
2014-01-09 20:01:42 +00:00
Nathan Hjelm
bb01fc2938 Add missing MCA variable enumerator sentinel.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30178.
2014-01-09 15:28:42 +00:00
Alina Sklarevich
2869ff1782 mxm: fixes for compilation warnings.
removed set but not used variables and a variable that is unused.

reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30176.
2014-01-09 15:15:14 +00:00
Mike Dubman
0fae2caef3 Create a comm keyval for hcoll component with delete callback function.
Set comm attribute with keyval.
Wait for pending hcoll module tasks in comm delete callback where PML
still valid on the communicator. safely destroy hcoll context during
hcoll module destructor.

Author: Devendar Bureddy 
reviewed by miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30175.
2014-01-09 11:27:24 +00:00
Nathan Hjelm
10ecd80c8c Fix typo in udreg mpool that could cause us to try to use an invalid
registration. This was causing transaction errors on Aries systems.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30174.
2014-01-09 05:56:29 +00:00
Ralph Castain
2453843972 Add missing include - thanks to Paul Hargrove for spotting it
cmr=v1.7.4:reviewer=jsquyres:subject=add missing include in bcol

This commit was SVN r30171.
2014-01-09 03:57:55 +00:00
Jeff Squyres
9d41632eba Change the MCA level to 2 (from 5) on the rationale that it may be
needed for correctness.  The if_include/if_exclude are level 1, and
the TCP port range params are level 2; this parameter seems to be on
par with the TCP port range params.

Refs trac:4019

This commit was SVN r30161.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 19:04:26 +00:00
Jeff Squyres
8c871c2db6 Fix some compiler warnings:
* Remove some set-but-not-used variables
 * Make a convenience function return void (we weren't using the
   return code, anyway)
 * Mark a function as inline (it was supposed to be inline anyway)

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7:subject=Fix usnic BTL compiler warnings

This commit was SVN r30160.
2014-01-08 16:57:14 +00:00
Ralph Castain
cb31187bbe Correct tcp_not_use_nodelay option processing - change in mca param system incorrectly reversed the original parameter
Thanks to Tetsuya Mishima for detecting it!

cmr=v1.7.4:reviewer=jsquyres:subject=Correct tcp_not_use_nodelay option processing

This commit was SVN r30157.
2014-01-08 15:12:50 +00:00
Mike Dubman
43d6a30693 Fix problems of:
- HCOLL close without init
- Call hcoll progress after comm finalize
- mpirun default for coll_hcoll_enable is 1

fixed by Igor, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30156.
2014-01-08 10:55:25 +00:00
Ralph Castain
e2ca265f40 Per 1/7/2014 telecon: Add an MCA param to turn on all warnings for missing excluded interfaces.
Refs trac:4019

This commit was SVN r30146.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 00:21:25 +00:00
Jeff Squyres
13b29cff2c This commit compliements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configry macro
names:

 * pkgdatdir ->	ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
7d472ad5a5 Improve some comments
This commit was SVN r30144.
2014-01-07 23:35:04 +00:00
Jeff Squyres
50d20ade82 Fix compiler warnings: remove unused variables
This commit was SVN r30143.
2014-01-07 23:21:47 +00:00
Jeff Squyres
8349e122e8 Fix compiler warning (signed/unsigned comparison)
This commit was SVN r30142.
2014-01-07 23:18:55 +00:00
Brian Barrett
afde8370b3 Pull both calls to get into one function, and wrap with the appropriate
reference count if flow control is enabled.

This commit was SVN r30141.
2014-01-07 23:15:09 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Tom Naughton
c01db6faca fix typo in btl:vader for OMPI_LOCAL_RANK_INVALID
This commit was SVN r30139.
2014-01-07 21:42:51 +00:00
Brian Barrett
dbcc53bc6f Fix a threading issue
Remove some unneeded UNLIKELYs

This commit was SVN r30138.
2014-01-07 19:41:39 +00:00
Rolf vandeVaart
b3edca19df Add braces per coding convention and design review.
This commit was SVN r30137.
2014-01-07 17:30:37 +00:00
Jeff Squyres
8bf4ad9030 Refs trac:4301
Complements r30073: tighten up the string parsing of the vendor parts
ID MCA param a bit.  Also fix a small memory leak: ensure to free the
array uint32_t's parsed out of the MCA param.

This commit was SVN r30128.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4301 --> https://svn.open-mpi.org/trac/ompi/ticket/4301
2014-01-06 22:16:04 +00:00
Nathan Hjelm
e627c91227 btl/vader: add support for traditional shared memory.
This commit adds support for placing the send memory segment in a
traditional shared memory segment when XPMEM is not available. The
current default is to reserve 4MB for shared memory on each process.
The latest benchmarks show vader performing better than sm on both
Intel and AMD CPUs.

For large messages vader will now use CMA if it is available (and
XPMEM is not).

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30123.
2014-01-06 19:51:44 +00:00
Nathan Hjelm
5c8ea3a251 btl/openib: Move free list memory allocation to add_procs
Per RFC which expired two weeks ago:

We are planning to make a change to Open MPI to always set up the btls. This
means the btl init will be called even if add_procs is never called for that
btl. In the openib btl free lists fragments are currently allocated in btl_init.
To avoid wasting that memory this commit moves that final device setup to
the add_procs function. This included allocating free lists, and starting the
async event thread.

At this time this change is safe since we have a barrier after add_procs in
MPI_Init. If this changes we will need to re-think some of the initialization
since we might have the possibility of a connection request before add_procs
is called.

Tested with Mellanox ConnectX2 and QLogic HCAs.

Commit also cleans up tabs in btl_openib_async.c.

cmr=v1.7.5:reviewer=miked

This commit was SVN r30122.
2014-01-06 19:51:30 +00:00
Brian Barrett
d4bb1cbbad * Start working on thread safety of Portals 4 MTL
* Only call flowctl_add_procs if there's a new proc in the add_procs call

This commit was SVN r30110.
2014-01-02 22:37:01 +00:00
Brian Barrett
e811a8a9cb Make the Portals 4 collective component disable itself when there's not a
Portals 4 point-to-point (MTL or BTL) component in use

This commit was SVN r30109.
2014-01-02 22:35:37 +00:00
Ralph Castain
871f4e519c Silence warning
Refs trac:4040

This commit was SVN r30105.

The following Trac tickets were found above:
  Ticket 4040 --> https://svn.open-mpi.org/trac/ompi/ticket/4040
2014-01-02 16:05:54 +00:00
Rolf vandeVaart
c47e06463d Adjust CUDA related crossover value.
This commit was SVN r30100.
2013-12-30 18:39:11 +00:00
Rolf vandeVaart
e7f430d9ac Add empty line that was inadvertently removed in message.
This commit was SVN r30099.
2013-12-30 18:38:07 +00:00
George Bosilca
947c180d7f Create a finalize function to provide an opportunity to the mpool
base to release the internal structures.

This commit was SVN r30098.
2013-12-29 11:45:46 +00:00
Ralph Castain
652f7a120f Add Mellanox device IDs that were included in prior releases, but somehow missing again here
cmr=v1.7.4:reviewer=miked

This commit was SVN r30095.
2013-12-26 17:47:05 +00:00
Ralph Castain
62378a64c8 As Jeff pointed out, the reqd flag should only turn off the show_help - still enter the rest of the code block
Refs trac:4019

This commit was SVN r30091.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2013-12-26 15:02:41 +00:00
Ralph Castain
a8a91b374e Update component-level selection comments to match latest revisions
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30087.
2013-12-25 19:12:43 +00:00
Jeff Squyres
12d23e9c92 Left out valid end-of-string comparison in r30073.
Refs trac:4031

This commit was SVN r30074.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4031 --> https://svn.open-mpi.org/trac/ompi/ticket/4031
2013-12-24 12:07:56 +00:00
Jeff Squyres
6003702a51 Minor improvements to the usnic BTL:
1. Fix ompi_info memory leak in usnic BTL: do not allocate memory in
    the component register function, because ompi_info only calls the
    component register function and then dlclose's the component -- it
    does not call component finalize.  Instead, defer parsing the MCA
    param (and alloc'ing memory) until the component init function so
    that any allocated memory can be freed in the component close
    function.
 1. Also add a new check to ensure that we actually have some part
    numbers to check.  Add a show_help message if we don't find any
    vendor part IDs to check.
 1. Add a verbose output if usnic disqualifies itself from selection
    because THREAD_MULTIPLE was specified.

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30073.
2013-12-24 11:57:35 +00:00
Ralph Castain
9eebb79d54 Cleanup a loop that couldn't possibly execute as the outer loop indexed was being reused by the inner loops, leaving the index at the cutoff point after the first iteration
cmr=v1.7.4:reviewer=edgar:subject=Cleanup loop in sharedfp

This commit was SVN r30059.
2013-12-23 18:34:34 +00:00