Commit graph

4571 commits

Author SHA1 Message Date
Edgar Gabriel
be5d5834c5 fix the problem identified by a user on the mailing list with MPI_MODE_EXCL
cmr=v1.7.4:reviewer=vvenkatesan:subject=fix a problem when opening a file with MODE_EXCL

This commit was SVN r30324.
2014-01-18 16:06:27 +00:00
Nathan Hjelm
c88626510c Fix merge issues with the new ROMIO and fix an obvious ROMIO bug.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30319.
2014-01-18 00:29:16 +00:00
Hadi Montakhabi
8c14411289 f_cc_size is contiguous chunk size, not the stripe width. There is no stripe_width in the file handle structure.
This commit was SVN r30314.
2014-01-17 18:35:55 +00:00
Nathan Hjelm
f2a73fcdbd udreg: free huge page allocations correctly
This commit fixes an error path that occurs when huge page allocations are
enabled. In this case we allocate a huge page and try to register it but fail.
We then were calling free on the opal object. Fix this by calling the proper free
function.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30289.
2014-01-14 16:26:06 +00:00
Nathan Hjelm
f9d2032705 vader: ensure fast box data is aligned on 4-byte boundaries
This commit fixes a bus error on Solaris/Sparc.

Closes trac:4111

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30288.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
  Ticket 4111 --> https://svn.open-mpi.org/trac/ompi/ticket/4111
2014-01-14 16:04:52 +00:00
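
The alignment fix described in f9d2032705 can be illustrated with a minimal sketch. It assumes a power-of-two alignment and uses hypothetical names (`FBOX_ALIGN`, `align_up4`) that are not taken from the vader source; it only shows the general technique of rounding an offset up to the next 4-byte boundary, so that word-sized loads do not fault on strict-alignment CPUs such as SPARC.

```c
#include <stddef.h>

/* Hypothetical constant: the alignment the commit message mentions. */
#define FBOX_ALIGN 4u

/* Round an offset up to the next multiple of FBOX_ALIGN.
 * Works only because FBOX_ALIGN is a power of two. */
static size_t align_up4(size_t offset)
{
    return (offset + (FBOX_ALIGN - 1)) & ~(size_t)(FBOX_ALIGN - 1);
}
```

An offset that is already a multiple of 4 is returned unchanged; any other offset moves forward to the next multiple.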
Rolf vandeVaart
e75afb2b82 Fix bug in distance computation code when deciding which devices to use on a NUMA node.
Also add a verbose flag so one can see what devices are selected as well as another flag to override
locality information and use all devices on the node.  

This commit was SVN r30287.
2014-01-14 15:41:56 +00:00
Nathan Hjelm
da1316ca6e vader: don't OBJ_RELEASE endpoint rcaches.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30284.
2014-01-13 23:44:34 +00:00
Jeff Squyres
20d6391734 Patch submitted by Paul Hargrove to fix NetBSD compile with -laio.
NetBSD puts the AIO functions in -lrt, vs. the usual libc.  So we
need the fbtl/posix configure.m4 to test for -lrt properly.

Reviewed by Jeff Squyres.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Fix NetBSD use of -laio

This commit was SVN r30274.
2014-01-13 18:49:39 +00:00
Yossi Etigin
7564e2c13f Fix a recursion in mxm send flow which happens when mpi starts a new send from the context of send completion callback.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30265.
2014-01-12 17:47:03 +00:00
Yossi Etigin
9504969f7d fix communicator double-free from pt2pt component, caused by r29938.
cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30264.

The following SVN revision numbers were found above:
  r29938 --> open-mpi/ompi@ecfb122c97
2014-01-12 17:38:14 +00:00
Ralph Castain
286ff6d552 For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun.
NOTE: launch performance will be absolutely awful if you do this with BTLs that aren't configured to modex_recv on first message!

Even with "modex on demand", we still have to do a barrier in place of the modex - we simply don't move any data around, which does reduce the time impact. The barrier is required to ensure that the other proc has in fact registered all its BTL info and therefore is prepared to hand over a complete data package. Otherwise, you may not get the info you need. In addition, the shared memory BTL can fail to properly rendezvous as it expects the barrier to be in place.

This behavior will *only* take effect under the following conditions:

1. launched via mpirun

2. #procs is greater than ompi_hostname_cutoff, which defaults to UINT32_MAX

3. mca param rte_orte_direct_modex is set to 1. At the moment, we are having problems getting this param to register properly, so only the first two conditions are in effect. Still, the bottom line is you have to *want* this behavior to get it.

The planned next evolution of this will be to make the direct modex be non-blocking - this will require two fixes:

1. if the remote proc doesn't have the required info, then let it delay its response until it does. This means we need a way for the MPI layer to tell the RTE "I am done entering modex data".

2. adjust the SM rendezvous logic to loop until the required file has been created

Creating a placeholder to bring this over to 1.7.5 when ready.

cmr=v1.7.5:reviewer=hjelmn:subject=Enable direct modex at scale

This commit was SVN r30259.
2014-01-11 17:36:06 +00:00
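
The three conditions listed in 286ff6d552 can be read as a single predicate. The sketch below is an assumption-laden illustration, not the ORTE code: the function name and parameters are hypothetical, and it models the intended behavior with all three conditions, whereas the commit message notes that the third (the `rte_orte_direct_modex` param) was not yet registering properly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: should this job use "modex on demand"?
 * All parameter names are illustrative. */
static bool use_direct_modex(bool launched_by_mpirun,
                             uint32_t num_procs,
                             uint32_t hostname_cutoff,
                             bool direct_modex_param)
{
    return launched_by_mpirun               /* condition 1 */
        && num_procs > hostname_cutoff      /* condition 2 */
        && direct_modex_param;              /* condition 3 */
}
```

With the default cutoff of `UINT32_MAX`, the second condition can never hold, which matches the message's point that you have to want this behavior to get it.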
Jeff Squyres
34ae50a0ed Fix int <--> pointer casting by adding intermediate cast through (intptr_t)
Reviewed by Dave Goodell

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Add intptr_t casting in usnic btl

This commit was SVN r30243.
2014-01-10 20:42:53 +00:00
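
As a rough illustration of the casting pattern in 34ae50a0ed, the sketch below routes int/pointer conversions through `intptr_t`. The helper names are hypothetical and not from the usnic BTL. The round trip of a small int through a pointer is implementation-defined in C, so treat this as a sketch that behaves as expected on mainstream platforms rather than a portable guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: store a small int in a void* slot.
 * Casting through intptr_t avoids the size-mismatch warning
 * compilers give for a direct int -> pointer cast. */
static void *int_to_ptr(int value)
{
    return (void *)(intptr_t)value;
}

/* Hypothetical helper: recover the int, again via intptr_t. */
static int ptr_to_int(void *ptr)
{
    return (int)(intptr_t)ptr;
}
```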
Nathan Hjelm
5259ab213f Fix one more error path in udreg. In this case we hit the maximum size
of the udreg cache and get a different error code back.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30242.
2014-01-10 19:27:32 +00:00
Ralph Castain
9566650458 Per Marco, don't define a "min" function if one is already defined to avoid conflict with cygwin reserved word
This commit was SVN r30241.
2014-01-10 18:03:25 +00:00
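
The guard described in 9566650458 is typically written as below. This is a generic sketch of the idiom, not the exact text of the patch: the macro is defined only when the platform headers have not already provided one.

```c
/* Define min only if no header has defined it already. */
#ifndef min
#define min(a, b) (((a) < (b)) ? (a) : (b))
#endif
```

As with any function-like macro of this shape, arguments are evaluated more than once, so callers should avoid passing expressions with side effects.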
Ralph Castain
880943dc10 Per Marco, rename "interface" to "tcp_interface" to avoid cygwin reserved word
This commit was SVN r30240.
2014-01-10 18:02:22 +00:00
Ralph Castain
c7a94a57d7 Per Marco, rename ERROR tags to exit_ERROR to avoid cygwin reserved name issues.
Refs trac:4085

This commit was SVN r30239.

The following Trac tickets were found above:
  Ticket 4085 --> https://svn.open-mpi.org/trac/ompi/ticket/4085
2014-01-10 18:00:49 +00:00
Jeff Squyres
350d989c00 Fix OpenBSD warnings where <malloc.h> is available and usable, but not
intended to be used and emits a compile-time warning.

Thanks to Paul Hargrove for identifying the issue.

cmr=v1.7.4:reviewer=hjelmn:subject=remove/replace malloc.h

This commit was SVN r30231.
2014-01-10 17:20:49 +00:00
Jeff Squyres
53a3defde9 s/CACHE_LINE_SIZE/BASESMUMA_CACHE_LINE_SIZE/g to avoid a system macro
name clash on some BSDs.

cmr=v1.7.4:reviewer=pasha

This commit was SVN r30230.
2014-01-10 16:48:43 +00:00
Edgar Gabriel
217e61e345 add proper typecasts to intptr_t to avoid warnings on 32-bit systems.
This commit was SVN r30229.
2014-01-10 16:19:04 +00:00
Jeff Squyres
212e07a1e9 Don't instantiate+init variables in a switch block.
Avoid compiler warning about (unnecessarily) initializing 2 variables
during instantiation at the top of a switch block (but outside of any
case statements): just declare the variables at the top of the outer
block.  They're already safely initialized, so don't worry about
initializing them in the instantiation.

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Don't instantiate+init variables in a switch block

This commit was SVN r30228.
2014-01-10 15:39:16 +00:00
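
The pattern from 212e07a1e9 can be sketched as follows. In C, a declaration with an initializer placed inside a switch block but before the first `case` label is jumped over, so the initializer never runs and compilers warn about it. The function and variable names here are hypothetical, chosen only to show the variables declared in the enclosing block and assigned in every branch.

```c
/* Hypothetical example; names are illustrative only. */
static int bytes_for_kind(int kind)
{
    /* Declared in the enclosing block, with no initializer:
     * every branch of the switch assigns both variables. */
    int count;
    int width;

    switch (kind) {
    case 0:
        count = 1;
        width = 4;
        break;
    case 1:
        count = 2;
        width = 8;
        break;
    default:
        count = 0;
        width = 0;
        break;
    }
    return count * width;
}
```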
Mike Dubman
110c99af4f sharing negative tag space between libNBC and HCOLL
fixed by devendar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30224.
2014-01-10 12:51:34 +00:00
Nathan Hjelm
52c231df3e ob1 does not check the return code of mpool_register. This can cause the
ob1 dummy registration to actually be used when using udreg. Fix this by
always setting reg to NULL when mpool/udreg's register function fails.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30214.
2014-01-10 00:46:16 +00:00
Jeff Squyres
115025b8dd Ensure that the usnic BTL is only built on 64 bit Linux platforms.
Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Ensure the usnic BTL only builds on 64 bit Linux

This commit was SVN r30199.
2014-01-09 22:17:01 +00:00
Brian Barrett
013e0ec771 * Add multi-device support to the Portals 4 btl.
* Remove use of the Portals 4 proc tag for the btl, as it's causing more
problems than it's worth.

This commit was SVN r30191.
2014-01-09 20:01:42 +00:00
Nathan Hjelm
bb01fc2938 Add missing MCA variable enumerator sentinel.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30178.
2014-01-09 15:28:42 +00:00
Alina Sklarevich
2869ff1782 mxm: fixes for compilation warnings.
removed set but not used variables and a variable that is unused.

reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30176.
2014-01-09 15:15:14 +00:00
Mike Dubman
0fae2caef3 Create a comm keyval for hcoll component with delete callback function.
Set comm attribute with keyval.
Wait for pending hcoll module tasks in the comm delete callback, where the PML
is still valid on the communicator. Safely destroy the hcoll context during
the hcoll module destructor.

Author: Devendar Bureddy 
reviewed by miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30175.
2014-01-09 11:27:24 +00:00
Nathan Hjelm
10ecd80c8c Fix typo in udreg mpool that could cause us to try to use an invalid
registration. This was causing transaction errors on Aries systems.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30174.
2014-01-09 05:56:29 +00:00
Ralph Castain
2453843972 Add missing include - thanks to Paul Hargrove for spotting it
cmr=v1.7.4:reviewer=jsquyres:subject=add missing include in bcol

This commit was SVN r30171.
2014-01-09 03:57:55 +00:00
Jeff Squyres
9d41632eba Change the MCA level to 2 (from 5) on the rationale that it may be
needed for correctness.  The if_include/if_exclude are level 1, and
the TCP port range params are level 2; this parameter seems to be on
par with the TCP port range params.

Refs trac:4019

This commit was SVN r30161.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 19:04:26 +00:00
Jeff Squyres
8c871c2db6 Fix some compiler warnings:
 * Remove some set-but-not-used variables
 * Make a convenience function return void (we weren't using the
   return code, anyway)
 * Mark a function as inline (it was supposed to be inline anyway)

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7:subject=Fix usnic BTL compiler warnings

This commit was SVN r30160.
2014-01-08 16:57:14 +00:00
Ralph Castain
cb31187bbe Correct tcp_not_use_nodelay option processing - change in mca param system incorrectly reversed the original parameter
Thanks to Tetsuya Mishima for detecting it!

cmr=v1.7.4:reviewer=jsquyres:subject=Correct tcp_not_use_nodelay option processing

This commit was SVN r30157.
2014-01-08 15:12:50 +00:00
Mike Dubman
43d6a30693 Fix problems of:
- HCOLL close without init
- Call hcoll progress after comm finalize
- mpirun default for coll_hcoll_enable is 1

fixed by Igor, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30156.
2014-01-08 10:55:25 +00:00
Ralph Castain
e2ca265f40 Per 1/7/2014 telecon: Add an MCA param to turn on all warnings for missing excluded interfaces.
Refs trac:4019

This commit was SVN r30146.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 00:21:25 +00:00
Jeff Squyres
13b29cff2c This commit complements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configury macro
names:

 * pkgdatadir -> ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
7d472ad5a5 Improve some comments
This commit was SVN r30144.
2014-01-07 23:35:04 +00:00
Jeff Squyres
50d20ade82 Fix compiler warnings: remove unused variables
This commit was SVN r30143.
2014-01-07 23:21:47 +00:00
Jeff Squyres
8349e122e8 Fix compiler warning (signed/unsigned comparison)
This commit was SVN r30142.
2014-01-07 23:18:55 +00:00
Brian Barrett
afde8370b3 Pull both calls to get into one function, and wrap with the appropriate
reference count if flow control is enabled.

This commit was SVN r30141.
2014-01-07 23:15:09 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Tom Naughton
c01db6faca fix typo in btl:vader for OMPI_LOCAL_RANK_INVALID
This commit was SVN r30139.
2014-01-07 21:42:51 +00:00
Brian Barrett
dbcc53bc6f Fix a threading issue
Remove some unneeded UNLIKELYs

This commit was SVN r30138.
2014-01-07 19:41:39 +00:00
Rolf vandeVaart
b3edca19df Add braces per coding convention and design review.
This commit was SVN r30137.
2014-01-07 17:30:37 +00:00
Jeff Squyres
8bf4ad9030 Refs trac:4301
Complements r30073: tighten up the string parsing of the vendor parts
ID MCA param a bit.  Also fix a small memory leak: be sure to free the
array of uint32_t's parsed out of the MCA param.

This commit was SVN r30128.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4301 --> https://svn.open-mpi.org/trac/ompi/ticket/4301
2014-01-06 22:16:04 +00:00
Nathan Hjelm
e627c91227 btl/vader: add support for traditional shared memory.
This commit adds support for placing the send memory segment in a
traditional shared memory segment when XPMEM is not available. The
current default is to reserve 4MB for shared memory on each process.
The latest benchmarks show vader performing better than sm on both
Intel and AMD CPUs.

For large messages vader will now use CMA if it is available (and
XPMEM is not).

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30123.
2014-01-06 19:51:44 +00:00
Nathan Hjelm
5c8ea3a251 btl/openib: Move free list memory allocation to add_procs
Per RFC which expired two weeks ago:

We are planning to make a change to Open MPI to always set up the btls. This
means the btl init will be called even if add_procs is never called for that
btl. In the openib btl, free list fragments are currently allocated in btl_init.
To avoid wasting that memory, this commit moves that final device setup to
the add_procs function. This includes allocating free lists and starting the
async event thread.

At this time this change is safe since we have a barrier after add_procs in
MPI_Init. If this changes we will need to re-think some of the initialization
since we might have the possibility of a connection request before add_procs
is called.

Tested with Mellanox ConnectX2 and QLogic HCAs.

Commit also cleans up tabs in btl_openib_async.c.

cmr=v1.7.5:reviewer=miked

This commit was SVN r30122.
2014-01-06 19:51:30 +00:00
Brian Barrett
d4bb1cbbad * Start working on thread safety of Portals 4 MTL
* Only call flowctl_add_procs if there's a new proc in the add_procs call

This commit was SVN r30110.
2014-01-02 22:37:01 +00:00
Brian Barrett
e811a8a9cb Make the Portals 4 collective component disable itself when there's not a
Portals 4 point-to-point (MTL or BTL) component in use

This commit was SVN r30109.
2014-01-02 22:35:37 +00:00
Ralph Castain
871f4e519c Silence warning
Refs trac:4040

This commit was SVN r30105.

The following Trac tickets were found above:
  Ticket 4040 --> https://svn.open-mpi.org/trac/ompi/ticket/4040
2014-01-02 16:05:54 +00:00
Rolf vandeVaart
c47e06463d Adjust CUDA related crossover value.
This commit was SVN r30100.
2013-12-30 18:39:11 +00:00