Commit graph

25258 commits

Author SHA1 Message Date
Ralph Castain
5d330d5220 Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires PMIx 2.0 or above, as the features are not available in earlier PMIx releases. When OMPI master is built against an earlier external version, it will fall back to the prior behavior - i.e., the debugger will be released via RML and all notifications will go strictly to the default error handler.
Add PMIx 2.0

Remove PMIx 1.1.4

Cleanup copying of component

Add missing file

Touchup a typo in the Makefile.am

Update the pmix ext114 component

Minor cleanups and resync to master

Update to latest PMIx 2.x

Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00
Jeff Squyres
c2185bb4b8 Merge pull request #1781 from jsquyres/pr/disable-psm-psm2-signal-hijacking
PSM/PSM2: Disable signal handler hijacking by default
2016-06-14 15:33:24 -04:00
Jeff Squyres
5071602c59 PSM/PSM2: Disable signal handler hijacking by default
Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some
subsequent phone calls and off-issue email discussions), the PSM
library is hijacking signal handlers by default.  Specifically: unless
the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel
TrueScale) is set, the library constructor for this library will
hijack various signal handlers for the purpose of invoking its own
error reporting mechanisms.

This may be a bit *surprising*, but is not a *problem*, per se.  The
real problem is that older versions of at least the PSM library do not
unregister these signal handlers upon being unloaded from memory.
Hence, a segv can actually result in a double segv (i.e., the original
segv and then another segv when the now-non-existent signal handler is
invoked).

This PSM signal hijacking subverts Open MPI's own signal reporting
mechanism, which may be a bit surprising for some users (particularly
those who do not have Intel TrueScale).  As such, we disable it by
default so that Open MPI's own error-reporting mechanisms are used.

Additionally, there is a typo in the library destructor for the PSM2
library that may cause problems in the unloading of its signal
handlers.  This problem can be avoided by setting `HFI_NO_BACKTRACE=1`
(for PSM2 / Intel OmniPath).

This is further compounded by the fact that the PSM / PSM2 libraries
can be loaded by the OFI MTL and the usNIC BTL (because they are
loaded by libfabric), even when there is no Intel networking hardware
present.  Having the PSM/PSM2 libraries behave this way when no Intel
hardware is present is clearly undesirable (and is likely to be fixed
in future releases of the PSM/PSM2 libraries).

This commit sets the following two environment variables to disable
this behavior from the PSM/PSM2 libraries (if they are not already
set):

* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1

If the user has set these variables before invoking Open MPI, we will
not override their values (i.e., their preferences will be honored).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-14 11:45:23 -07:00
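The "set only if not already set" behavior described in 5071602c59 can be sketched with POSIX `setenv()`, whose third argument controls overwriting. This is a minimal illustration, not the code from the commit: the function names `set_default_env` and `disable_psm_backtrace_defaults` are hypothetical, and the commit itself may use a different helper.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>

/* Hypothetical helper: set name=value only when the variable is not
 * already present. Passing overwrite == 0 to setenv() leaves an existing
 * value untouched, so a user's own setting is honored. */
int set_default_env(const char *name, const char *value)
{
    return setenv(name, value, 0);
}

/* Hypothetical wrapper applying the two defaults named in the commit
 * message. Returns 0 on success, -1 if either call fails. */
int disable_psm_backtrace_defaults(void)
{
    if (set_default_env("IPATH_NO_BACKTRACE", "1") != 0) {
        return -1;
    }
    if (set_default_env("HFI_NO_BACKTRACE", "1") != 0) {
        return -1;
    }
    return 0;
}
```

If a user exports `IPATH_NO_BACKTRACE=0` before launching, the call above leaves that value in place, while an unset `HFI_NO_BACKTRACE` becomes `1`.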
Edgar Gabriel
2886d93fb8 Merge pull request #1782 from edgargabriel/getview-preallocate-fixes
io/ompio: fix the preallocate function
2016-06-14 11:59:47 -05:00
Edgar Gabriel
1ddfd6cdca io/ompio: fix the preallocate function
Handle preallocation sizes smaller than the current file size correctly.
2016-06-14 10:50:32 -05:00
KAWASHIMA Takahiro
eb37574afc Merge pull request #1773 from kawashima-fj/pr/hindexed-block-args
ompi/datatype: Fix args of HINDEXED_BLOCK
2016-06-13 14:03:17 +09:00
rhc54
5911cbcf7e Merge pull request #1777 from rhc54/topic/mr
Remove stale map-reduce support
2016-06-12 10:54:54 -07:00
Ralph Castain
a6e6c37484 Remove stale map-reduce support 2016-06-12 07:41:57 -07:00
Nathan Hjelm
9c62236303 Merge pull request #1775 from hjelmn/arm64
arm64: add atomic swap function
2016-06-11 14:12:39 -06:00
Nathan Hjelm
253c91972e arm64: add atomic swap function
This commit adds the opal_atomic_swap_32 and opal_atomic_swap_64
functions. This should improve the performance of btl/vader.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-06-11 09:46:29 -06:00
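For readers unfamiliar with the operation added in 253c91972e, an atomic swap stores a new value and returns the previous one as a single indivisible step. The sketch below uses C11 `<stdatomic.h>` rather than arm64 assembly; the name `sketch_swap_32` is hypothetical and its signature is not claimed to match `opal_atomic_swap_32`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical illustration of a 32-bit atomic swap: write newval to
 * *addr and return the value that was there before, with no window in
 * which another thread can observe a partial update. */
int32_t sketch_swap_32(_Atomic int32_t *addr, int32_t newval)
{
    return atomic_exchange(addr, newval);
}
```

A typical use is claiming a slot or flag: the caller that gets back the "free" value knows it performed the transition.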
Joshua Ladd
3e47aa03b4 Merge pull request #1774 from igor-ivanov/pr/oshmem-lock-issue
oshmem: Fix double lock issue
2016-06-11 11:31:43 -04:00
Nathan Hjelm
109389dce2 Merge pull request #1634 from hjelmn/cma
cma: add support for MIPS and ARM
2016-06-11 09:20:28 -06:00
Igor Ivanov
a8ab5b55b9 oshmem: Fix double lock issue
Signed-off-by: Igor Ivanov <igor.ivanov.va@gmail.com>
2016-06-10 15:52:31 +03:00
KAWASHIMA Takahiro
84b110a1f2 ompi/datatype: Fix args of HINDEXED_BLOCK
According to MPI-3.1 P.121, `ni` for `MPI_COMBINER_HINDEXED_BLOCK`
should be `2`, not `2 + count`.

This bug was introduced in 113b45b4 (when `MPI_Type_create_hindexed_block`
support was added to Open MPI) and partially fixed in 7f5314ee and 8de93982.
This commit fixes the remaining part.

This bug probably has no user impact; it only consumes a bit more memory.
2016-06-10 17:32:33 +09:00
Ralph Castain
d58da99dbc Shift to memcpy to avoid Solaris issues 2016-06-09 12:07:17 -07:00
Gilles Gouaillardet
80e362de52 coll/base: fix memory free in ompi_coll_base_allreduce_intra_recursivedoubling err handler
Fix CID 1362630

Fixes open-mpi/ompi@0e393195d9
2016-06-09 13:12:25 +09:00
Gilles Gouaillardet
1f651d17c1 opal/util/ethtool: fix (infamous) strncpy usage
strncpy does not NUL-terminate the destination when the source is truncated
to fit the buffer, so terminate it ourselves.

fix CID 1362576
2016-06-09 09:54:50 +09:00
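The `strncpy` pitfall fixed in 1f651d17c1 can be shown with a small helper. This is a sketch of the general pattern, not the code in opal/util/ethtool; the name `copy_terminated` is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes) and
 * always leave dst NUL-terminated. strncpy() alone writes no terminator
 * when src has dst_size or more characters, so the last byte is set
 * explicitly. */
void copy_terminated(char *dst, const char *src, size_t dst_size)
{
    if (dst_size == 0) {
        return; /* no room even for the terminator */
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With a 4-byte buffer and the source `"abcdef"`, the destination ends up holding `"abc"` plus the terminator.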
Gilles Gouaillardet
ead7efef3f coll/basic: silence CID 1362614 in mca_coll_basic_allreduce_inter() 2016-06-09 09:40:19 +09:00
Gilles Gouaillardet
ad2e1a5ae9 coll/base: silence CID 1362613 in ompi_coll_base_alltoall_intra_basic_linear() 2016-06-09 09:40:05 +09:00
Gilles Gouaillardet
80b267af1c coll/base: silence CID 1362601 in ompi_coll_base_sendrecv_zero() 2016-06-09 09:37:31 +09:00
rhc54
84e1425d32 Merge pull request #1772 from rhc54/topic/strnlen
Abstract the strnlen function for environments that do not have it (e.g., Solaris 10)
2016-06-08 12:19:17 -07:00
Ralph Castain
8fa935534b Abstract the strnlen function for environments that do not have it (e.g., Solaris 10) 2016-06-08 10:12:43 -07:00
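A fallback for platforms without `strnlen` could look roughly like the following. This is an assumption about the general approach, not the code from 8fa935534b; the name `sketch_strnlen` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback: return the length of s, but never look past
 * maxlen bytes. memchr() stops at the first NUL it finds, so the scan
 * is bounded either by the terminator or by maxlen. */
size_t sketch_strnlen(const char *s, size_t maxlen)
{
    const char *end = memchr(s, '\0', maxlen);
    return end != NULL ? (size_t)(end - s) : maxlen;
}
```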
Jeff Squyres
95ecae8688 coverity: add --enable-debug to nightly Coverity builds
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-08 08:01:47 -07:00
Gilles Gouaillardet
0e393195d9 coll/base: fix [all]reduce with non zero lower bound datatypes
Offset temporary buffer when a non zero lower bound datatype is used.

Thanks Hristo Iliev for the report
2016-06-08 16:48:00 +09:00
Nathan Hjelm
f8957f24af Merge pull request #1768 from hjelmn/cq_fix
btl/openib: fix cq resize calculation
2016-06-07 21:34:36 -06:00
Nathan Hjelm
dd519c55b1 btl/openib: fix cq resize calculation
Before dynamic add_procs the openib_btl_size_queues was called exactly
once for non-dynamic jobs. Now the function is called on each new
connection so the calculation was wrong. Re-wrote the function to
correctly calculate the CQ size and only attempt to adjust the CQ if
the requested size has changed. This fixes a bug when using the openib
btl on psm2 hardware that is caused by the time needed to resize a
CQ. The overhead was causing udcm to time out and fail.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-07 16:05:56 -06:00
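The idea in dd519c55b1 of touching the CQ only when the requested size actually changes can be sketched as below. Everything here is hypothetical (the `cq_sketch` struct, its fields, and `cq_sketch_request`); the real function works on openib BTL structures and calls into the verbs library, which this stand-in does not do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: the size currently configured and a counter that
 * stands in for the expensive resize operation. */
struct cq_sketch {
    size_t current_size;
    unsigned resize_calls;
};

/* Called once per new connection. Returns true only when a resize was
 * performed, so repeated calls with an unchanged size cost nothing. */
bool cq_sketch_request(struct cq_sketch *cq, size_t requested)
{
    if (requested == cq->current_size) {
        return false; /* nothing changed: skip the resize */
    }
    cq->current_size = requested;
    cq->resize_calls++; /* placeholder for the real resize call */
    return true;
}
```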
Nathan Hjelm
97c1643216 Merge pull request #1766 from hjelmn/req_fix
ompi/request: fix loop conditional
2016-06-07 12:11:56 -06:00
Nathan Hjelm
3ddf3ccbf3 Merge pull request #1758 from hjelmn/ob1_fixes
pml/ob1: bug fixes
2016-06-07 11:18:55 -06:00
Nathan Hjelm
5a4adb866d ompi/request: fix loop conditional
This commit fixes a bug in waitany that causes the code to go past the
beginning of the request array. The loop conditional i >= 0 is always
true since i is unsigned. Changed the loop to check (i+1) > 0.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-07 10:28:46 -06:00
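The loop condition described in 5a4adb866d is a general C idiom for counting down with an unsigned index. The sketch below is illustrative only; `last_nonzero` is a hypothetical function, not the waitany code.

```c
#include <stddef.h>

/* Hypothetical example: return the highest index whose entry is nonzero,
 * or n when there is none. With an unsigned i, "i >= 0" never becomes
 * false; "(i + 1) > 0" does, because decrementing 0 wraps i to SIZE_MAX
 * and SIZE_MAX + 1 wraps back to 0. */
size_t last_nonzero(const int *flags, size_t n)
{
    for (size_t i = n - 1; (i + 1) > 0; --i) {
        if (flags[i] != 0) {
            return i;
        }
    }
    return n;
}
```

The same condition also handles `n == 0` safely: `i` starts at `SIZE_MAX`, `(i + 1)` is `0`, and the body never runs.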
rhc54
0efb1b5d1f Merge pull request #1761 from hjelmn/progress_warnings
opal/progress: fix warnings
2016-06-07 06:35:09 -07:00
Nathan Hjelm
e082ed752a opal/progress: fix warnings
This commit fixes several warnings introduced by
open-mpi/ompi@fc26d9c69f .

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-06 22:18:24 -06:00
Todd Kordenbrock
9671d6af47 Merge pull request #1689 from francois-wellenreiter/remove_trig_rdv_portals4
MTL portals4 : remove the triggered rendez-vous protocol
2016-06-06 21:55:01 -05:00
Nathan Hjelm
5d0b4679ea pml/ob1: bug fixes
This commit fixes two bugs in pml/ob1:

 - Do not call MCA_PML_OB1_PROGRESS_PENDING from
   mca_pml_ob1_send_request_start_copy as this may lead to a recursive
   call to mca_pml_ob1_send_request_process_pending.

 - In mca_pml_ob1_send_request_start_rdma return the rdma frag object
   if a btl fragment cannot be allocated. This fixes a leak
   identified by @abouteiller and @bosilca.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-06 17:54:55 -06:00
rhc54
c2a02ab06c Merge pull request #1756 from rhc54/topic/hangs
Fix rare hangs observed on OS-X by properly thread-shifting upcalls from the PMIx server into ORTE
2016-06-06 07:41:58 -07:00
Joshua Ladd
db70852d31 Merge pull request #1757 from alinask/topic/revert_master_mlnx_opt_rdmacm_ibaddr
Revert "mellanox/optimized: set enable_openib_rdmacm_ibaddr=yes in the mellanox/optimized file."
2016-06-06 09:55:16 -04:00
Alina Sklarevich
a2be17ec14 Revert "mellanox/optimized: set enable_openib_rdmacm_ibaddr=yes in the mellanox/optimized file."
This reverts commit 6cd7282631.
2016-06-06 11:26:11 +03:00
Gilles Gouaillardet
01591626b3 Merge pull request #1295 from ggouaillardet/poc/nag_configury
configury: add support for NAG compilers
2016-06-06 13:42:49 +09:00
Ralph Castain
dd0f843843 Fix rare hangs observed on OS-X by properly thread-shifting upcalls from the PMIx server into ORTE 2016-06-05 21:39:44 -07:00
Nathan Hjelm
4a2bd83302 opal/cma: improve Linux CMA detection
This commit improves the CMA detection when the installed glibc doesn't
have support for CMA. In this case we need to verify that the syscall
numbers in opal/include/opal/sys/cma.h are valid for the architecture.
This verification is done by attempting to use CMA while including the
internal header.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-06-05 22:29:07 -06:00
Gilles Gouaillardet
dc5adc5a91 configury: pass -Wl,-Wl,,--enable-new-dtags when NAG compiler is used for linking
Thanks Paul Hargrove for the report
2016-06-06 11:54:25 +09:00
Gilles Gouaillardet
11b3bc962b configury: add the nagfor NAG compiler to the default Fortran compilers 2016-06-06 11:54:25 +09:00
Gilles Gouaillardet
1ce5393fa4 configury: add the -mismatch flag to NAG compiler
The NAG compiler is too picky about naming conventions and cannot build
Open MPI unless the -mismatch flag is used
2016-06-06 11:54:25 +09:00
Gilles Gouaillardet
20bfc6b3d1 autogen: patch config/ltmain.sh in order to make NAG compiler pass the -pthread option to the linker 2016-06-06 11:54:24 +09:00
Gilles Gouaillardet
544a2f1631 configury: fix mpifort and oshmemfort wrapper data
The NAG compiler uses gcc (and not ld) as a linker, so in order to pass an option to the linker,
the flag is -Wl,-Wl,,<option> and not -Wl,<option>

Thanks Paul Hargrove for the report
2016-06-06 11:54:12 +09:00
Gilles Gouaillardet
bbed1d4a5f configury: append LIBS to OMPI_WRAPPER_EXTRA_LIBS
This is required so the NAG compiler can build a static Open MPI

Thanks Paul Hargrove for the report.
2016-06-06 11:53:42 +09:00
Gilles Gouaillardet
c976559877 coll/basic: fix log basic bcast
The log basic bcast was completely broken. The rank 0 gets the
hibit set to -1, so it always returned an error.
2016-06-06 11:01:51 +09:00
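The message for c976559877 relies on rank 0 having no set bits, so a "highest set bit" helper reports -1 for it. A minimal sketch of such a helper follows, assuming that semantics; `sketch_hibit` is a hypothetical name and does not reproduce the signature of the helper used in coll/basic.

```c
/* Hypothetical helper: index of the highest set bit of value (bit 0 is
 * the least significant), or -1 when value is 0 and no bit is set. */
int sketch_hibit(unsigned int value)
{
    int bit = -1;
    while (value != 0) {
        value >>= 1;
        ++bit;
    }
    return bit;
}
```

Code that treats a negative result as an error therefore needs a special case for rank 0, the root of the logarithmic tree.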
Gilles Gouaillardet
b707d138fe pmix114/pmix1_client: fix misc memory leaks
Fixes CID 1325146-1325149
2016-06-06 09:33:35 +09:00
Gilles Gouaillardet
99fedcb7a3 fs/base: silence a memory leak in mca_fs_base_get_fstype()
Fixes CID 1351211
2016-06-06 09:20:14 +09:00
George Bosilca
9376b0340b Fix the basic barrier.
The log basic barrier was completely broken. The rank 0 gets the
hibit set to -1, so it always returned an error.
2016-06-03 23:46:25 -04:00
Jeff Squyres
c0fc0d12e4 Merge pull request #1581 from jsquyres/pr/AUTHORS-update
RFC: AUTHORS: reformat and include all git log email addresses
2016-06-03 18:18:02 -07:00