1
1
Граф коммитов

9050 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
0e433eaa78 Silence warning 2016-07-11 19:43:02 -07:00
Nathan Hjelm
b47208e909 osc/rdma: fix bug in CAS
This commit fixes a bug in the RDMA compare-and-swap implementation
that caused the origin value to always be written even if the compare
should have failed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-11 09:54:23 -06:00
Edgar Gabriel
c8b1c6cae1 Merge pull request #1856 from edgargabriel/pr/zero-size-iread-iwrite
io/ompio: fix the request in case of a zero size write/read operation
2016-07-11 08:19:02 -05:00
Gilles Gouaillardet
14624506df coll/libnbc: do not exchange data between roots in ompi_coll_libnbc_ireduce_scatter_inter()
this is now useless since the scatter is done via the local communicator
2016-07-11 17:18:30 +09:00
Edgar Gabriel
3dd81e9e09 io/ompio: fix the request in case of a zero size write/read operation 2016-07-08 14:11:22 -05:00
Gilles Gouaillardet
a55d57406b coll/base: fix non zero lower bound datatype handling in mca_coll_base_alltoallv_intra_basic_inplace() 2016-07-08 16:55:26 +09:00
Gilles Gouaillardet
7b8094aac1 coll/base: silence misc warning
as reported by Coverity with CIDs 1363349-1363362

Offset temporary buffer when a non zero lower bound datatype is used.

Thanks Hristo Iliev for the report

(cherry picked from commit 0e393195d9)
2016-07-08 13:06:26 +09:00
Gilles Gouaillardet
678d08647b coll/libnbc: various fixes
- correctly handle non commutative operators
 - correctly handle non zero lower bound ddt
 - correctly handle ddt with size > extent
 - revamp NBC_Sched_op so it takes two buffers and matches ompi_op_reduce semantic
 - various fix for inter communicators

Thanks Yuki Matsumoto for the report
2016-07-07 15:55:49 +09:00
Gilles Gouaillardet
3e559a14a9 coll/inter: fix non standard ddt handling
- correctly handle non zero lower bound ddt
 - correctly handle ddt with size > extent

Thanks Yuki Matsumoto for the report
2016-07-07 15:49:59 +09:00
Gilles Gouaillardet
488d037d51 coll/basic: fix non standard ddt handling
- correctly handle non zero lower bound ddt
 - correctly handle ddt with size > extent

Thanks Yuki Matsumoto for the report
2016-07-07 15:49:53 +09:00
Gilles Gouaillardet
c06fb04a9a coll/base: fix non zero lower bound ddt handling in ompi_coll_base_reduce_intra_basic_linear()
Thanks Yuki Matsumoto for the report
2016-07-07 15:49:48 +09:00
Ralph Castain
ee56d9dc1a Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field 2016-07-05 14:59:50 -07:00
George Bosilca
eac5b3c668 Various cleanups in the monitoring PML. 2016-07-05 18:31:25 +02:00
George Bosilca
73972768f8 Remove an apparently useless function. 2016-07-05 18:30:11 +02:00
Josh Hursey
59bf1f0c41 Merge pull request #1836 from jjhursey/topic/coll-nbc-0-count-ireduce
mpi/c: Add each check for count==0 in nonblocking reduce interface
2016-07-01 15:22:37 -05:00
Josh Hursey
9b4ed968a4 Merge pull request #1833 from jjhursey/topic/op-init-fix
op: Add a default value for MPI_OP o_name
2016-07-01 15:22:15 -05:00
Joshua Hursey
0671e45de0 op: Add a default value for MPI_OP o_name 2016-07-01 13:46:01 -05:00
Joshua Hursey
96779f68e8 mpi/c: Add each check for count==0 in nonblocking reduce interface
* Matches the blocking versions of these interfaces
   - `iallreduce.c` to match `allreduce.c`
   - `ireduce.c` to match `reduce.c`
   - `ireduce_scatter.c` to match `reduce_scatter.c`
 * Workaround for IMB-NBC benchmark, similar to the workaround
   in place for the IMB-MPI1 benchmark for the blocking collectives.
2016-07-01 13:45:30 -05:00
Joshua Hursey
0a09f8bc51 coll/hcoll: Protect module destruct when not fully initialized
* If hcoll is given a negative priority, but not enabled=0 then
   the module is constructed, but then destructed before calling
   it's query(). So the previous pointers are not initialized.
   If we try to OBJ_RELEASE them in a debug build an assert will fire.
   This commit adds some protection against that and initializes
   the _module pointers to NULL.
2016-07-01 13:41:27 -05:00
Joshua Hursey
59f304b9e9 coll/base: neg. priority cleanup, verbose output improvements
* Print a verbose message if the component was disqualified because of
   a negative priority.
 * If a disqualified component provided a module, release it.
 * Display list of selected components in priority order
   - During the process of volunteering collective functions for a
     communicator, print the component name and priority. This will
     cause the verbose messages to be displayed in reverse priority
     order (lowest priority first, up to highest). This is helpful
     when determining which collective components are active in which
     order for a given communicator.
     To see the messages you need the following MCA parameter set to 9
     or higher: `-mca coll_base_verbose 9`
 * Adjust verbose for commonly needed verbose output from 10 to 9 to
   make it easier to access this information.
2016-07-01 13:41:27 -05:00
Nathan Hjelm
f38cc00df9 Merge pull request #1835 from hjelmn/thread_fix
ompi/request: fix hang in ompi_request_wait_completion
2016-06-30 18:50:32 -06:00
Nathan Hjelm
445b79bba8 ompi/request: fix hang in ompi_request_wait_completion
This commit fixes a hang reported by @nysal which happens when a
request is completed after a sync object is created but before the
sync object can be assigned to the request. In this case we need to
set the sync signaling field to false to ensure WAIT_SYNC_RELEASE does
not hang.

Fixes open-mpi/ompi#1828

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-30 09:54:00 -06:00
Nathan Hjelm
5f390b5f5a bml/r2: be more restrictive on rdma endpoints
This commit makes bml/r2 more restrictive on which endpoints end up in the rdma
endpoint list. Before this commit an endpoint was added if it supported either
put or get. This was done to ensure that endpoints are available for RMA.
Thought it is possible to support put or get endpoints we only currently
support endpoints that have put, get, and amos. bml/r2 now reflects this
support.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-06-29 18:54:58 -06:00
Artem Polyakov
541715572f Fix MPI_Waitany and MPI_Waitsome
(request handling related)
2016-06-28 16:40:00 +03:00
Nathaniel Graham
bb9485bcd9 Fix Java Coverity issue
Fixing a possible error that Coverity pointed out in
ompi_java_exceptionCheck.

Signed-off-by: Nathaniel Graham <nrgraham23@gmail.com>
2016-06-23 15:09:07 +02:00
Nathan Hjelm
143a93f379 opal/sync: remove usage of OPAL_ENABLE_MULTI_THREADS
The OPAL_ENABLE_MULTI_THREADS macro is always defined as 1. This was
causing us to always use the multi-thread path for synchronization
objects. The code has been updated to use the opal_using_threads()
function. When MPI_THREAD_MULTIPLE support is disabled at build time
(2.x only) this function is a macro evaluating to false so the
compiler will optimize out the MT-path in this case. The
OPAL_ATOMIC_ADD_32 macro has been removed and replaced by the existing
OPAL_THREAD_ADD32 macro.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-22 09:52:37 -06:00
Nathaniel Graham
679a66ccc8 Merge pull request #1803 from nrgraham23/jni_error_handling
Java bindings exception handling fix
2016-06-22 04:14:00 -07:00
Nathan Hjelm
7bd7c0578b Merge pull request #1807 from hjelmn/request_perfm_regression
ompi/request: fix performance regression
2016-06-21 14:47:56 -06:00
Nathan Hjelm
544adb9aed ompi/request: fix performance regression
This commit fixes a performance regression introduced by the request
rework. We were always using the multi-thread path because
OPAL_ENABLE_MULTI_THREADS is either not defined or always defined to 1
depending on the Open MPI version. To fix this I removed the
conditional and added a conditional on opal_using_threads(). This path
will be optimized out in 2.0.0 in a non-thread-multiple build as
opal_using_threads is #defined to false in that case.

Fixes open-mpi/ompi#1806

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-21 11:45:32 -06:00
Nathan Hjelm
2409024c17 osc/rdma: fix typo
Need to increment the total size after checking the local offset not
before. This typo causes large allocations with MPI_Win_allocate() to
fail.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-21 09:50:29 -06:00
Nathaniel Graham
88dea4e4de Java bindings exception handling fix
Fixed an error where if there were no MPI exceptions, a
JNI error could still exist and not get handled.

Signed-off-by: Nathaniel Graham <nrgraham23@gmail.com>
2016-06-21 12:40:31 +02:00
George Bosilca
9c4f56be4b Fix the coll_base_sendrecv function. 2016-06-18 18:23:51 +02:00
Nathan Hjelm
3a69b727a6 Merge pull request #1788 from hjelmn/split_type
comm/split_type: allow MPI_UNDEFINED for split_type
2016-06-16 21:12:25 -06:00
Nathan Hjelm
65be935676 comm/split_type: allow MPI_UNDEFINED for split_type
It is valid for any rank to deviate on the split_type argument if they
specify MPI_UNDEFINED. The code was incorrectly not allowing this
condition. Changed the split_type uniformity check and allow
local_size to be 0 if the local split_type is MPI_UNDEFINED.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-16 17:42:28 -06:00
rhc54
702a982271 Merge pull request #1767 from rhc54/topic/pmix2
Enable the PMIx event notification capability
2016-06-16 15:27:43 -07:00
Nathan Hjelm
e135543cb0 Merge pull request #1785 from hjelmn/malloc_hook_fix
opal/memory: disable __malloc_initialize_hook if poisoned
2016-06-15 14:55:44 -06:00
Nathan Hjelm
7018aeda2b opal/memory: disable __malloc_initialize_hook if poisoned
Newer versions of gcc have "poisoned" the __malloc_initialize_hook
name and it can no longer be used. Added a configure check and
protection around its usage.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-15 12:00:49 -06:00
KAWASHIMA Takahiro
dff6accec6 ompi/datatype: Fix args of DARRAY
According to MPI-3.1 P.122, `ni` for `MPI_COMBINER_DARRAY`
should be `4*ndims+4`, not `4*size+4`.

This bug may cause SEGV if `size` is smaller than `ndims`
when the darray is used for one-sided communication (pt2pt OSC).

This bug was introduced in open-mpi/ompi@79b13f36 (when darray
became a first class citizen and the `a_i` index of darray was
shifted by 2). The corresponding `MPI_Type_create_darray()`
function sets a right value so we don't need to update the function.
2016-06-15 11:24:22 +09:00
Ralph Castain
5d330d5220 Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler.
Add PMIx 2.0

Remove PMIx 1.1.4

Cleanup copying of component

Add missing file

Touchup a typo in the Makefile.am

Update the pmix ext114 component

Minor cleanups and resync to master

Update to latest PMIx 2.x

Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00
Jeff Squyres
c2185bb4b8 Merge pull request #1781 from jsquyres/pr/disable-psm-psm2-signal-hijacking
PSM/PSM2: Disable signal handler hijacking by default
2016-06-14 15:33:24 -04:00
Jeff Squyres
5071602c59 PSM/PSM2: Disable signal handler hijacking by default
Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some
subsequent phone calls and off-issue email discussions), the PSM
library is hijacking signal handlers by default.  Specifically: unless
the environment variables `IPATH_NO_BACKTRACE=1` (for PSM / Intel
TrueScale) is set, the library constructor for this library will
hijack various signal handlers for the purpose of invoking its own
error reporting mechanisms.

This may be a bit *surprising*, but is not a *problem*, per se.  The
real problem is that older versions of at least the PSM library do not
unregister these signal handlers upon being unloaded from memory.
Hence, a segv can actually result in a double segv (i.e., the original
segv and then another segv when the now-non-existent signal handler is
invoked).

This PSM signal hijacking subverts Open MPI's own signal reporting
mechanism, which may be a bit surprising for some users (particularly
those who do not have Intel TrueScale).  As such, we disable it by
default so that Open MPI's own error-reporting mechanisms are used.

Additionally, there is a typo in the library destructor for the PSM2
library that may cause problems in the unloading of its signal
handlers.  This problem can be avoided by setting `HFI_NO_BACKTRACE=1`
(for PSM2 / Intel OmniPath).

This is further compounded by the fact that the PSM / PSM2 libraries
can be loaded by the OFI MTL and the usNIC BTL (because they are
loaded by libfabric), even when there is no Intel networking hardware
present.  Having the PSM/PSM2 libraries behave this way when no Intel
hardware is present is clearly undesirable (and is likely to be fixed
in future releases of the PSM/PSM2 libraries).

This commit sets the following two environment variables to disable
this behavior from the PSM/PSM2 libraries (if they are not already
set):

* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1

If the user has set these variables before invoking Open MPI, we will
not override their values (i.e., their preferences will be honored).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-14 11:45:23 -07:00
Edgar Gabriel
1ddfd6cdca io/ompio: fix the preallocate function
handle preallocating sizes less than the current file size correctly.
2016-06-14 10:50:32 -05:00
KAWASHIMA Takahiro
84b110a1f2 ompi/datatype: Fix args of HINDEXED_BLOCK
According to MPI-3.1 P.121, `ni` for `MPI_COMBINER_HINDEXED_BLOCK`
should be `2`, not `2 + count`.

This bug was introduced in 113b45b4 (when `MPI_Type_create_hindexed_block`
support is added in Open MPI) and fixed partially in 7f5314ee and 8de93982.
This commit fixes the remaining part.

Probably this bug has no user impact. It only consumes a bit more memory.
2016-06-10 17:32:33 +09:00
Gilles Gouaillardet
80e362de52 coll/base: fix memory free in ompi_coll_base_allreduce_intra_recursivedoubling err handler
Fix CID 1362630

Fixes open-mpi/ompi@0e393195d9
2016-06-09 13:12:25 +09:00
Gilles Gouaillardet
ead7efef3f coll/basic: silence CID 1362614 in mca_coll_basic_allreduce_inter() 2016-06-09 09:40:19 +09:00
Gilles Gouaillardet
ad2e1a5ae9 coll/base: silence CID 1362613 in ompi_coll_base_alltoall_intra_basic_linear() 2016-06-09 09:40:05 +09:00
Gilles Gouaillardet
80b267af1c coll/base: silence CID 1362601 in ompi_coll_base_sendrecv_zero() 2016-06-09 09:37:31 +09:00
Gilles Gouaillardet
0e393195d9 coll/base: fix [all]reduce with non zero lower bound datatypes
Offset temporary buffer when a non zero lower bound datatype is used.

Thanks Hristo Iliev for the report
2016-06-08 16:48:00 +09:00
Nathan Hjelm
97c1643216 Merge pull request #1766 from hjelmn/req_fix
ompi/request: fix loop conditional
2016-06-07 12:11:56 -06:00
Nathan Hjelm
3ddf3ccbf3 Merge pull request #1758 from hjelmn/ob1_fixes
pml/ob1: bug fixes
2016-06-07 11:18:55 -06:00