9218 Commits

Author SHA1 Message Date
Nathan Hjelm
3a69b727a6 Merge pull request #1788 from hjelmn/split_type
comm/split_type: allow MPI_UNDEFINED for split_type
2016-06-16 21:12:25 -06:00
Nathan Hjelm
65be935676 comm/split_type: allow MPI_UNDEFINED for split_type
It is valid for any rank to deviate on the split_type argument if they
specify MPI_UNDEFINED. The code was incorrectly not allowing this
condition. Changed the split_type uniformity check and allow
local_size to be 0 if the local split_type is MPI_UNDEFINED.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-16 17:42:28 -06:00
rhc54
702a982271 Merge pull request #1767 from rhc54/topic/pmix2
Enable the PMIx event notification capability
2016-06-16 15:27:43 -07:00
Nathan Hjelm
e135543cb0 Merge pull request #1785 from hjelmn/malloc_hook_fix
opal/memory: disable __malloc_initialize_hook if poisoned
2016-06-15 14:55:44 -06:00
Nathan Hjelm
7018aeda2b opal/memory: disable __malloc_initialize_hook if poisoned
Newer versions of gcc have "poisoned" the __malloc_initialize_hook
name and it can no longer be used. Added a configure check and
protection around its usage.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-15 12:00:49 -06:00
KAWASHIMA Takahiro
dff6accec6 ompi/datatype: Fix args of DARRAY
According to MPI-3.1 P.122, `ni` for `MPI_COMBINER_DARRAY`
should be `4*ndims+4`, not `4*size+4`.

This bug may cause SEGV if `size` is smaller than `ndims`
when the darray is used for one-sided communication (pt2pt OSC).

This bug was introduced in open-mpi/ompi@79b13f36 (when darray
became a first class citizen and the `a_i` index of darray was
shifted by 2). The corresponding `MPI_Type_create_darray()`
function sets the correct value, so that function needs no update.
2016-06-15 11:24:22 +09:00
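The count in this fix can be checked by listing the integer arguments that `MPI_Type_create_darray()` takes. The following is a minimal sketch of that arithmetic, not the Open MPI code; `darray_num_integers` is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* Number of integer arguments (`ni`) recorded for MPI_COMBINER_DARRAY.
 * The integers are:
 *   size, rank, ndims                      -> 3 values
 *   gsizes, distribs, dargs, psizes        -> 4 arrays of ndims values
 *   order                                  -> 1 value
 * The total is 4*ndims + 4. It depends on the number of array
 * dimensions, not on `size` (the number of processes). */
int darray_num_integers(int ndims)
{
    assert(ndims > 0);
    return 3 + 4 * ndims + 1;
}
```

For a three-dimensional distributed array this gives 16 integers regardless of how many processes share the array, which is why sizing the argument list from `size` could under-allocate when `size` is smaller than `ndims`.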
Ralph Castain
5d330d5220 Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fall back to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler.
Add PMIx 2.0

Remove PMIx 1.1.4

Cleanup copying of component

Add missing file

Touchup a typo in the Makefile.am

Update the pmix ext114 component

Minor cleanups and resync to master

Update to latest PMIx 2.x

Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00
Jeff Squyres
c2185bb4b8 Merge pull request #1781 from jsquyres/pr/disable-psm-psm2-signal-hijacking
PSM/PSM2: Disable signal handler hijacking by default
2016-06-14 15:33:24 -04:00
Jeff Squyres
5071602c59 PSM/PSM2: Disable signal handler hijacking by default
Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some
subsequent phone calls and off-issue email discussions), the PSM
library is hijacking signal handlers by default.  Specifically: unless
the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel
TrueScale) is set, the library constructor for this library will
hijack various signal handlers for the purpose of invoking its own
error reporting mechanisms.

This may be a bit *surprising*, but is not a *problem*, per se.  The
real problem is that older versions of at least the PSM library do not
unregister these signal handlers upon being unloaded from memory.
Hence, a segv can actually result in a double segv (i.e., the original
segv and then another segv when the now-non-existent signal handler is
invoked).

This PSM signal hijacking subverts Open MPI's own signal reporting
mechanism, which may be a bit surprising for some users (particularly
those who do not have Intel TrueScale).  As such, we disable it by
default so that Open MPI's own error-reporting mechanisms are used.

Additionally, there is a typo in the library destructor for the PSM2
library that may cause problems in the unloading of its signal
handlers.  This problem can be avoided by setting `HFI_NO_BACKTRACE=1`
(for PSM2 / Intel OmniPath).

This is further compounded by the fact that the PSM / PSM2 libraries
can be loaded by the OFI MTL and the usNIC BTL (because they are
loaded by libfabric), even when there is no Intel networking hardware
present.  Having the PSM/PSM2 libraries behave this way when no Intel
hardware is present is clearly undesirable (and is likely to be fixed
in future releases of the PSM/PSM2 libraries).

This commit sets the following two environment variables to disable
this behavior from the PSM/PSM2 libraries (if they are not already
set):

* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1

If the user has set these variables before invoking Open MPI, we will
not override their values (i.e., their preferences will be honored).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-14 11:45:23 -07:00
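The "set only if not already set" behavior described in this commit maps directly onto the `overwrite` argument of POSIX `setenv()`. The sketch below assumes a POSIX system; the function names are hypothetical and this is not the code from the commit.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Set `name` to "1" unless the user already gave it a value.
 * Passing 0 as the third argument of setenv() leaves an existing
 * value untouched. Returns 0 on success, -1 on error. */
int disable_backtrace_default(const char *name)
{
    assert(name != NULL);
    return setenv(name, "1", 0);
}

/* Apply the default to both variables named in the commit message. */
int disable_psm_signal_hijacking(void)
{
    if (disable_backtrace_default("IPATH_NO_BACKTRACE") != 0) {
        return -1;
    }
    return disable_backtrace_default("HFI_NO_BACKTRACE");
}
```

A user who exports `HFI_NO_BACKTRACE=0` before launching keeps that value, while an unset variable ends up as `1`.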
Edgar Gabriel
1ddfd6cdca io/ompio: fix the preallocate function
handle preallocating sizes less than the current file size correctly.
2016-06-14 10:50:32 -05:00
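The size rule behind this fix is that a preallocate request may grow a file but must leave it unchanged when the requested size is not larger than the current one. The sketch below shows that rule with plain POSIX calls; it is not the ompio implementation, both function names are hypothetical, and `ftruncate()` only extends the file size without guaranteeing that disk blocks are reserved.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Grow the file behind `fd` to at least `size` bytes. If the file is
 * already that large it is left untouched, so the call never shrinks
 * a file. Returns 0 on success, -1 on error. */
int preallocate_at_least(int fd, off_t size)
{
    struct stat st;

    assert(size >= 0);
    if (fstat(fd, &st) != 0) {
        return -1;
    }
    if (size <= st.st_size) {
        return 0; /* current size is kept */
    }
    return ftruncate(fd, size); /* new bytes read back as zero */
}

/* Convenience wrapper for trying the helper on a named file;
 * the file is created if it does not exist. */
int preallocate_path(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return -1;
    }
    int rc = preallocate_at_least(fd, size);
    if (close(fd) != 0) {
        rc = -1;
    }
    return rc;
}
```

Calling `preallocate_path()` with 100 and then with 10 on the same file leaves the file at 100 bytes.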
KAWASHIMA Takahiro
84b110a1f2 ompi/datatype: Fix args of HINDEXED_BLOCK
According to MPI-3.1 P.121, `ni` for `MPI_COMBINER_HINDEXED_BLOCK`
should be `2`, not `2 + count`.

This bug was introduced in 113b45b4 (when `MPI_Type_create_hindexed_block`
support was added to Open MPI) and partially fixed in 7f5314ee and 8de93982.
This commit fixes the remaining part.

Probably this bug has no user impact. It only consumes a bit more memory.
2016-06-10 17:32:33 +09:00
Gilles Gouaillardet
80e362de52 coll/base: fix memory free in ompi_coll_base_allreduce_intra_recursivedoubling err handler
Fix CID 1362630

Fixes open-mpi/ompi@0e393195d9
2016-06-09 13:12:25 +09:00
Gilles Gouaillardet
ead7efef3f coll/basic: silence CID 1362614 in mca_coll_basic_allreduce_inter() 2016-06-09 09:40:19 +09:00
Gilles Gouaillardet
ad2e1a5ae9 coll/base: silence CID 1362613 in ompi_coll_base_alltoall_intra_basic_linear() 2016-06-09 09:40:05 +09:00
Gilles Gouaillardet
80b267af1c coll/base: silence CID 1362601 in ompi_coll_base_sendrecv_zero() 2016-06-09 09:37:31 +09:00
Gilles Gouaillardet
0e393195d9 coll/base: fix [all]reduce with non zero lower bound datatypes
Offset temporary buffer when a non zero lower bound datatype is used.

Thanks Hristo Iliev for the report
2016-06-08 16:48:00 +09:00
Nathan Hjelm
97c1643216 Merge pull request #1766 from hjelmn/req_fix
ompi/request: fix loop conditional
2016-06-07 12:11:56 -06:00
Nathan Hjelm
3ddf3ccbf3 Merge pull request #1758 from hjelmn/ob1_fixes
pml/ob1: bug fixes
2016-06-07 11:18:55 -06:00
Nathan Hjelm
5a4adb866d ompi/request: fix loop conditional
This commit fixes a bug in waitany that causes the code to go past the
beginning of the request array. The loop conditional i >= 0 is invalid
since i is unsigned. Changed the loop to check (i+1) > 0.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-07 10:28:46 -06:00
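The unsigned-index pitfall described in this commit can be shown in isolation. The following is a minimal sketch of a reverse scan with an unsigned index, not the actual waitany code; `last_nonnull` and its arguments are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scan an array of pointers from the last element down to the first
 * and return the index of the last non-NULL entry, or `count` if every
 * entry is NULL.
 *
 * A condition of `i >= 0` is always true for an unsigned `i`: after
 * i == 0 the decrement wraps to SIZE_MAX and the loop would index
 * outside the array. Testing (i + 1) > 0 stops the loop at that point,
 * because SIZE_MAX + 1 wraps to 0 in unsigned arithmetic. The same
 * test also skips the loop entirely when count == 0. */
size_t last_nonnull(void *const *items, size_t count)
{
    assert(items != NULL || count == 0);
    for (size_t i = count - 1; (i + 1) > 0; --i) {
        if (items[i] != NULL) {
            return i;
        }
    }
    return count;
}
```

An equivalent and common alternative is `for (size_t i = count; i-- > 0; )`, which tests before decrementing and so never evaluates the body with a wrapped index.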
Todd Kordenbrock
9671d6af47 Merge pull request #1689 from francois-wellenreiter/remove_trig_rdv_portals4
MTL portals4 : remove the triggered rendez-vous protocol
2016-06-06 21:55:01 -05:00
Nathan Hjelm
5d0b4679ea pml/ob1: bug fixes
This commit fixes two bugs in pml/ob1:

 - Do not call MCA_PML_OB1_PROGRESS_PENDING from
   mca_pml_ob1_send_request_start_copy as this may lead to a recursive
   call to mca_pml_ob1_send_request_process_pending.

 - In mca_pml_ob1_send_request_start_rdma return the rdma frag object
   if a btl fragment can not be allocated. This fixes a leak
   identified by @abouteiller and @bosilca.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-06 17:54:55 -06:00
Gilles Gouaillardet
544a2f1631 configury: fix mpifort and oshmemfort wrapper data
The NAG compiler uses gcc (and not ld) as a linker, so in order to pass an option to the linker,
the flag is -Wl,-Wl,,<option> and not -Wl,<option>

Thanks Paul Hargrove for the report
2016-06-06 11:54:12 +09:00
Gilles Gouaillardet
c976559877 coll/basic: fix log basic bcast
The log basic bcast was completely broken. The rank 0 gets the
hibit set to -1, so it always returned an error.
2016-06-06 11:01:51 +09:00
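Why rank 0 ends up with a value of -1 is easy to see from a generic highest-set-bit helper: rank 0 has no bits set at all. The sketch below is an illustration under that assumption, not the Open MPI helper itself; `highest_bit` is a hypothetical name.

```c
#include <assert.h>

/* Return the position of the highest set bit of `value`, looking only
 * at bit positions below `nbits`, or -1 if none of those bits is set. */
int highest_bit(int value, int nbits)
{
    assert(value >= 0 && nbits >= 0 && nbits <= 30);
    for (int pos = nbits - 1; pos >= 0; --pos) {
        if (value & (1 << pos)) {
            return pos;
        }
    }
    return -1;
}
```

With such a helper, -1 is the normal result for the root of a binomial tree, so a caller that treats every negative result as a failure would reject rank 0 on every call.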
Gilles Gouaillardet
99fedcb7a3 fs/base: silence a memory leak in mca_fs_base_get_fstype()
Fixes CID 1351211
2016-06-06 09:20:14 +09:00
George Bosilca
9376b0340b Fix the basic barrier.
The log basic barrier was completely broken. The rank 0 gets the
hibit set to 0, so it always returned an error.
2016-06-03 23:46:25 -04:00
Edgar Gabriel
d6af5444a6 fix the get_byte_offset code 2016-06-03 11:36:53 -05:00
Josh Hursey
9f9f70ee50 Merge pull request #1746 from jjhursey/topic/op-init
ompi/op: Provide a default value for type/flags
2016-06-03 07:56:29 -05:00
Nathan Hjelm
e968ddfe64 start bug fixes (#1729)
* mpi/start: fix bugs in cm and ob1 start functions

There were several problems with the implementation of start in Open
MPI:

 - There are no checks whatsoever on the state of the request(s)
   provided to MPI_Start/MPI_Start_all. It is erroneous to provide an
   active request to either of these calls. Since we are already
   looping over the provided requests there is little overhead in
   verifying that the request can be started.

 - Both ob1 and cm were always throwing away the request on the
   initial call to start and start_all with a particular
   request. Subsequent calls would see that the request was
   pml_complete and reuse it. This introduced a leak as the initial
   request was never freed. Since the only pml request that can
   be mpi complete but not pml complete is a buffered send the
   code to reallocate the request has been moved. To detect that
   a request is indeed mpi complete but not pml complete isend_init
   in both cm and ob1 now marks the new request as pml complete.

 - If a new request was needed the callbacks on the original request
   were not copied over to the new request. This can cause osc/pt2pt
   to hang as the incoming message callback is never called.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

* osc/pt2pt: add request for gc after starting a new request

Starting a new receive may cause a recursive call into the pt2pt
frag receive function. If this happens and the prior request is
on the garbage collection list it could cause problems. This commit
moves the gc insert until after the new request has been posted.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-06-02 20:22:40 -04:00
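The first item of this commit, checking request state inside the loop that already visits every request, can be sketched with a simplified request model. Everything below is hypothetical (types, names, and the -1 error value); the real Open MPI request structure and error codes differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request model for illustration only. */
enum sketch_state { SKETCH_INACTIVE, SKETCH_ACTIVE };

struct sketch_request {
    bool persistent;          /* created by a *_init style call */
    enum sketch_state state;  /* inactive until started */
};

/* Start every request in `reqs`. Returns 0 on success, or -1 as soon
 * as a request is found that may not be started (NULL, not persistent,
 * or already active). Requests before the offending one stay started,
 * which a real implementation may or may not want. */
int sketch_start_all(struct sketch_request **reqs, size_t count)
{
    assert(reqs != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        struct sketch_request *req = reqs[i];
        if (req == NULL || !req->persistent ||
            req->state != SKETCH_INACTIVE) {
            return -1;
        }
        req->state = SKETCH_ACTIVE;
    }
    return 0;
}
```

Because the check sits in the loop that already touches each request, it adds one comparison per request rather than a second pass over the array.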
Matias A Cabral
29ab28f4f6 Adding owner.txt file for PSM2 MTL. 2016-06-02 16:26:16 -07:00
Joshua Hursey
a776d78f2d ompi/op: Provide a default value for type/flags
* User defined ops leave the op_type unset which can confuse logic
   in a collective component that is trying to convert the op to the
   appropriate local function.
2016-06-02 13:59:04 -05:00
George Bosilca
d577e12dd0 Fix comment. 2016-06-03 00:57:31 +09:00
George Bosilca
fc5d458249 Consistency in handling OPAL_ENABLE_FT_CR.
I am not sure if we should continue to maintain the request support
for FT_CR, but I tried here to simplify the code while maintaining
the same meaning.
2016-06-03 00:54:24 +09:00
Nathan Hjelm
b001184e63 request: fix warnings (#1742)
Fix warnings introduced by request rework.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-06-02 04:53:16 -04:00
George Bosilca
bfcf145613 Refactor the request test and wait functions. 2016-06-02 11:58:25 +09:00
George Bosilca
2e1b1d34c6 Safety first ! 2016-06-02 11:52:43 +09:00
George Bosilca
50cec456fb ompi_request_complete with signal
Rewrite the ompi_request_complete function to take into account the
with_signal argument. Change the comment to explain the expected
behavior.
Alter all the ompi_request_complete uses to make sure the status of the
request is set before calling ompi_request_complete.

bot:label:enhancement
2016-06-02 11:49:12 +09:00
George Bosilca
223d75595d Give a boost to MPI_Barrier.
Based on current implementation it is faster to use a blocking
send than the non-blocking version. Switch the exchange function
used in the barrier to use the blocking version combined with
the non-blocking version of the receive.
2016-06-02 11:45:25 +09:00
Ralph Castain
2c086e56be Add an experimental ability to skip the RTE barriers at the end of MPI_Init and the beginning of MPI_Finalize 2016-06-01 17:01:15 -07:00
Nathan Hjelm
086ffc1838 pml/ob1: fix race on pml completion of send requests
The request code was setting the request as pml_complete before
calling MCA_PML_OB1_SEND_REQUEST_MPI_COMPLETE. This was causing
MCA_PML_OB1_SEND_REQUEST_RETURN to be called twice in some cases. The
code now mirrors the recvreq code and only sets the request as pml
complete if the request has not already been freed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-01 13:36:06 -06:00
Gilles Gouaillardet
5f565dfec3 configury: clean the flex generated .c files 2016-06-01 11:13:31 +09:00
Gilles Gouaillardet
1bbc5fadee ompi/win: silence another warning 2016-05-31 13:18:39 +09:00
Gilles Gouaillardet
c41321b9e5 ompi/win: silence warning 2016-05-31 13:03:20 +09:00
Jeff Squyres
59f4a765b3 Merge pull request #1656 from hpcraink/pr/make_manpage
In case, we do not build Fortran, Fortran 2008 or CXX, the regexp in …
2016-05-28 11:02:12 -04:00
Nathan Hjelm
d8fd3a411a Merge pull request #1725 from hjelmn/request_fixes
ompi/request: fix bugs in MPI_Wait_some and MPI_Wait_any
2016-05-27 13:47:49 -06:00
Nathan Hjelm
0591139f49 ompi/request: fix bugs in MPI_Wait_some and MPI_Wait_any
This commit fixes two bugs in MPI_Wait_any:

 - If all requests are inactive then the sync wait would hang forever
   because no requests are attached to the sync.

 - The request pointer was pointing to the request before the completed
   request which caused the wrong request to be freed or marked inactive.

MPI_Wait_some had a similar issue if all the requests were pending.

These issues were identified by MTT.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-27 12:36:10 -06:00
Nathan Hjelm
0adfb328e1 win: fix warnings
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-05-27 10:14:02 -06:00
Thananon Patinyasakdikul
60d0fbf683 Removal of ompi_request_lock from pml/ucx. 2016-05-26 12:36:58 -04:00
George Bosilca
90f294096e Remove more references to the request mutex.
Regarding BFO it should be mentioned that this component is currently
unmaintained, and that despite my efforts I could not make it compile
(it would not compile before this patch either).
2016-05-25 23:27:06 -04:00
Nathan Hjelm
9d439664f0 pml/yalla: update for request changes
This commit brings the pml/yalla component up to date with the request
rework changes.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:42:53 -06:00
Nathan Hjelm
8445c885ce pml/cm: update for request changes
This fixes a hang caused by the request refactor work. The cm pml was
not updated and was hanging in most cases.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:35:32 -06:00