1
1

29506 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
fbf7d31fd1 fortran/mpif-h: fix MPI_[I]Alltoallw() binding
- ignore sendcounts, sendispls and sendtypes arguments when MPI_IN_PLACE is used
 - use the right size when an inter-communicator is used.

Thanks Markus Geimer for reporting this.

Refs. open-mpi/ompi#5459

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@cdaed89d04)
2019-07-24 17:10:27 +09:00
Gilles Gouaillardet
aae73d9cf7 fortran/mpif-h: fix C to Fortran error code conversion
- remove incorrect use of OMPI_INT_2_FINT()
 - use homogenous syntax (e.g. c_ierr = PMPI_...())

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@223e6cc537)
2019-07-24 17:10:00 +09:00
Howard Pritchard
667aba9913
Merge pull request #6810 from janjust/v4.0.x
v4.0.x OSC: Reset external request to NULL
2019-07-23 09:05:03 -06:00
Geoff Paulsen
368da00414
Merge pull request #6804 from hppritcha/topic/swat_issue_6785
btl/openib:  fix issue 6785
2019-07-16 10:57:46 -05:00
Geoff Paulsen
507fcc9cef
Merge pull request #6806 from ggouaillardet/topic/v4.0.x/nbc_retain
v4.0.x: retain operation and datatype(s) in non blocking collectives
2019-07-16 10:56:57 -05:00
Tomislav Janjusic
63605fc466 v4.0.x OSC: Reset external request to NULL to avoid double request
completion
Co-authored with Artem Polyakov <artemp@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-07-12 22:49:34 +03:00
Howard Pritchard
71f240f078 btl/openib: fix issue 6785
Commit d7053a3 broke things for the case when Open MPI 4.0.x is built
without UCX support.  Problem was it was trying to partially initialize
the btl to try and delay printing of a help message till wireup.  Well
this sort of doesn't work in all cases.  Rather than keep piling on
changes to support a help message for a BTL that we are deprecating, take
a keep it simple stupid approach.

So, revert most of d7053a3 and instead put the help message back in the
original location, during scan of ports of the available HCAs to check
for whether or not link layer for that port is configured for ethernet or infiniband.
If Open MPI was built with UCX support, don't emit the help message, if
UCX was not linked in, emit the help message.

Verified on a system with connectX5 HCAs configured with two ports configured
for ethernet and two for infiniband.

relates to #6785

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-07-12 08:21:21 -06:00
Gilles Gouaillardet
c9e4240e70 mpi: retain operation and datatype in non blocking collectives
MPI standard states a user MPI_Op and/or user MPI_Datatype can be free'd
after a call to a non blocking collective and before the non-blocking
collective completes.
Retain user (only) MPI_Op and MPI_Datatype when the non blocking call is
invoked, and set a request callback so they are free'd when the MPI_Request
completes.

Thanks Thomas Ponweiser for reporting this

Fixes open-mpi/ompi#2151
Fixes open-mpi/ompi#1304

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@0fe756d416)
2019-07-12 10:27:04 +09:00
Aurelien Bouteiller
9499dcfe41 Manage errors in NBC collective ops
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

Correctly bubble up errors in NBC collective operations

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

The error field of requests needs to be rearmed at start, not at create

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@65660e5999)
2019-07-12 10:26:08 +09:00
Howard Pritchard
d3a736032a
Merge pull request #6800 from jsquyres/pr/v4.0.x/tcp-oob-active-listeners-fix
v4.0.x: Fix oob_tcp tcp_component_close segfault with active listeners
2019-07-09 09:07:26 -06:00
Orivej Desh
667fe3f3f3 Fix oob_tcp tcp_component_close segfault with active listeners
oob_tcp in non-HNP mode shares libevent event_base with oob_base [1].
orte_oob_base_close calls:
(1) oob_tcp component_shutdown, then
(2) opal_progress_thread_finalize, then
(3) oob_tcp tcp_component_close [2].
opal_progress_thread_finalize calls tracker_destructor [3] that frees the
event_base [4]. If any oob_tcp event listeners are active at this time, oob_tcp
will crash trying to delete them at [5] [6].

This change moves oob_tcp event listener cleanup from component_close to
component_shutdown so that it happens before the event_base is freed.

[1] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L160
[2] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/base/oob_base_frame.c#L95
[3] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L232
[4] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L65
[5] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_component.c#L192
[6] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L955

Signed-off-by: Orivej Desh <orivej@gmx.fr>
(cherry picked from commit 78b7e342bd26f493547f750dac842252e7a15143)
2019-07-08 15:14:43 -07:00
Geoff Paulsen
2df46aca54
Merge pull request #6792 from hoopoepg/topic/ucx_maxtag_fix-v4.0
pml/ucx: Fix the max tag and context id values - v4.0
2019-07-08 14:08:37 -05:00
Geoff Paulsen
a7608a02ca
Merge pull request #6787 from rhc54/cmr40/pmix
v4.0.x: Update PMIx to official v3.1.3 release
2019-07-08 14:07:46 -05:00
Nysal Jan K.A
b6da090090 pml/ucx: Fix the max tag and context id values
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit fe4ef147f81b2ac56661175005de6c330eace690)
2019-07-03 16:38:07 +03:00
Ralph Castain
1d0e0557b9
v4.0.x: Update PMIx to official v3.1.3 release
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-07-02 08:56:49 -07:00
Geoff Paulsen
514e273968
Merge pull request #6770 from devreal/osc_winalloc_err_v4.0.x
OSC rdma win allocate: propagate errors to avoid deadlocks (v4.0.x)
2019-06-28 14:04:12 -05:00
Geoff Paulsen
7f26c6dc41
Merge pull request #6776 from rhc54/cmr40/pmix
Update to PMIx v3.1.3rc4
2019-06-26 13:21:31 -05:00
Howard Pritchard
6424857029
Merge pull request #6634 from jsquyres/pr/v4.0.x/ob1-fixes
v4.0.x: Cherry pick ob1 fixes from master
2019-06-26 10:49:32 -06:00
Howard Pritchard
ba2368bc0a
Merge pull request #6774 from hppritcha/topic/pr_6759_for_v4.0.x
Suggestion to fix division by zero in file view.
2019-06-26 10:48:39 -06:00
Ralph Castain
9d0adbc6bc
Update to track 32-bit support commit
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-06-26 09:31:43 -07:00
Ralph Castain
b353639573
Update to PMIx v3.1.3rc4
Will provide PR to update VERSION to final release once passes MTT

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-06-25 13:45:19 -07:00
Harald Klimach
16e1d74c8f Suggestion to fix division by zero in file view.
In common_ompi_aggregators calc_cost routine:
do not cast the real division to an int intermediately.
This patch removes the obsolete int variable c and assigns
the result of the P_a/P_x division directly to n_as.

With the intermediate int c variable, n_as gets 0 if P_a < P_x,
resulting in a division by 0 when computing n_s.

Signed-off-by: Harald Klimach <harald.klimach@uni-siegen.de>
(cherry picked from commit e222a04ae57e5d09b8559f3c111de1f10a47246a)
2019-06-25 09:29:08 -06:00
Howard Pritchard
3da92363b3
Merge pull request #6765 from rhc54/cmr4/flux
v4.0.x: Fix finalize of flux component
2019-06-24 13:24:23 -06:00
Howard Pritchard
28d300915f
Merge pull request #6725 from bosilca/cherrypick/6683
Cherrypick/6683
2019-06-24 13:24:02 -06:00
Joseph Schuchart
c5cf3432b9 OSC rdma win allocate: synchronize error codes across shared memory group
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 8f27cc26d9845b5b207979b2a4621ef1089d1afb)
2019-06-24 17:49:26 +02:00
Ralph Castain
05fa5845bc
Fix finalize of flux component
Per patches from @SteVwonder and @garlick

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d4070d5f58f0c65aef89eea5910b202b8402e48b)
2019-06-19 06:00:02 -07:00
Howard Pritchard
73c4aac12d
Merge pull request #6750 from brminich/topic/all2all_linear_sync_fix_v4.0
COLL/BASE: Fix linear sync all2all - v4.0.x
2019-06-17 13:45:52 -06:00
Geoff Paulsen
e01005a68b
Merge pull request #6748 from gpaulsen/topic/v4.0.x/fix_ucx_1.6_issue6640
btl/uct: add support for UCX 1.6.x
2019-06-17 14:07:52 -05:00
Howard Pritchard
cb8dd569ff
Merge pull request #6747 from devreal/rdma-fetchop-local-v4.0.x
OSC rdma: make sure accumulating in shared memory is safe
2019-06-13 18:55:53 -06:00
Mikhail Brinskii
adba7f55f7 COLL/BASE: Fix linear sync all2all
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 79006f4e5a578d32bfa08de7b98e747ae18706f6)
2019-06-09 21:31:19 +03:00
Nathan Hjelm
b5428aaf71 btl/uct: add support for UCX 1.6.x
This commit updates the uct btl to support the v1.6.x release of
UCX. This release breaks API.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
(cherry picked from commit b78066720c3e3299bd76f2e22d2c0e415db572fc)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-06-07 15:54:47 -05:00
Geoff Paulsen
0cd5a5afb8
Merge pull request #6714 from rhc54/cmr40/routed
Fix tree spawn at scale
2019-06-07 14:13:22 -05:00
Geoff Paulsen
630af1099e
Merge pull request #6739 from ggouaillardet/topic/regx_atoi
v4.0.x: regx/base: fix an integer overflow
2019-06-07 14:07:23 -05:00
Geoff Paulsen
105bfed176
Merge pull request #6741 from open-mpi/smiller_shmem_wait_types_v4.0.x
v4.0.x: shmem/c: Fix shmem type for calls to shmem_test and shmem_wait_until …
2019-06-07 14:06:28 -05:00
Geoff Paulsen
07b97bf7fe
Merge pull request #6745 from yanagibashi/pr/v4.0.x/add-f08-procedure-names
v4.0.x: mpiext/pcollreq: Add `_f08` to procedure names
2019-06-07 14:04:47 -05:00
Joseph Schuchart
900f0fa21f OSC rdma: make sure accumulating in shared memory is safe
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit c67e2291937a09947c421dc84c6b3a8d07bec07f)
2019-06-07 12:45:00 +02:00
Tsubasa Yanagibashi
5dd8830dca mpiext/pcollreq: Add _f08 to procedure names
The procedure names don't contain "_f08" of Fortran 2008 bindings of
Persistent Collective Operations(mpiext/pcollreq/use-mpi-f08).
This fix adds "_f08" to the procedure names of pcollreq/use-mpi-f08,
same as other Fortran 2008 routines in `ompi/mpi/fortran/use-mpi-f08/mod`.

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
(cherry picked from commit 3148b0cfaa04843e7219acb8c7e04f43f6d219fe)
2019-06-07 10:59:01 +09:00
Howard Pritchard
a42977f1c2
Merge pull request #6707 from hoopoepg/topic/alloc-with-hint-realloc-inplace-v4.0
ALLOC_WITH_HINT: added inplace realloc - v4.0
2019-06-06 10:16:57 -07:00
perrynzhou
5acaf006ae regx/base: fix an integer overflow
use strtol() instead of atoi() in order to handle hostnames
containing a large number.

This is a one-off commit for the release branches since
the regx framework has already been removed from master.

Refs. open-mpi/ompi#6729

Signed-off-by: perrynzhou <perrynzhou@gmail.com>
2019-06-06 14:37:33 +09:00
Geoff Paulsen
bd602cc3a0
Merge pull request #6701 from hoopoepg/topic/sshmem-mpi-coll-collect-v4.0
SSHMEM/COLL: added sshmem/mpi implementation for shmem_collect call - v4.0
2019-06-05 13:02:43 -05:00
Scott Miller
e6e09c6cba shmem/c: Fix shmem type for calls to shmem_test and shmem_wait_until with [u]int32_t and [u]int64_t
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
(cherry picked from commit ca59cabc679ebdf1decdcf75f3da0766b35a34f7)
2019-06-05 13:42:39 -04:00
Ralph Castain
e07f127576
Ignore generated file
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-06-04 09:50:46 -07:00
Ralph Castain
6c2cd10d68
Fix tree spawn at scale
Remove the debruijn component as it changes the daemon's parent
process ID, thus breaking the other routed components

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-06-04 09:49:01 -07:00
Geoff Paulsen
18f10377eb
Merge pull request #6152 from ggouaillardet/topic/v4.0.x/ucx_warning
btl/openib: delay UCX warning to add_procs()
2019-06-03 15:09:43 -05:00
Geoff Paulsen
a04f5f0c70
Merge pull request #6692 from vspetrov/v4.0.x
V4.0.x Coll/hcoll: don't init opal memhooks unless explicitely requested
2019-06-03 15:00:36 -05:00
Howard Pritchard
6c74d4031b
Merge pull request #6720 from markalle/patcher_additions_v40x
shmat/shmdt additions for patcher
2019-06-03 12:51:05 -07:00
Howard Pritchard
76f01b9b8e
Merge pull request #6696 from gpaulsen/topic/v4.0.x/btl_uct_from_6668
btl/uct: check for support before disabling UCX memory hooks
2019-06-03 12:15:40 -07:00
George Bosilca
4083800c18
Use the correct counter name in the example.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-31 00:19:14 -04:00
George Bosilca
a8d5da67db
Fix the man pages for some of the MPI_T_* functions.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-31 00:19:14 -04:00
George Bosilca
dbf89404d7
Fix the SPC initialization.
Use the PVAR ctx to save the SPC index, so that no lookup nor
restriction on the SPC vars position is imposed.
Make sure the PVAR are always registered.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-31 00:19:14 -04:00