29713 Commits

Author SHA1 Message Date
Howard Pritchard
41ef5c7a10
Merge pull request #6594 from vspetrov/osc_ucx_rget_rkey_fix
OSC/UCX: use correct rkey for atomic_fadd in rget/rput
2019-05-01 11:53:17 -06:00
Howard Pritchard
ca5d58f955
Merge pull request #6615 from markalle/merge_v40x_romio
fixing an unsafe usage of integer disps[] (romio321 gpfs)
2019-05-01 11:51:53 -06:00
Gilles Gouaillardet
e2638dbbf2 mpi: mark MPI_COMBINER_{HVECTOR,HINDEXED,STRUCT}_INTEGER removed
unless configure'd with --enable-mpi1-compatibility

This is a one-off commit for the v4.0.x branch since these symbols were
simply removed from master.

Thanks Lisandro Dalcin for reporting this.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-05-01 10:50:57 +09:00
Howard Pritchard
888d014590
Merge pull request #6624 from mwheinz/v4.0.x
v4.0.x: make-authors.pl script not compatible with being a submodule.
2019-04-30 07:55:24 -06:00
Michael Heinz
191c7f01a2 make-authors.pl script not compatible with being a submodule.
make-authors.pl checks that .git exists and is a directory before
getting the git log - but when a repo is checked out as a submodule of a
larger repository, .git is not a directory, it's just a text file.  This
can cause make-authors.pl to terminate inappropriately.

Author: Michael Heinz <michael.william.heinz@intel.com>
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
(cherry picked from commit 0a8fa5439c626c01a68fd9cebda8f00597500f51)
2019-04-29 10:14:53 -04:00
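The failure mode described in the make-authors.pl commit above can be illustrated with a minimal C sketch. It is not the Perl fix itself; `looks_like_git_marker` is a hypothetical helper name, and the sketch only assumes POSIX `stat`. The idea is to accept `.git` whether it is a directory (ordinary clone) or a regular file (submodule checkout).

```c
#include <sys/stat.h>

/* Hypothetical helper (not part of Open MPI): returns 1 if `path` exists
 * as a directory or as a regular file, and 0 otherwise. In a submodule
 * checkout, ".git" is a small text file rather than a directory, so a
 * directory-only test would wrongly reject it. */
int looks_like_git_marker(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return 0;                      /* missing or unreadable */
    }
    return (S_ISDIR(st.st_mode) || S_ISREG(st.st_mode)) ? 1 : 0;
}
```

A caller would pass the path to `.git` in the source tree and skip the git log step only when the helper returns 0.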
Mark Allen
c081757462 fixing an unsafe usage of integer disps[] (romio321 gpfs)
There are a couple MPI_Alltoallv calls in ad_gpfs_aggrs.c where the
send/recv data comes from places like req[r].lens, and the send
buffer and send displacements for example were being calculated as
    sbuf = pick one of the reqs: req[bottom].lens
    sdisps[r] = req[r].lens - req[bottom].lens
which might be okay if the .lens was data inside of req[] so they'd
all be close to each other. But each .lens field is just a pointer
that's malloced, so those addresses can be all over the place, so the
integer-sized sdisps[] isn't safe.

I changed it to have a new extra array sbuf and rbuf for those two
Alltoallv calls, and copied the data into the sbuf from the same
locations it used to be setting up the sdisps[] at, and after the
Alltoallv I copy the data out of the new rbuf into the same
locations it used to be setting up the rdisps[] at.

For what it's worth, I was able to get this to fail with -np 2 on a GPFS
filesystem with the hint romio_cb_write set to enable. I didn't whittle
the test down to something small, but it was failing in an
MPI_File_write_all call.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit d85cac8f1a11495415b67ecab69d2ae1cd19d155)
2019-04-25 14:22:19 -04:00
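The approach described in the romio321 commit above (copy the scattered data into one contiguous buffer so that displacements become small offsets) can be sketched as follows. This is an illustration only, not the ROMIO code: `pack_blocks` is a hypothetical name, the element type `int` is an assumption, and the sketch assumes the total element count fits in an `int`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `nblocks` separately malloc'ed arrays
 * (blocks[r] holds counts[r] elements) into one contiguous buffer and
 * fill disps[r] with the element offset of block r inside that buffer.
 * Returns the packed buffer (caller frees) or NULL on allocation failure.
 * Assumption: the sum of counts[] fits in an int. */
int *pack_blocks(int *const *blocks, const int *counts, int nblocks, int *disps)
{
    size_t total = 0;

    /* Displacements are running offsets, never pointer differences. */
    for (int r = 0; r < nblocks; r++) {
        disps[r] = (int) total;
        total += (size_t) counts[r];
    }

    int *packed = malloc((total ? total : 1) * sizeof *packed);
    if (packed == NULL) {
        return NULL;
    }

    for (int r = 0; r < nblocks; r++) {
        if (counts[r] > 0) {
            memcpy(packed + disps[r], blocks[r],
                   (size_t) counts[r] * sizeof *packed);
        }
    }
    return packed;
}
```

The packed buffer and `disps[]` could then serve as the send buffer and send displacements of an Alltoallv-style call; the receive side would unpack in the reverse direction.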
Howard Pritchard
f3edfaa2ac
Merge pull request #6610 from jsquyres/pr/ob1-get-frag-fail-fix
v4.0.x: ob1 get_frag fail fix
2019-04-23 09:27:08 -06:00
Brelle Emmanuel
2a4bc0cb58 pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
If btl_get fails, ob1 tries to fall back on btl_put first, but the
return code was ignored, so the code fell back on both btl_put and
btl_send.

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
(cherry picked from commit 9c689f2225d29aa152627f39bab841afead254af)
2019-04-22 14:25:34 -07:00
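The control flow the pml/ob1 commit above is aiming for can be sketched as a fallback chain that stops at the first success. All names here (`transfer_fn`, `transfer_with_fallback`, the stub callbacks and counters) are hypothetical and exist only for illustration; they are not ob1 or BTL symbols.

```c
#include <stddef.h>

/* Hypothetical transfer callback: returns 0 on success, nonzero on failure. */
typedef int (*transfer_fn)(void *frag);

/* Try get, then put, then send, stopping at the first success.
 * Returns 0 if some path succeeded, otherwise the error from send. */
int transfer_with_fallback(void *frag, transfer_fn get,
                           transfer_fn put, transfer_fn send)
{
    if (get(frag) == 0) {
        return 0;
    }
    if (put(frag) == 0) {
        return 0;   /* without this check, send would run as well */
    }
    return send(frag);
}

/* Stub callbacks for demonstration; the counters record which paths ran. */
int put_calls = 0;
int send_calls = 0;

int failing_get(void *frag) { (void) frag; return -1; }
int ok_put(void *frag)      { (void) frag; put_calls++;  return 0; }
int ok_send(void *frag)     { (void) frag; send_calls++; return 0; }
```

Calling `transfer_with_fallback(NULL, failing_get, ok_put, ok_send)` runs the put path once and leaves `send_calls` at zero.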
Howard Pritchard
9b73e8a7c0
Merge pull request #6600 from vspetrov/v4.0.x_osc_ucx_no_op_null_addr_handling
OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
2019-04-18 15:36:13 -06:00
Valentin Petrov
2947ab2dbc OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-17 10:35:34 +03:00
Howard Pritchard
9f7b41f588
Merge pull request #6585 from vspetrov/v4.0.x
V4.0.x Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
2019-04-16 09:10:55 -06:00
Valentin Petrov
68c88e86f2 OSC/UCX: use correct rkey for atomic_fadd in rget/rput
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-16 15:24:57 +03:00
Valentin Petrov
281f78c6e4 Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
The new proc group is created from the "world_group" based on the
ranks mapping, which can be taken directly from proc_name->vpid.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-15 08:43:09 +03:00
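The O(N) idea in the mca_scoll_mpi_comm_query commit above can be sketched as a single pass that reads each rank from the process name instead of searching for it. The struct and function names below are hypothetical stand-ins, and the sketch assumes, as the commit message states, that the vpid can be used directly as the world rank.

```c
#include <stddef.h>

/* Hypothetical stand-in for a process name; only the vpid field matters. */
typedef struct {
    unsigned vpid;
} proc_name_t;

/* Fill ranks[i] with the world rank of process i in one linear pass.
 * Assumption: the vpid is the rank in the world group, so no search
 * over the world group is needed per process. */
void fill_world_ranks(const proc_name_t *procs, size_t nprocs, int *ranks)
{
    for (size_t i = 0; i < nprocs; i++) {
        ranks[i] = (int) procs[i].vpid;
    }
}
```

The resulting `ranks[]` array is the kind of input a group-inclusion call takes when building a new group from the world group.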
Howard Pritchard
5d6657ea40
Merge pull request #6582 from jsquyres/pr/v4.0.x/fix-overtake
v4.0.x: pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
2019-04-10 10:46:09 -06:00
Thananon Patinyasakdikul
5999fdad5a pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
We missed an assert to check if ALLOW_OVERTAKE is set or not before
validating the sequence number and this will cause deadlock.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
(cherry picked from commit 0263456cf4e99efc67d38acd100cf948e0399d63)
2019-04-09 11:24:24 -07:00
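The ordering check described in the ALLOW_OVERTAKE commit above can be sketched as follows. This is a guess at the shape of the logic, not the ob1 source: the flag constant, the function name, and the sequence-number width are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define COMM_ALLOW_OVERTAKE 0x1u   /* hypothetical flag bit */

/* Returns true if a fragment may be processed now. When overtaking is
 * allowed on the communicator, the sequence number is not validated at
 * all; otherwise the fragment must carry the expected sequence number. */
bool fragment_in_order(uint32_t comm_flags, uint16_t frag_seq,
                       uint16_t expected_seq)
{
    if (comm_flags & COMM_ALLOW_OVERTAKE) {
        return true;
    }
    return frag_seq == expected_seq;
}
```

Testing the flag first means an out-of-order fragment on an overtaking communicator is never held back waiting for a sequence number that may not arrive in order.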
Geoff Paulsen
811dfc63e0
Merge pull request #6550 from rhc54/cmr402/clnup
v4.0.x: Cleanup race condition in finalize that leads to incomplete vader cleanup
2019-04-09 10:13:15 -05:00
Geoff Paulsen
b3492090c2
Merge pull request #6561 from benmenadue/fix-scoll-fca
v4.0.x: scoll/fca: add missing argument to call to original broadcast
2019-04-08 14:05:47 -05:00
Geoff Paulsen
db1cb3002f
Merge pull request #6564 from benmenadue/v4.0.x-fix-shmem-context
v4.0.x: add missing #include to oshmem/shmem/c/shmem_context.c.
2019-04-08 14:00:04 -05:00
Mark Allen
36583df689 in-place conversion macro writes into INPUT argument
In fint_2_int.h there are some conversion macros for logicals. It has
one path for OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT where a new array
would be allocated and the conversions then might expand to
    c_array[i] = (array[i] == 0 ? 0 : 1)
and another path for OMPI_SIZEOF_FORTRAN_LOGICAL == SIZEOF_INT where it
does things "in place", so the same conversion there would just be
    array[i] = (array[i] == 0 ? 0 : 1)

The problem is some of the logical arrays being converted are INPUT
arguments. And it's possible for some compilers to even put the argument
in read-only memory so the above "in place" conversion SEGV's.  A
testcase I have used
    call MPI_CART_SUB(oldcomm, (/.true.,.false./), newcomm, ierr)
and gfortran put the second arg in read-only mem.

In cart_sub_f.c you can trace the ompi_fortran_logical_t *remain_dims arg.
remain_dims[] is for input only, but the file uses
    OMPI_LOGICAL_ARRAY_NAME_DECL(remain_dims);
    OMPI_ARRAY_LOGICAL_2_INT(remain_dims, ndims);
    PMPI_Cart_sub(..., OMPI_LOGICAL_ARRAY_NAME_CONVERT(remain_dims), ...);
    OMPI_ARRAY_INT_2_LOGICAL(remain_dims, ndims);
to convert it to C ints, make a C call, then restore it to Fortran
logicals before returning.

It's not always wrong to convert purely in-place, eg cart_get_f.c has
a periods[] that's exclusively for OUTPUT and it would be fine with the
macros as they were. But I still say the macros are invalid because they
don't distinguish whether they're being used on INPUT or OUTPUT args and
thus they can't be used in a way that's legal for both cases.

It might be possible to fix the macros by adding more of them so that
cart_create_f.c and cart_get_f.c would use different macros that give
more context. But my fix here is just to turn off the first block and
make all paths run as if OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT.

The main macros that get enlarged by this change are
    define OMPI_ARRAY_LOGICAL_2_INT_ALLOC : mallocs now
    define OMPI_ARRAY_LOGICAL_2_INT : also mallocs now
But these are only used in 4 places, three of which are the purpose of
this checkin, to avoid the former in-place expansion of an INPUT arg:
    cart_create_f.c
    cart_map_f.c
    cart_sub_f.c
and one of which is an OUTPUT arg that was fine and that gets
unnecessarily expanded into a separate array by this checkin.
    cart_get_f.c

So I think an unnecessary malloc in cart_get_f.c is the only downside
to this change, where the logicals array argument could have been used
and converted in place.

Signed-off-by: Mark Allen <markalle@us.ibm.com>

Update provided by Gilles Gouaillardet to keep the in-place option
if OMPI_FORTRAN_VALUE_TRUE == 1 where no conversion is needed.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 0a7f1e3cc58dabe536df00ae9c97f7e9d27103ad)
2019-04-05 13:34:09 -04:00
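The non-in-place path described in the fint_2_int.h commit above can be sketched as a plain C function rather than the actual macros. `logicals_to_ints` is a hypothetical name, and representing a Fortran logical as `int` is an assumption; the point is only that the input array is treated as read-only and the converted values go into a fresh allocation.

```c
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated array whose elements are
 * 0 or 1 according to the corresponding input logical. The input is
 * const and never written, so it may live in read-only memory.
 * Caller frees the result; returns NULL on allocation failure. */
int *logicals_to_ints(const int *logicals, int n)
{
    int *out = malloc((n > 0 ? (size_t) n : 1) * sizeof *out);

    if (out == NULL) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        out[i] = (logicals[i] == 0 ? 0 : 1);
    }
    return out;
}
```

The cost is one allocation per call, which matches the downside the commit message notes for the OUTPUT-only case.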
Howard Pritchard
702199f39e
Merge pull request #6545 from bertwesarg/v4.0.x-fix-cpp-condition
Fix use of bitwise operation in CPP condition (v4.0.x)
2019-04-05 07:58:09 -06:00
Ben Menadue
173192a6f4 Add missing #include to oshmem/shmem/c/shmem_context.c.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
(cherry picked from commit 063596b82837cf0b07926e2c696cd8f48a59143d)
2019-04-03 16:02:58 +11:00
Ben Menadue
001fa5b6ce Add missing nlong_type parameter to call to original broadcast in scoll/fca broadcast.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2019-04-03 14:01:41 +11:00
Howard Pritchard
8261cdab06
Merge pull request #6554 from James-A-Clark/v4.0.x
Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init (v4.0.x cherry pick)
2019-04-02 09:05:55 -06:00
Howard Pritchard
976cc1e07f
Merge pull request #6509 from janjust/oshmem-multiple-contexts-v4.0.x
v4.0.x: Oshmem multiple contexts
2019-04-01 13:15:38 -06:00
Howard Pritchard
b11cb23b71
Merge pull request #6519 from sam6258/int4_cswap_fix_v4.0.x
v4.0.x: shmem/fortran: Fix invalid datatype size in call to atomic cswap
2019-04-01 13:09:17 -06:00
James Clark
d8dc69feb5 Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init.
This is so when a debugger attaches using MPIR, it can step out of this stack back into main.
This cannot be done with certain aggressive optimisations and missing debug information.

Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

Co-authored-by: Jeff Squyres <jsquyres@cisco.com>

(cherry-picked from 20f5840)
2019-04-01 11:10:04 +01:00
Ralph Castain
2536b4f869 Remove stale ORTE code
Functionality moved to PMIx

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit cfdd08d309d9ebc48229f0ca68ceec64a7e6389f)
2019-03-31 11:26:18 -07:00
Ralph Castain
861016c3b2 Cleanup race condition in finalize
See https://github.com/open-mpi/ompi/issues/5798#issuecomment-426545893
for a lengthy explanation

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 57f6b94fa53166bc4d513be4507382e832a3a8c7)
2019-03-31 11:23:27 -07:00
Bert Wesarg
7f65e5b720 Fix use of bitwise operation in CPP condition
Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
(cherry picked from commit 18525ce39be78ea695ce51c64a6eb443a2dbd899)
2019-03-29 10:17:09 +01:00
Howard Pritchard
9a1b6cfc79
Merge pull request #6529 from hppritcha/topic/roll_to_v4.0.2a1
VERSION: roll to v4.0.2a1
2019-03-27 12:08:19 -06:00
Howard Pritchard
697437169a
Merge pull request #6528 from hppritcha/topic/minor_news_typo
NEWS: minor typo fix
2019-03-27 12:07:55 -06:00
Howard Pritchard
9e73e3e520 VERSION: roll to v4.0.2a1
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-27 11:20:05 -06:00
Howard Pritchard
812fd4aa2b NEWS: minor typo fix
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-27 11:18:29 -06:00
Geoff Paulsen
b8a8ae9394
Merge pull request #6520 from gpaulsen/topic/v4.0.1/README_oops
Describing Issue 6114 with v4.0.0 in README.
2019-03-26 10:18:13 -05:00
Geoffrey Paulsen
176356249c README: Describes the now fixed Issue 6114
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-03-26 10:12:15 -05:00
Scott Miller
5f4f5d45b3 shmem/fortran: Fix invalid datatype size in call to atomic cswap
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
(cherry picked from commit 6b294e064150d26dfc68ec307cf4cd2e40891a1b)
2019-03-25 12:38:04 -04:00
Xin Zhao
69a80fce9f ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9c3d00b144641d2929f830279dcc9d163c38e9e1)
2019-03-21 23:59:58 +02:00
Xin Zhao
580b584179 ompi/oshmem/spml/ucx:delete oob path of getting rkeys in spml ucx
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e0414006b0c0a8e9918a4cf8ac4bb819b977ec91)
2019-03-21 23:59:46 +02:00
Xin Zhao
596997c194 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e1c1ab020227fc18d145379ab29ea86a3cdb66b1)
2019-03-21 23:58:23 +02:00
Geoff Paulsen
97aa434182
Merge pull request #6511 from gpaulsen/topic/v4.0.x/rc3
Update VERSION to v4.0.1rc3
2019-03-21 16:17:57 -05:00
Geoffrey Paulsen
8e04fb3633 Update VERSION to v4.0.1rc3
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-03-21 16:38:51 -04:00
Xin Zhao
ce54b63b90 ompi/oshmem: add spml_context back to sshmem_type in memheap, to keep track of ucx_ctx_default's rkeys
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 48033ac1f43159c053241b65e74a39777e5e31e4)
2019-03-20 23:30:21 +02:00
Xin Zhao
06183a7bec ompi/oshmem/spml/ucx: let shmem_finalize to clean up any ctx left
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9a060009622e9220d6332d9b63f6a1a7328418a0)
2019-03-20 23:30:09 +02:00
Xin Zhao
91793484ed OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 289595e45dc3ebfe5ae1a9dc6f347b1b2d569c4a)
2019-03-20 23:29:53 +02:00
Xin Zhao
f666d75322 ompi/oshmem/spml/ucx: fix eps destroy in shmem_ctx_destroy().
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 79ba7526677bd1641239bb77559a2999c8cd3a4a)
2019-03-20 23:29:38 +02:00
Howard Pritchard
15cfba5347
Merge pull request #6503 from jjhursey/v4x-rm-hash-pmix3
Do not force 'hash' gds on direct modex
2019-03-19 17:58:26 -05:00
Geoff Paulsen
31ebbb2a8d
Merge pull request #6502 from nysal/v4.0.x_spinlock_fix
opal/atomics: Add acquire semantics back for spinlocks
2019-03-19 11:44:46 -05:00
Joshua Hursey
45526fadee Do not force 'hash' gds on direct modex
* Forcing the 'hash' gds component should not be necessary any more.

Port of PR #6498 (component names changed so a cherry-pick would not work)

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-19 10:52:17 -05:00
Nysal Jan K.A
1329cef213 opal/atomics: Add acquire semantics back for spinlocks
This was introduced in commit 9d0b3fe9

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit 00f27a80fc63053db1aeb42140148d7a3d1379b3)
2019-03-19 19:45:20 +05:30
Geoff Paulsen
6cb00aa333
Merge pull request #6499 from hppritcha/topic/news_updates_for_4.0.1rc2
NEWS: add a few news items for 4.0.1rc2
2019-03-19 05:22:40 -05:00