Mark Allen
c081757462
fixing an unsafe usage of integer disps[] (romio321 gpfs)
...
There are a couple of MPI_Alltoallv calls in ad_gpfs_aggrs.c where the
send/recv data comes from places like req[r].lens. The send buffer and
send displacements, for example, were being calculated as
    sbuf = pick one of the reqs: req[bottom].lens
    sdisps[r] = req[r].lens - req[bottom].lens
which might be okay if .lens were data stored inside req[], so that the
values would all be close to each other. But each .lens field is a
pointer to separately malloced memory, so those addresses can be
arbitrarily far apart, and the difference may not fit in the
integer-sized sdisps[].
I changed it to use new sbuf and rbuf arrays for those two Alltoallv
calls. The data is copied into sbuf from the same locations that
sdisps[] used to point at, and after the Alltoallv the data is copied
out of rbuf into the same locations that rdisps[] used to point at.
For what it's worth, I was able to get this to fail with -np 2 on a
GPFS filesystem with the hint romio_cb_write set to enable. I didn't
whittle the test down to something small, but it was failing in an
MPI_File_write_all call.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit d85cac8f1a11495415b67ecab69d2ae1cd19d155)
2019-04-25 14:22:19 -04:00
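The fix described in the commit above (copy the scattered .lens arrays into one contiguous buffer so that displacements become small offsets) can be sketched roughly as follows. This is only an illustration, not the ROMIO code: req_t, disp_fits_in_int and pack_lens are hypothetical names, and the real change also unpacks the receive side after the MPI_Alltoallv call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a request: lens points at separately
 * malloced memory, as described in the commit message. */
typedef struct {
    int  count;  /* number of entries in lens */
    int *lens;   /* separately allocated array */
} req_t;

/* Returns 1 if a displacement can be stored in an int-sized sdisps[]
 * entry, 0 otherwise. Differences between unrelated heap addresses
 * can easily fail this check on 64-bit systems. */
int disp_fits_in_int(ptrdiff_t disp)
{
    return disp >= INT_MIN && disp <= INT_MAX;
}

/* Pack every req[r].lens into one contiguous buffer and record element
 * displacements relative to the start of that buffer. The caller must
 * size sbuf to hold the sum of all counts. */
void pack_lens(const req_t *req, int nreq, int *sbuf,
               int *scounts, int *sdisps)
{
    int offset = 0;
    for (int r = 0; r < nreq; r++) {
        assert(req[r].count >= 0);
        scounts[r] = req[r].count;
        sdisps[r]  = offset;  /* small offset, always fits in an int */
        if (req[r].count > 0) {
            memcpy(sbuf + offset, req[r].lens,
                   (size_t)req[r].count * sizeof(int));
        }
        offset += req[r].count;
    }
}
```

With the data packed this way, sbuf, scounts and sdisps can be passed to MPI_Alltoallv directly, and the displacements no longer depend on where malloc happened to place each array.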
Howard Pritchard
f3edfaa2ac
Merge pull request #6610 from jsquyres/pr/ob1-get-frag-fail-fix
...
v4.0.x: ob1 get_frag fail fix
2019-04-23 09:27:08 -06:00
Brelle Emmanuel
2a4bc0cb58
pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
...
If btl_get fails, ob1 first tries to fall back on btl_put, but the
return code was ignored, so the code fell back on both btl_put and
btl_send.
Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
(cherry picked from commit 9c689f2225d29aa152627f39bab841afead254af)
2019-04-22 14:25:34 -07:00
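The control-flow problem described in the commit above can be illustrated with a small stand-alone sketch. It does not use the real ob1 or BTL interfaces: fake_btl_t and the fake_* functions are hypothetical stand-ins that only simulate return codes and count how often each fallback runs.

```c
#include <assert.h>
#include <stddef.h>

#define XFER_SUCCESS 0
#define XFER_ERROR   (-1)

/* Hypothetical transport with simulated results and call counters. */
typedef struct {
    int get_rc, put_rc, send_rc;  /* simulated return codes */
    int put_calls, send_calls;    /* how often each fallback ran */
} fake_btl_t;

static int fake_get(fake_btl_t *b)  { return b->get_rc; }
static int fake_put(fake_btl_t *b)  { b->put_calls++;  return b->put_rc; }
static int fake_send(fake_btl_t *b) { b->send_calls++; return b->send_rc; }

/* Try get; if it fails, try put; use send only if put failed too. */
int get_frag_with_fallback(fake_btl_t *b)
{
    assert(b != NULL);
    if (XFER_SUCCESS == fake_get(b)) {
        return XFER_SUCCESS;
    }
    /* Checking this return code is the point: if it is ignored, the
     * code falls through and performs the send fallback as well. */
    if (XFER_SUCCESS == fake_put(b)) {
        return XFER_SUCCESS;
    }
    return fake_send(b);
}
```

When get fails and put succeeds, put_calls ends up at 1 and send_calls stays at 0, so only one fallback path is taken.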
Howard Pritchard
9b73e8a7c0
Merge pull request #6600 from vspetrov/v4.0.x_osc_ucx_no_op_null_addr_handling
...
OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
2019-04-18 15:36:13 -06:00
Valentin Petrov
2947ab2dbc
OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
...
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-17 10:35:34 +03:00
Howard Pritchard
9f7b41f588
Merge pull request #6585 from vspetrov/v4.0.x
...
V4.0.x Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
2019-04-16 09:10:55 -06:00
Valentin Petrov
281f78c6e4
Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
...
The new proc group is created from the "world_group" based on the
ranks mapping which can be directly taken from proc_name->vpid.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-15 08:43:09 +03:00
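The complexity change described in the commit above can be sketched as follows. This is not the scoll/mpi code: proc_name_t and both functions are hypothetical, and the sketch assumes that a process's rank in the world group equals its vpid, which is what the commit message suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process name; only vpid matters for the fast path. */
typedef struct {
    unsigned int jobid;
    unsigned int vpid;
} proc_name_t;

/* O(N^2): for every group member, scan the world list for a match.
 * Returns 0 on success, -1 if a member is not found. */
int ranks_by_search(const proc_name_t *world, int nworld,
                    const proc_name_t *members, int nmembers, int *ranks)
{
    for (int i = 0; i < nmembers; i++) {
        ranks[i] = -1;
        for (int w = 0; w < nworld; w++) {
            if (world[w].jobid == members[i].jobid &&
                world[w].vpid == members[i].vpid) {
                ranks[i] = w;
                break;
            }
        }
        if (ranks[i] < 0) {
            return -1;
        }
    }
    return 0;
}

/* O(N): take the world rank directly from vpid (assumed mapping). */
void ranks_by_vpid(const proc_name_t *members, int nmembers, int *ranks)
{
    assert(members != NULL && ranks != NULL);
    for (int i = 0; i < nmembers; i++) {
        ranks[i] = (int)members[i].vpid;
    }
}
```

Under that assumption both functions produce the same rank list, which can then be used to build the new group from the world group, but the second one does so in a single pass.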
Howard Pritchard
5d6657ea40
Merge pull request #6582 from jsquyres/pr/v4.0.x/fix-overtake
...
v4.0.x: pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
2019-04-10 10:46:09 -06:00
Thananon Patinyasakdikul
5999fdad5a
pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
...
We failed to check whether ALLOW_OVERTAKE is set before validating the
sequence number, which causes a deadlock.
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
(cherry picked from commit 0263456cf4e99efc67d38acd100cf948e0399d63)
2019-04-09 11:24:24 -07:00
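One plausible reading of the commit above is that the sequence-number check must be skipped when the communicator allows overtaking. The sketch below shows only that idea: comm_state_t, COMM_ALLOW_OVERTAKE and frag_is_matchable are hypothetical names, not the ob1 data structures.

```c
#include <assert.h>
#include <stddef.h>

#define COMM_ALLOW_OVERTAKE 0x1u  /* hypothetical flag bit */

/* Hypothetical per-communicator matching state. */
typedef struct {
    unsigned int   flags;         /* communicator flags */
    unsigned short expected_seq;  /* next in-order sequence number */
} comm_state_t;

/* Returns 1 if a fragment may be matched now, 0 if it must wait. */
int frag_is_matchable(const comm_state_t *comm, unsigned short frag_seq)
{
    assert(comm != NULL);
    if (comm->flags & COMM_ALLOW_OVERTAKE) {
        return 1;  /* ordering is not enforced: skip the check */
    }
    return frag_seq == comm->expected_seq;
}
```

Without the flag test, a fragment on an overtaking communicator could be held back waiting for a sequence number that the ordering logic never advances to, which is consistent with the deadlock the commit mentions.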
Geoff Paulsen
811dfc63e0
Merge pull request #6550 from rhc54/cmr402/clnup
...
v4.0.x: Cleanup race condition in finalize that leads to incomplete vader cleanup
2019-04-09 10:13:15 -05:00
Geoff Paulsen
b3492090c2
Merge pull request #6561 from benmenadue/fix-scoll-fca
...
v4.0.x: scoll/fca: add missing argument to call to original broadcast
2019-04-08 14:05:47 -05:00
Geoff Paulsen
db1cb3002f
Merge pull request #6564 from benmenadue/v4.0.x-fix-shmem-context
...
v4.0.x: add missing #include to oshmem/shmem/c/shmem_context.c.
2019-04-08 14:00:04 -05:00
Howard Pritchard
702199f39e
Merge pull request #6545 from bertwesarg/v4.0.x-fix-cpp-condition
...
Fix use of bitwise operation in CPP condition (v4.0.x)
2019-04-05 07:58:09 -06:00
Ben Menadue
173192a6f4
Add missing #include to oshmem/shmem/c/shmem_context.c.
...
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
(cherry picked from commit 063596b82837cf0b07926e2c696cd8f48a59143d)
2019-04-03 16:02:58 +11:00
Ben Menadue
001fa5b6ce
Add missing nlong_type parameter to call to original broadcast in scoll/fca broadcast.
...
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2019-04-03 14:01:41 +11:00
Howard Pritchard
8261cdab06
Merge pull request #6554 from James-A-Clark/v4.0.x
...
Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init (v4.0.x cherry pick)
2019-04-02 09:05:55 -06:00
Howard Pritchard
976cc1e07f
Merge pull request #6509 from janjust/oshmem-multiple-contexts-v4.0.x
...
v4.0.x: Oshmem multiple contexts
2019-04-01 13:15:38 -06:00
Howard Pritchard
b11cb23b71
Merge pull request #6519 from sam6258/int4_cswap_fix_v4.0.x
...
v4.0.x: shmem/fortran: Fix invalid datatype size in call to atomic cswap
2019-04-01 13:09:17 -06:00
James Clark
d8dc69feb5
Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init.
...
This is so when a debugger attaches using MPIR, it can step out of this stack back into main.
This cannot be done with certain aggressive optimisations and missing debug information.
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Co-authored-by: Jeff Squyres <jsquyres@cisco.com>
(cherry-picked from 20f5840)
2019-04-01 11:10:04 +01:00
Ralph Castain
2536b4f869
Remove stale ORTE code
...
Functionality moved to PMIx
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit cfdd08d309d9ebc48229f0ca68ceec64a7e6389f)
2019-03-31 11:26:18 -07:00
Ralph Castain
861016c3b2
Cleanup race condition in finalize
...
See https://github.com/open-mpi/ompi/issues/5798#issuecomment-426545893
for a lengthy explanation
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 57f6b94fa53166bc4d513be4507382e832a3a8c7)
2019-03-31 11:23:27 -07:00
Bert Wesarg
7f65e5b720
Fix use of bitwise operation in CPP condition
...
Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
(cherry picked from commit 18525ce39be78ea695ce51c64a6eb443a2dbd899)
2019-03-29 10:17:09 +01:00
Howard Pritchard
9a1b6cfc79
Merge pull request #6529 from hppritcha/topic/roll_to_v4.0.2a1
...
VERSION: roll to v4.0.2a1
2019-03-27 12:08:19 -06:00
Howard Pritchard
697437169a
Merge pull request #6528 from hppritcha/topic/minor_news_typo
...
NEWS: minor typo fix
2019-03-27 12:07:55 -06:00
Howard Pritchard
9e73e3e520
VERSION: roll to v4.0.2a1
...
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-27 11:20:05 -06:00
Howard Pritchard
812fd4aa2b
NEWS: minor typo fix
...
[skip ci]
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-27 11:18:29 -06:00
Geoff Paulsen
b8a8ae9394
Merge pull request #6520 from gpaulsen/topic/v4.0.1/README_oops
...
Describing Issue 6114 with v4.0.0 in README.
2019-03-26 10:18:13 -05:00
Geoffrey Paulsen
176356249c
README: Describes the now fixed Issue 6114
...
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-03-26 10:12:15 -05:00
Scott Miller
5f4f5d45b3
shmem/fortran: Fix invalid datatype size in call to atomic cswap
...
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
(cherry picked from commit 6b294e064150d26dfc68ec307cf4cd2e40891a1b)
2019-03-25 12:38:04 -04:00
Xin Zhao
69a80fce9f
ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
...
ompi/oshmem/spml/ucx: optimize spml ucx progress
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9c3d00b144641d2929f830279dcc9d163c38e9e1)
2019-03-21 23:59:58 +02:00
Xin Zhao
580b584179
ompi/oshmem/spml/ucx: delete oob path of getting rkeys in spml ucx
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e0414006b0c0a8e9918a4cf8ac4bb819b977ec91)
2019-03-21 23:59:46 +02:00
Xin Zhao
596997c194
ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e1c1ab020227fc18d145379ab29ea86a3cdb66b1)
2019-03-21 23:58:23 +02:00
Geoff Paulsen
97aa434182
Merge pull request #6511 from gpaulsen/topic/v4.0.x/rc3
...
Update VERSION to v4.0.1rc3
2019-03-21 16:17:57 -05:00
Geoffrey Paulsen
8e04fb3633
Update VERSION to v4.0.1rc3
...
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-03-21 16:38:51 -04:00
Xin Zhao
ce54b63b90
ompi/oshmem: add spml_context back to sshmem_type in memheap, to keep track of ucx_ctx_default's rkeys
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 48033ac1f43159c053241b65e74a39777e5e31e4)
2019-03-20 23:30:21 +02:00
Xin Zhao
06183a7bec
ompi/oshmem/spml/ucx: let shmem_finalize to clean up any ctx left
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9a060009622e9220d6332d9b63f6a1a7328418a0)
2019-03-20 23:30:09 +02:00
Xin Zhao
91793484ed
OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 289595e45dc3ebfe5ae1a9dc6f347b1b2d569c4a)
2019-03-20 23:29:53 +02:00
Xin Zhao
f666d75322
ompi/oshmem/spml/ucx: fix eps destroy in shmem_ctx_destroy().
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 79ba7526677bd1641239bb77559a2999c8cd3a4a)
2019-03-20 23:29:38 +02:00
Howard Pritchard
15cfba5347
Merge pull request #6503 from jjhursey/v4x-rm-hash-pmix3
...
Do not force 'hash' gds on direct modex
2019-03-19 17:58:26 -05:00
Geoff Paulsen
31ebbb2a8d
Merge pull request #6502 from nysal/v4.0.x_spinlock_fix
...
opal/atomics: Add acquire semantics back for spinlocks
2019-03-19 11:44:46 -05:00
Joshua Hursey
45526fadee
Do not force 'hash' gds on direct modex
...
* Forcing the 'hash' gds component should not be necessary any more.
Port of PR #6498 (component names changed so a cherry-pick would not work)
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-19 10:52:17 -05:00
Nysal Jan K.A
1329cef213
opal/atomics: Add acquire semantics back for spinlocks
...
The missing acquire semantics were introduced in commit 9d0b3fe9
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit 00f27a80fc63053db1aeb42140148d7a3d1379b3)
2019-03-19 19:45:20 +05:30
Geoff Paulsen
6cb00aa333
Merge pull request #6499 from hppritcha/topic/news_updates_for_4.0.1rc2
...
NEWS: add a few news items for 4.0.1rc2
2019-03-19 05:22:40 -05:00
Howard Pritchard
ce013130cb
NEWS: add a few news items for 4.0.1rc2
...
A little late, but this adds a couple of bullets to the
4.0.1rc2 NEWS.
[skip ci]
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-19 04:13:32 -06:00
Geoff Paulsen
efcbc13d2f
Merge pull request #6496 from gpaulsen/v4.0.x
...
Reving to v4.0.1rc2
2019-03-18 16:34:16 -05:00
Geoffrey Paulsen
2ae9a8a3d6
Reving to v4.0.1rc2
...
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-03-18 16:33:26 -05:00
Howard Pritchard
ceb93d7c03
Merge pull request #6491 from bosilca/v4.0.x
...
v4.0.x: Cherry-pick fixes for issue #6258 from master (vader fixes)
2019-03-15 17:08:52 -06:00
Howard Pritchard
27899b0e8f
Merge pull request #6486 from hoopoepg/topic/check-ucx-params-v4.0
...
PML/SPML/UCX: added evaluation of mmap events - v4.0
2019-03-14 17:02:46 -06:00
Howard Pritchard
27c0e95b01
Merge pull request #6489 from markalle/v4.0.x
...
v4.0.x: opal_hwloc_base_cset2str() off-by-1 in its strncat()
2019-03-14 17:00:42 -06:00
Nathan Hjelm
3df8ed9cc0
btl/vader: fix fragment sizes used by free lists
...
This commit fixes a bug introduced in
f62d26ddbc8cda4d985cceee531a2ec32406d1f6. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-03-14 17:25:31 -04:00
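The sizing bug described in the commit above can be illustrated with a short sketch. It is not the vader code: frag_hdr_t and the helper functions are hypothetical, and the alignment parameter is an assumption added only to make the example realistic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment header stored in front of each payload. */
typedef struct {
    size_t       len;
    unsigned int flags;
} frag_hdr_t;

/* Size of one free-list slot: header plus payload, rounded up to a
 * power-of-two alignment. */
size_t frag_slot_size(size_t payload_size, size_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    size_t raw = sizeof(frag_hdr_t) + payload_size;
    return (raw + align - 1) & ~(align - 1);
}

/* Returns 1 if a full payload fits in the slot together with its
 * header, 0 if it would run into the next fragment's header. */
int payload_stays_in_slot(size_t slot_size, size_t payload_size)
{
    return sizeof(frag_hdr_t) + payload_size <= slot_size;
}
```

If the slot size is set to the payload size alone, payload_stays_in_slot reports an overrun of sizeof(frag_hdr_t) bytes, which matches the header-to-header overwrite the commit describes.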