Commit graph

29403 commits

Author SHA1 Message Date
Mark Allen
c081757462 fixing an unsafe usage of integer disps[] (romio321 gpfs)
There are a couple MPI_Alltoallv calls in ad_gpfs_aggrs.c where the
send/recv data comes from places like req[r].lens, and the send
buffer and send displacements for example were being calculated as
    sbuf = pick one of the reqs: req[bottom].lens
    sdisps[r] = req[r].lens - req[bottom].lens
which might be okay if the .lens was data inside of req[] so they'd
all be close to each other. But each .lens field is just a pointer
that's malloced, so those addresses can be all over the place, so the
integer-sized sdisps[] isn't safe.

I changed it to have a new extra array sbuf and rbuf for those two
Alltoallv calls, and copied the data into the sbuf from the same
locations it used to be setting up the sdisps[] at, and after the
Alltoallv I copy the data out of the new rbuf into the same
locations it used to be setting up the rdisps[] at.
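
The repacking described above can be sketched roughly as follows. This is not the actual ad_gpfs_aggrs.c code: `pack_counts`, `unpack_counts`, and the `elem_t` element type are hypothetical stand-ins, and the MPI_Alltoallv call itself is left out so the sketch compiles without MPI. It also assumes the total element count fits in an int.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the element type stored behind each .lens pointer. */
typedef long long elem_t;

/*
 * Copy each rank's separately allocated array into one contiguous staging
 * buffer and record where each piece starts. The displacements become small
 * element offsets into sbuf, so they no longer depend on where malloc placed
 * the original arrays (assumes the total count fits in an int).
 * Returns the total number of elements packed.
 */
static size_t pack_counts(elem_t *const lens[], const int counts[],
                          int nprocs, elem_t *sbuf, int sdisps[])
{
    size_t pos = 0;
    for (int r = 0; r < nprocs; r++) {
        sdisps[r] = (int) pos;
        if (counts[r] > 0) {
            memcpy(sbuf + pos, lens[r], (size_t) counts[r] * sizeof(elem_t));
        }
        pos += (size_t) counts[r];
    }
    return pos;
}

/* After the exchange, copy each piece of rbuf back to its original location. */
static void unpack_counts(const elem_t *rbuf, const int rdisps[],
                          const int counts[], int nprocs, elem_t *const lens[])
{
    for (int r = 0; r < nprocs; r++) {
        if (counts[r] > 0) {
            memcpy(lens[r], rbuf + rdisps[r], (size_t) counts[r] * sizeof(elem_t));
        }
    }
}
```

In a real exchange, `sbuf` and `sdisps` would be passed to the Alltoallv call as the send buffer and send displacements, with a matching `rbuf` and `rdisps` on the receive side.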

For what it's worth I was able to get this to fail -np 2 on a GPFS
filesystem with hints romio_cb_write enable. I didn't whittle the
test down to something small, but it was failing in an
MPI_File_write_all call.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit d85cac8f1a)
2019-04-25 14:22:19 -04:00
Howard Pritchard
f3edfaa2ac
Merge pull request #6610 from jsquyres/pr/ob1-get-frag-fail-fix
v4.0.x: ob1 get_frag fail fix
2019-04-23 09:27:08 -06:00
Brelle Emmanuel
2a4bc0cb58 pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
In the case the btl_get fails Ob1 tries to fallback on btl_put first but
the return code was ignored. So the code fell back on both btl_put and
btl_send.
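
A minimal sketch of the control flow this message describes, not the actual ob1 code: `get_fallback`, the `xfer_fn` callback type, the `XFER_*` codes, and the counting stubs are all hypothetical names chosen for illustration. The point is only that the result of the put attempt has to be checked before the send path is tried.

```c
#include <stddef.h>

/* Hypothetical return codes and callback type, for illustration only. */
enum { XFER_SUCCESS = 0, XFER_ERROR = -1 };
typedef int (*xfer_fn)(void *frag);

/*
 * Fall back from a failed get: try put first and only fall through to
 * send when put reports failure. Ignoring rc here would run both paths.
 */
static int get_fallback(void *frag, xfer_fn try_put, xfer_fn try_send)
{
    int rc = try_put(frag);
    if (XFER_SUCCESS == rc) {
        return XFER_SUCCESS;   /* put accepted the fragment: do not also send */
    }
    return try_send(frag);
}

/* Counting stubs so the behaviour can be observed. */
static int put_calls, send_calls;
static int put_ok(void *f)   { (void) f; put_calls++;  return XFER_SUCCESS; }
static int put_fail(void *f) { (void) f; put_calls++;  return XFER_ERROR; }
static int send_ok(void *f)  { (void) f; send_calls++; return XFER_SUCCESS; }
```

With `put_ok` the send stub is never invoked; with `put_fail` it is invoked exactly once.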

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
(cherry picked from commit 9c689f2225)
2019-04-22 14:25:34 -07:00
Howard Pritchard
9b73e8a7c0
Merge pull request #6600 from vspetrov/v4.0.x_osc_ucx_no_op_null_addr_handling
OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
2019-04-18 15:36:13 -06:00
Valentin Petrov
2947ab2dbc OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-17 10:35:34 +03:00
Howard Pritchard
9f7b41f588
Merge pull request #6585 from vspetrov/v4.0.x
V4.0.x Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
2019-04-16 09:10:55 -06:00
Valentin Petrov
281f78c6e4 Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
The new proc group is created from the "world_group" based on the
ranks mapping which can be directly taken from proc_name->vpid.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-15 08:43:09 +03:00
Howard Pritchard
5d6657ea40
Merge pull request #6582 from jsquyres/pr/v4.0.x/fix-overtake
v4.0.x: pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
2019-04-10 10:46:09 -06:00
Thananon Patinyasakdikul
5999fdad5a pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
We missed an assert to check if ALLOW_OVERTAKE is set or not before
validating the sequence number and this will cause deadlock.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
(cherry picked from commit 0263456cf4)
2019-04-09 11:24:24 -07:00
Geoff Paulsen
811dfc63e0
Merge pull request #6550 from rhc54/cmr402/clnup
v4.0.x: Cleanup race condition in finalize that leads to incomplete vader cleanup
2019-04-09 10:13:15 -05:00
Geoff Paulsen
b3492090c2
Merge pull request #6561 from benmenadue/fix-scoll-fca
v4.0.x: scoll/fca: add missing argument to call to original broadcast
2019-04-08 14:05:47 -05:00
Geoff Paulsen
db1cb3002f
Merge pull request #6564 from benmenadue/v4.0.x-fix-shmem-context
v4.0.x: add missing #include to oshmem/shmem/c/shmem_context.c.
2019-04-08 14:00:04 -05:00
Howard Pritchard
702199f39e
Merge pull request #6545 from bertwesarg/v4.0.x-fix-cpp-condition
Fix use of bitwise operation in CPP condition (v4.0.x)
2019-04-05 07:58:09 -06:00
Ben Menadue
173192a6f4 Add missing #include to oshmem/shmem/c/shmem_context.c.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
(cherry picked from commit 063596b828)
2019-04-03 16:02:58 +11:00
Ben Menadue
001fa5b6ce Add missing nlong_type parameter to call to original broadcast in scoll/fca broadcast.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2019-04-03 14:01:41 +11:00
Howard Pritchard
8261cdab06
Merge pull request #6554 from James-A-Clark/v4.0.x
Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init (v4.0.x cherry pick)
2019-04-02 09:05:55 -06:00
Howard Pritchard
976cc1e07f
Merge pull request #6509 from janjust/oshmem-multiple-contexts-v4.0.x
v4.0.x: Oshmem multiple contexts
2019-04-01 13:15:38 -06:00
Howard Pritchard
b11cb23b71
Merge pull request #6519 from sam6258/int4_cswap_fix_v4.0.x
v4.0.x: shmem/fortran: Fix invalid datatype size in call to atomic cswap
2019-04-01 13:09:17 -06:00
James Clark
d8dc69feb5 Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init.
This is so when a debugger attaches using MPIR, it can step out of this stack back into main.
This cannot be done with certain aggressive optimisations and missing debug information.

Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

Co-authored-by: Jeff Squyres <jsquyres@cisco.com>

(cherry-picked from 20f5840)
2019-04-01 11:10:04 +01:00
Ralph Castain
2536b4f869 Remove stale ORTE code
Functionality moved to PMIx

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit cfdd08d309)
2019-03-31 11:26:18 -07:00
Ralph Castain
861016c3b2 Cleanup race condition in finalize
See https://github.com/open-mpi/ompi/issues/5798#issuecomment-426545893
for a lengthy explanation

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 57f6b94fa5)
2019-03-31 11:23:27 -07:00
Bert Wesarg
7f65e5b720 Fix use of bitwise operation in CPP condition
Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
(cherry picked from commit 18525ce39b)
2019-03-29 10:17:09 +01:00
Howard Pritchard
9a1b6cfc79
Merge pull request #6529 from hppritcha/topic/roll_to_v4.0.2a1
VERSION: roll to v4.0.2a1
2019-03-27 12:08:19 -06:00
Howard Pritchard
697437169a
Merge pull request #6528 from hppritcha/topic/minor_news_typo
NEWS: minor typo fix
2019-03-27 12:07:55 -06:00
Howard Pritchard
9e73e3e520 VERSION: roll to v4.0.2a1
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-27 11:20:05 -06:00
Howard Pritchard
812fd4aa2b NEWS: minor typo fix
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-27 11:18:29 -06:00
Geoff Paulsen
b8a8ae9394
Merge pull request #6520 from gpaulsen/topic/v4.0.1/README_oops
Describing Issue 6114 with v4.0.0 in README.
2019-03-26 10:18:13 -05:00
Geoffrey Paulsen
176356249c README: Describes the now fixed Issue 6114
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-03-26 10:12:15 -05:00
Scott Miller
5f4f5d45b3 shmem/fortran: Fix invalid datatype size in call to atomic cswap
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
(cherry picked from commit 6b294e0641)
2019-03-25 12:38:04 -04:00
Xin Zhao
69a80fce9f ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9c3d00b144)
2019-03-21 23:59:58 +02:00
Xin Zhao
580b584179 ompi/oshmem/spml/ucx:delete oob path of getting rkeys in spml ucx
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e0414006b0)
2019-03-21 23:59:46 +02:00
Xin Zhao
596997c194 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e1c1ab0202)
2019-03-21 23:58:23 +02:00
Geoff Paulsen
97aa434182
Merge pull request #6511 from gpaulsen/topic/v4.0.x/rc3
Update VERSION to v4.0.1rc3
2019-03-21 16:17:57 -05:00
Geoffrey Paulsen
8e04fb3633 Update VERSION to v4.0.1rc3
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-03-21 16:38:51 -04:00
Xin Zhao
ce54b63b90 ompi/oshmem: add spml_context back to sshmem_type in memheap, to keep track of ucx_ctx_default's rkeys
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 48033ac1f4)
2019-03-20 23:30:21 +02:00
Xin Zhao
06183a7bec ompi/oshmem/spml/ucx: let shmem_finalize to clean up any ctx left
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9a06000962)
2019-03-20 23:30:09 +02:00
Xin Zhao
91793484ed OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 289595e45d)
2019-03-20 23:29:53 +02:00
Xin Zhao
f666d75322 ompi/oshmem/spml/ucx: fix eps destroy in shmem_ctx_destroy().
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 79ba752667)
2019-03-20 23:29:38 +02:00
Howard Pritchard
15cfba5347
Merge pull request #6503 from jjhursey/v4x-rm-hash-pmix3
Do not force 'hash' gds on direct modex
2019-03-19 17:58:26 -05:00
Geoff Paulsen
31ebbb2a8d
Merge pull request #6502 from nysal/v4.0.x_spinlock_fix
opal/atomics: Add acquire semantics back for spinlocks
2019-03-19 11:44:46 -05:00
Joshua Hursey
45526fadee Do not force 'hash' gds on direct modex
* Forcing the 'hash' gds component should not be necessary any more.

Port of PR #6498 (component names changed so a cherry-pick would not work)

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-19 10:52:17 -05:00
Nysal Jan K.A
1329cef213 opal/atomics: Add acquire semantics back for spinlocks
This was introduced in commit 9d0b3fe9

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit 00f27a80fc)
2019-03-19 19:45:20 +05:30
Geoff Paulsen
6cb00aa333
Merge pull request #6499 from hppritcha/topic/news_updates_for_4.0.1rc2
NEWS: add a few news items for 4.0.1rc2
2019-03-19 05:22:40 -05:00
Howard Pritchard
ce013130cb NEWS: add a few news items for 4.0.1rc2
a little late, but a couple of bullets for the
4.0.1rc2 NEWS.

[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-19 04:13:32 -06:00
Geoff Paulsen
efcbc13d2f
Merge pull request #6496 from gpaulsen/v4.0.x
Reving to v4.0.1rc2
2019-03-18 16:34:16 -05:00
Geoffrey Paulsen
2ae9a8a3d6 Reving to v4.0.1rc2
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-03-18 16:33:26 -05:00
Howard Pritchard
ceb93d7c03
Merge pull request #6491 from bosilca/v4.0.x
v4.0.x: Cherry-pick fixes for issue #6258 from master (vader fixes)
2019-03-15 17:08:52 -06:00
Howard Pritchard
27899b0e8f
Merge pull request #6486 from hoopoepg/topic/check-ucx-params-v4.0
PML/SPML/UCX: added evaluation of mmap events - v4.0
2019-03-14 17:02:46 -06:00
Howard Pritchard
27c0e95b01
Merge pull request #6489 from markalle/v4.0.x
v4.0.x: opal_hwloc_base_cset2str() off-by-1 in its strncat()
2019-03-14 17:00:42 -06:00
Nathan Hjelm
3df8ed9cc0
btl/vader: fix fragment sizes used by free lists
This commit fixes a bug introduced in
f62d26ddbc. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.
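
The sizing issue can be illustrated with a small sketch. It does not reproduce vader's real structures: `frag_hdr_t`, `frag_slot_size`, and `frag_offset` are hypothetical, and alignment padding is ignored. It only shows that each slot carved from a segment must cover the header as well as the payload, otherwise one fragment's payload runs into the next fragment's header.

```c
#include <stddef.h>

/* Hypothetical fragment header; the real vader header differs. */
typedef struct {
    size_t len;
    int    flags;
    void  *next;
} frag_hdr_t;

/* Bytes to reserve per fragment: header plus payload, not payload alone. */
static size_t frag_slot_size(size_t payload_bytes)
{
    return sizeof(frag_hdr_t) + payload_bytes;
}

/* Byte offset of fragment `index` when slots are laid out back to back. */
static size_t frag_offset(size_t index, size_t payload_bytes)
{
    return index * frag_slot_size(payload_bytes);
}
```

With this layout, the end of fragment 0's payload never passes the start of fragment 1's header.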

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-03-14 17:25:31 -04:00