Commit Graph

29454 Commits

Author SHA1 Message Date
perrynzhou
5acaf006ae regx/base: fix an integer overflow
use strtol() instead of atoi() in order to handle hostnames
containing a large number.

This is a one-off commit for the release branches since
the regx framework has already been removed from master.

Refs. open-mpi/ompi#6729

Signed-off-by: perrynzhou <perrynzhou@gmail.com>
2019-06-06 14:37:33 +09:00
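The reason for preferring strtol() over atoi() can be illustrated with a small sketch. This is not the regx code itself: `parse_host_number` is a hypothetical helper, and checking `ERANGE` is an assumption about how a caller might want to react. The point is that atoi() has undefined behavior when the value does not fit in an int, while strtol() parses into a long and reports out-of-range input through errno.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not the actual regx code): parse the decimal number
 * at the start of `digits` into *out.
 * Returns 0 on success, -1 if no digits were found or the value does not
 * fit in a long. */
int parse_host_number(const char *digits, long *out)
{
    char *end = NULL;

    errno = 0;
    long value = strtol(digits, &end, 10);

    if (end == digits) {
        return -1;   /* nothing was consumed: not a number */
    }
    if (errno == ERANGE) {
        return -1;   /* out of range: strtol clamped to LONG_MAX/LONG_MIN */
    }
    *out = value;
    return 0;
}
```

With this shape, a hostname suffix such as "0042" parses to 42, while an oversized digit string is rejected instead of silently wrapping.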
Geoff Paulsen
18f10377eb
Merge pull request #6152 from ggouaillardet/topic/v4.0.x/ucx_warning
btl/openib: delay UCX warning to add_procs()
2019-06-03 15:09:43 -05:00
Geoff Paulsen
a04f5f0c70
Merge pull request #6692 from vspetrov/v4.0.x
V4.0.x Coll/hcoll: don't init opal memhooks unless explicitly requested
2019-06-03 15:00:36 -05:00
Howard Pritchard
6c74d4031b
Merge pull request #6720 from markalle/patcher_additions_v40x
shmat/shmdt additions for patcher
2019-06-03 12:51:05 -07:00
Howard Pritchard
76f01b9b8e
Merge pull request #6696 from gpaulsen/topic/v4.0.x/btl_uct_from_6668
btl/uct: check for support before disabling UCX memory hooks
2019-06-03 12:15:40 -07:00
Howard Pritchard
3fd5c84a80
Merge pull request #6718 from hoopoepg/topic/pci-flush-on-quiet-v4.0
SPML/UCX: added synchronized flush on quiet - v4.0
2019-05-30 21:11:13 -06:00
Mark Allen
5f79dfaa0a shmat/shmdt additions for patcher
This is mostly based off recent UCX additions to their patcher:
    https://github.com/openucx/ucx/pull/2703

They added triggers for
* mmap when (flags & MAP_FIXED) && (addr != NULL)
* shmat when (shmflg & SHM_REMAP) && (shmaddr != NULL)

Beyond that I noticed they already had a trigger for
* madvise when (advice == MADV_FREE)
that we didn't so I added that.

And the other main thing is we didn't really have shmat/shmdt
active for some systems because we only had a path for
syscall(SYS_shmdt, ) but we needed to also have a path for
syscall(SYS_ipc, IPCOP_shmdt, ) and same for shmat.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit eb888118e8)
2019-05-30 13:31:02 -04:00
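The trigger conditions listed in this commit message can be written down as simple predicates. The sketch below is only a restatement of those three conditions, not the patcher code: the function names are hypothetical, and the `#ifdef` guards are an assumption made so the sketch still compiles on systems whose headers lack the Linux-specific `SHM_REMAP` or `MADV_FREE`.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* mmap trigger: (flags & MAP_FIXED) && (addr != NULL) */
int mmap_is_trigger(const void *addr, int flags)
{
    return (flags & MAP_FIXED) && addr != NULL;
}

/* shmat trigger: (shmflg & SHM_REMAP) && (shmaddr != NULL) */
int shmat_is_trigger(const void *shmaddr, int shmflg)
{
#ifdef SHM_REMAP
    return (shmflg & SHM_REMAP) && shmaddr != NULL;
#else
    (void)shmaddr;
    (void)shmflg;
    return 0;   /* SHM_REMAP not available on this platform */
#endif
}

/* madvise trigger: advice == MADV_FREE */
int madvise_is_trigger(int advice)
{
#ifdef MADV_FREE
    return advice == MADV_FREE;
#else
    (void)advice;
    return 0;   /* MADV_FREE not available on this platform */
#endif
}
```

In each case the call can take over or discard memory at an address the caller already names, which is presumably why an interception layer would want to be notified.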
Sergey Oblomov
69923e78c7 SPML/UCX: added synchronized flush on quiet
- added synchronized flush operation on quiet call.
- flush is implemented using get operation

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 0b108411f8)
2019-05-30 18:08:33 +03:00
Howard Pritchard
4a7f6a4e2d
Merge pull request #6672 from jsquyres/pr/v4.0.x/adjust-for-slurm-19-cli-options-change
v4.0.x: plm_slurm_module: adjust for new SLURM CLI options
2019-05-30 04:17:17 -06:00
Howard Pritchard
e78851a6c7
Merge pull request #6704 from edgargabriel/pr/v4.0.x-empty-fileview-fix
common/ompio: fix division by zero problem with empty fview
2019-05-26 09:45:52 -06:00
Howard Pritchard
386ed07d54
Merge pull request #6689 from hoopoepg/topic/suppressed-pml-ucx-mt-warning-v4.0
PML/UCX: disable PML UCX if MT is requested but not supported - v4.0
2019-05-26 09:44:05 -06:00
Edgar Gabriel
c7250cd11d common/ompio: fix division by zero problem with empty fview
When using an empty fileview, a division by zero bug can occur in ompio. Not entirely sure why the problem did not show up previously, but some recent changes trigger that bug in one of our tests.

This pr is part of a fix applied in commit f6b3a0a

Fixes Issue #6703

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-23 13:48:57 -05:00
Howard Pritchard
16e236d2a8
Merge pull request #6688 from yosefe/topic/osc-ucx-fix-ud-self-deadlock-v4.0.x
OSC/UCX: Fix deadlock with atomic lock - v4.0
2019-05-22 09:39:47 -06:00
Nathan Hjelm
11cb0f24a5 btl/uct: check for support before disabling UCX memory hooks
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
(cherry picked from commit 3e1dd36241)
2019-05-20 16:42:38 -05:00
Valentin Petrov
8f82c899bc Coll/hcoll: don't init opal memhooks unless explicitly requested by user
If user sets HCOLL_EXTERNAL_UCM_EVENTS=1 then we try init opal
    memory framework and register a mem release cb. Otherwise, rely on ucx.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-05-20 14:00:50 +03:00
Sergey Oblomov
1edd36638b PML/UCX: disable PML UCX if MT is requested but not supported
- in case if multithreading requested but not supported
  disable PML UCX

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a3578d9ece)
2019-05-20 09:59:59 +03:00
Yossi Itigin
4f9fb3e9ce OSC/UCX: Fix deadlock with atomic lock
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.

(picked from master 9d1994b)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-20 09:54:01 +03:00
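The deadlock described here comes from spinning on a local memory location without giving the communication layer a chance to run. A minimal sketch of the corrected polling pattern follows. It does not use the UCX API: `progress_fn`, `fake_worker`, and `fake_progress` are hypothetical stand-ins, with `fake_progress` simulating an active message that releases the lock after a few progress calls.

```c
#include <stdint.h>

/* Hypothetical callback standing in for "progress the local worker". */
typedef void (*progress_fn)(void *ctx);

/* Spin until *lock_word becomes 0, driving progress on every iteration so
 * that incoming messages (which may be what clears the word) get handled.
 * Returns the number of progress calls that were needed. */
unsigned long wait_for_unlock(volatile uint64_t *lock_word,
                              progress_fn progress, void *ctx)
{
    unsigned long calls = 0;

    while (*lock_word != 0) {
        progress(ctx);
        calls++;
    }
    return calls;
}

/* Illustrative stand-in for a worker with one pending "unlock" message. */
struct fake_worker {
    volatile uint64_t *lock_word;
    int polls_until_release;
};

/* Each call counts down; the final one delivers the unlock. */
void fake_progress(void *ctx)
{
    struct fake_worker *w = ctx;

    if (w->polls_until_release > 0 && --w->polls_until_release == 0) {
        *w->lock_word = 0;
    }
}
```

If the loop body omitted the progress call, the word would never change in this simulation and the loop would spin forever, which mirrors the situation the commit message describes.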
Geoff Paulsen
c22326e59a
Merge pull request #6652 from yosefe/topic/alloc-with-hint-impl-master-v4.0.x
OSHMEM: Add support for shmemx_malloc_with_hint() - v4.0.x
2019-05-17 15:48:35 -05:00
Geoff Paulsen
5880cb4929
Merge pull request #6661 from brminich/topic/fix_cov_errors_4.0.x
SPML/UCX: Fix coverity error - 4.0.x
2019-05-17 13:52:08 -05:00
Geoff Paulsen
0dc2c7205d
Merge pull request #6663 from jsquyres/pr/v4.0.x/fix-minor-openmpi-specfile-issue
v4.0.x: openmpi.spec: make sure grep failure doesn't abort
2019-05-17 13:49:31 -05:00
Howard Pritchard
81aa9d1413
Merge pull request #6679 from hoopoepg/topic/ucx-common-init-patcher-on-hooks-used-only-v4.0
COMMON/UCX: init memhooks infra on external hooks only - v4.0
2019-05-17 12:40:52 -06:00
Sergey Oblomov
1944295da3 COMMON/UCX: removed ucs stuff
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ebc457baf5)
2019-05-17 09:58:20 +03:00
Sergey Oblomov
fa0a0b1597 COMMON/UCX: init memhooks infra on external hooks only
- initialize memory hooks infrastructure only in case
  if external memory hooks are requested

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a0a9306066)
2019-05-17 09:58:12 +03:00
Jordan Hayes
e00d0abe56 plm_slurm_module: adjust for new SLURM CLI options
SLURM 19 discontinued the use of --cpu_bind (and changed it to
--cpu-bind).  There's no easy way to test at run time which one is
accepted, so set the environment variable SLURM_CPU_BIND to "none",
which should do the same thing as the srun CLI parameter.

Signed-off-by: Jordan Hayes <jhayes@ucr.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7dad74032e)
2019-05-16 09:13:28 -07:00
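The workaround amounts to exporting one environment variable instead of choosing between two option spellings. A minimal sketch using POSIX setenv() is shown below; the function name is hypothetical, and how the actual module hands the environment to srun is not shown in this log.

```c
#include <stdlib.h>

/* Hypothetical helper: request "no CPU binding" through the environment
 * rather than via --cpu_bind / --cpu-bind on the srun command line.
 * Returns 0 on success, -1 on failure (the setenv() return value). */
int disable_slurm_cpu_binding(void)
{
    /* Third argument 1: overwrite any value already present. */
    return setenv("SLURM_CPU_BIND", "none", 1);
}
```

A child process launched afterwards inherits the variable, so the same setting works regardless of which option spelling the installed SLURM accepts.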
Yossi Itigin
fbd6798bf8 OSHMEM/MMAP/SYSV: Return ERR_NOT_IMPLEMENTED if segment hint != 0
(picked from master f708674)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-15 17:11:24 +03:00
Jeff Squyres
84b3536f61 openmpi.spec: make sure grep failure doesn't abort
Thanks to Daniel Letai for bringing this to our attention.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 013f5b03f5)
2019-05-15 06:54:43 -07:00
Mikhail Brinskii
ff9ecc183f SPML/UCX: Fix coverity error
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit d81dc533f6)
2019-05-15 14:20:05 +03:00
Howard Pritchard
8c9a3d1d1f
Merge pull request #6651 from ggouaillardet/topic/v4.0.x/btl_vader
btl/vader: fix finalize sequence
2019-05-14 09:08:56 -06:00
Sergey Oblomov
e6cb5b02e8 OSHMEM/free: suppressed coverity issue
- removed dead code

(cherry picked from master 4df8c1b)

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-13 16:38:57 +03:00
Yossi Itigin
fc41c16134 OSHMEM: Add support for shmemx_malloc_with_hint()
- added multiple segments processing
- added shmemx_malloc_with_hint call + set of hints

(picked from master 94b5e91)

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-12 11:42:59 +03:00
George Bosilca
4946570b24 Remove few warnings identified by @rhc in #5514.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@6d11a45f44)
2019-05-11 16:38:31 +09:00
Gilles Gouaillardet
70a864fce3 btl/vader: fix finalize sequence
free the component mpool in mca_btl_vader_component_close()
and after freeing some objects that depend on it such as
mca_btl_vader_component.vader_frags_user

Thanks Christoph Niethammer for reporting this.

Refs. open-mpi/ompi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@77060cad07)
2019-05-11 13:04:23 +09:00
Geoff Paulsen
5d4c9b444a
Merge pull request #6636 from mwheinz/REF6638-v4.0.x
v4.0.x buildrpm.sh no longer respects the value of rpmtopdir
2019-05-07 09:41:14 -05:00
Geoff Paulsen
73f9bcc374
Merge pull request #6632 from brminich/topic/shmem_all2all_put_4.0.x
SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h 4.0.x
2019-05-07 08:05:01 -05:00
Howard Pritchard
8e968f16a6
Merge pull request #6626 from ggouaillardet/topic/v4.0.x/mpi_combiner_xyz_integer
v4.0.x: mpi: mark MPI_COMBINER_{HVECTOR,HINDEXED,STRUCT}_INTEGER removed
2019-05-04 07:25:40 -06:00
Michael Heinz
77caeb9bfa Corrects some whitespace issues with buildrpm.sh
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
(cherry picked from commit 8562211623)
2019-05-03 09:54:41 -04:00
Michael Heinz
94e842bb34 buildrpm.sh no longer respects the value of rpmtopdir
In OMPI 2.1.2, buildrpm.sh could work with a value of rpmtopdir that was
set in the environment. In newer versions this is no longer true,
causing such values to be ignored. This patch adds a new argument to
buildrpm.sh, -R, which allows the user to specify where to build the
RPMs.

Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
(cherry picked from commit 687a5603a1)
2019-05-03 09:54:33 -04:00
Mikhail Brinskii
6861a68de6 SPML/UCS: CR comments p2
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit d4843b1651)
2019-05-02 21:27:15 +03:00
Mikhail Brinskii
1c56f49a44 SPML/UCX: CR comments p1
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit c4c99457db)
2019-05-02 21:26:55 +03:00
Mikhail Brinskii
e4ee56d1f3 SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 2ef5bd8b36)
2019-05-02 21:25:59 +03:00
Howard Pritchard
3cafd02c7f
Merge pull request #6572 from markalle/v40x_fortran_macro
in-place conversion macro writes into INPUT argument
2019-05-01 11:54:12 -06:00
Howard Pritchard
41ef5c7a10
Merge pull request #6594 from vspetrov/osc_ucx_rget_rkey_fix
OSC/UCX: use correct rkey for atomic_fadd in rget/rput
2019-05-01 11:53:17 -06:00
Howard Pritchard
ca5d58f955
Merge pull request #6615 from markalle/merge_v40x_romio
fixing an unsafe usage of integer disps[] (romio321 gpfs)
2019-05-01 11:51:53 -06:00
Gilles Gouaillardet
e2638dbbf2 mpi: mark MPI_COMBINER_{HVECTOR,HINDEXED,STRUCT}_INTEGER removed
unless configure'd with --enable-mpi1-compatibility

This is a one-off commit for the v4.0.x branch since these symbols were
simply removed from master.

Thanks Lisandro Dalcin for reporting this.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-05-01 10:50:57 +09:00
Howard Pritchard
888d014590
Merge pull request #6624 from mwheinz/v4.0.x
v4.0.x: make-authors.pl script not compatible with being a submodule.
2019-04-30 07:55:24 -06:00
Michael Heinz
191c7f01a2 make-authors.pl script not compatible with being a submodule.
make-authors.pl checks that .git exists and is a directory before
getting the git log - but when a repo is checked out as a submodule of a
larger repository, .git is not a directory, it's just a text file.  This
can cause make-authors.pl to terminate inappropriately.

Author: Michael Heinz <michael.william.heinz@intel.com>
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
(cherry picked from commit 0a8fa5439c)
2019-04-29 10:14:53 -04:00
Mark Allen
c081757462 fixing an unsafe usage of integer disps[] (romio321 gpfs)
There are a couple MPI_Alltoallv calls in ad_gpfs_aggrs.c where the
send/recv data comes from places like req[r].lens, and the send
buffer and send displacements for example were being calculated as
    sbuf = pick one of the reqs: req[bottom].lens
    sdisps[r] = req[r].lens - req[bottom].lens
which might be okay if the .lens was data inside of req[] so they'd
all be close to each other. But each .lens field is just a pointer
that's malloced, so those addresses can be all over the place, so the
integer-sized sdisps[] isn't safe.

I changed it to have a new extra array sbuf and rbuf for those two
Alltoallv calls, and copied the data into the sbuf from the same
locations it used to be setting up the sdisps[] at, and after the
Alltoallv I copy the data out of the new rbuf into the same
locations it used to be setting up the rdisps[] at.

For what it's worth I was able to get this to fail -np 2 on a GPFS
filesystem with hints romio_cb_write enable. I didn't whittle the
test down to something small, but it was failing in an
MPI_File_write_all call.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit d85cac8f1a)
2019-04-25 14:22:19 -04:00
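The fix described above replaces pointer-difference displacements with a copy into one contiguous buffer, so displacements become small element offsets. A minimal sketch of that packing step is below. It is not the ROMIO code: the function name, the `long long` element type, and the signature are assumptions for illustration, and no MPI call is made.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy nprocs separately allocated arrays
 * (chunks[r] holds counts[r] elements) into one contiguous buffer and
 * record in displs[r] the element offset of each chunk.
 * Returns the malloc'd buffer (caller frees) or NULL on allocation failure.
 * Assumes the total element count fits in an int. */
long long *pack_for_alltoallv(const long long *const *chunks,
                              const int *counts, int nprocs, int *displs)
{
    size_t total = 0;
    for (int r = 0; r < nprocs; r++) {
        total += (size_t)counts[r];
    }

    /* Allocate at least one element so an empty exchange is not NULL. */
    long long *packed = malloc((total ? total : 1) * sizeof *packed);
    if (packed == NULL) {
        return NULL;
    }

    size_t offset = 0;
    for (int r = 0; r < nprocs; r++) {
        displs[r] = (int)offset;   /* offset inside packed, not an address delta */
        if (counts[r] > 0) {
            memcpy(packed + offset, chunks[r],
                   (size_t)counts[r] * sizeof *packed);
        }
        offset += (size_t)counts[r];
    }
    return packed;
}
```

Because every displacement is bounded by the total element count, it no longer depends on where malloc happened to place each source array.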
Howard Pritchard
f3edfaa2ac
Merge pull request #6610 from jsquyres/pr/ob1-get-frag-fail-fix
v4.0.x: ob1 get_frag fail fix
2019-04-23 09:27:08 -06:00
Brelle Emmanuel
2a4bc0cb58 pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
In the case the btl_get fails Ob1 tries to fallback on btl_put first but
the return code was ignored. So the code fell back on both btl_put and
btl_send.

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
(cherry picked from commit 9c689f2225)
2019-04-22 14:25:34 -07:00
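The control-flow bug described here, ignoring the return code of the first fallback, can be shown with a small sketch. This is not the ob1 code: `xfer_fn`, `get_with_fallback`, and the stub transports are hypothetical, and the convention that 0 means success is an assumption.

```c
/* Hypothetical transport callback: returns 0 on success, nonzero on failure. */
typedef int (*xfer_fn)(void *frag);

/* Try get first; on failure fall back to put, and only if put also fails
 * fall back to send. Returns 0 as soon as one path succeeds. */
int get_with_fallback(xfer_fn get, xfer_fn put, xfer_fn send, void *frag)
{
    if (get(frag) == 0) {
        return 0;
    }
    if (put(frag) == 0) {
        return 0;   /* honor the result: do not also fall through to send */
    }
    return send(frag);
}

/* Illustrative stubs that record which fallbacks were actually used. */
struct call_log {
    int puts;
    int sends;
};

int failing_get(void *frag)
{
    (void)frag;
    return -1;
}

int ok_put(void *frag)
{
    ((struct call_log *)frag)->puts++;
    return 0;
}

int ok_send(void *frag)
{
    ((struct call_log *)frag)->sends++;
    return 0;
}
```

With the early return in place, a failed get followed by a successful put results in exactly one transfer; without it, the data would go out through both put and send.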
Howard Pritchard
9b73e8a7c0
Merge pull request #6600 from vspetrov/v4.0.x_osc_ucx_no_op_null_addr_handling
OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
2019-04-18 15:36:13 -06:00