29196 Commits

Author SHA1 Message Date
Geoff Paulsen
bd2990f502
Merge pull request #6131 from devreal/rdma-plug-memleak-v4.0.x
v4.0.x: Plug two memory leaks in rdma osc
2018-11-30 13:54:51 -06:00
Geoff Paulsen
51d20915fd
Merge pull request #6139 from rhc54/cmr40/rmap
v4.0.x: Fix typo for rmaps_base_oversubscribe
2018-11-30 13:51:36 -06:00
Geoff Paulsen
937cf86077
Merge pull request #6135 from jsquyres/pr/v4.0.x/README-typo-fix
v4.0.x: README: Fix a typo
2018-11-29 14:05:08 -06:00
Ralph Castain
98c8492057 Fix typo for rmaps_base_oversubscribe
Causes the MCA param to be ignored, while the cmd line option still
works.

Thanks to @iassiour for the report!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-11-29 07:40:51 -08:00
Joseph Schuchart
c5346751e6 Plug two memory leaks in rdma osc
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 91885f5876129aa4fb43ed4b3404c9d1ca7e08b8)
2018-11-29 10:19:26 -05:00
Jeff Squyres
e56c179d58 README: Fix a typo
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit e6241eaf6ba6bffeb6b85def68e420a7ab66dce8)
2018-11-28 14:52:08 -08:00
Howard Pritchard
7fc0841791
Merge pull request #6117 from yosefe/topic/pml-ucx-init-req_mpi_object-v4.0.x
pml_ucx: initialize req_mpi_object.comm for error handler
2018-11-26 13:30:53 -07:00
Howard Pritchard
176206fe8c
Merge pull request #6098 from jjhursey/enh/v4.0.x/vpid-unpack
Add OPAL_VPID to unpacking
2018-11-26 13:30:20 -07:00
Yossi Itigin
a112d10c93 pml_ucx: initialize req_mpi_object.comm for error handler
Without this fix, an error handler invoked on a pml_ucx request would
segfault while trying to dereference requests[i]->req_mpi_object.comm.

(picked from master f36eeef)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-11-26 11:57:34 +02:00
Joshua Hursey
e1f75d5ff1 Add OPAL_VPID to unpacking
* Needed to properly read PMIx job data like the following
   - `OPAL_PMIX_LOCALLDR`
   - `OPAL_PMIX_RANK`
   - `OPAL_PMIX_GLOBAL_RANK`
   - `OPAL_PMIX_APPLDR`
   - `OPAL_PMIX_APP_RANK`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit a557c4130c42a5a41aba5c08e606e7129d0bcb6d)
2018-11-21 11:48:58 -06:00
Geoff Paulsen
206b574bb3
Merge pull request #6076 from hppritcha/topic/on_to_v4.0.1
roll to v4.0.1a1
2018-11-20 09:55:55 -06:00
Geoff Paulsen
6898de01a7
Merge pull request #6012 from hoopoepg/topic/added-missing-amo-datatypes-v4.0
OSHMEM/AMO: added missing C11 macro datatypes - v4.0
2018-11-19 14:20:15 -06:00
Howard Pritchard
8adaeb1536
Merge pull request #6007 from aravindksg/coll-tuned-fix-40x
coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
2018-11-19 13:15:40 -07:00
Howard Pritchard
3369b0d10f
Merge pull request #6011 from hoopoepg/topic/fixed-oshmem-profile-build-v4.0
OSHMEM/PROFILE: fixed profile build - v4.0
2018-11-19 13:15:09 -07:00
Howard Pritchard
24dae8609e
Merge pull request #5926 from hjelmn/v4.0.x_need_to_unblock_sigchld_in_some_cases
v4.0.x: Ensure SIGCHLD is unblocked
2018-11-19 13:13:10 -07:00
Howard Pritchard
d6bab4e26d
Merge pull request #6064 from jsquyres/pr/v4.0.x/rmaps-help-message-update
v4.0.x: orte-rmaps-base: update out-of-slots show_help message
2018-11-19 13:12:41 -07:00
Howard Pritchard
ec79631ba2
Merge pull request #5936 from edgargabriel/pr/testmpio-v4.0.x
Pr/testmpio v4.0.x
2018-11-19 13:11:50 -07:00
Howard Pritchard
116a140be8 roll to v4.0.1a1
long live 4.0.1!

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-11-13 07:18:57 -07:00
Howard Pritchard
725f62554e
Merge pull request #6067 from jsquyres/pr/v4.0.x/fix-readme-lsf-references
v4.0.x: README: Make LSF text more accurate
2018-11-10 14:47:40 -07:00
Jeff Squyres
c6d8caf302 README: Make LSF text more accurate
Also remove a now-outdated LSF reference.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 419852ab433e578a790e8882711a20f2f570f0a2)
2018-11-08 18:32:34 -05:00
Jeff Squyres
8be14b9b07 orte-rmaps-base: slightly amend help message
Follow on to 430c659908: clarify the help message and fix one typo.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit e9bf318dcb2f337267211f37e6d59c9f8bf5d8be)
2018-11-08 18:20:28 -05:00
Howard Pritchard
e47c3eaa8d
Merge pull request #6003 from yosefe/topic/scoll-basic-fix-pSync-v4.0.x
SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
2018-11-08 15:45:39 -07:00
Jeff Squyres
76d4c1843e orte-rmaps-base: update out-of-slots show_help message
Update the show_help message for when there are not enough slots to
run an application.

Also, remove a bunch of copies of this message in various show_help
text files that aren't used/referred to anywhere in the code.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 430c659908f9c1ba1ff652379a694314718ff3d8)
2018-11-08 16:03:28 -05:00
Howard Pritchard
9275b692b7
Merge pull request #6043 from hjelmn/v4.0.x_fix_this_damn_memory_barrier_bug_that_is_referenced_in_github_bug_6014_now_lets_get_this_release_out_the_door
v4.0.x: opal/asm: work around possible gcc compiler bug
2018-11-08 08:01:33 -07:00
Geoff Paulsen
d99518c0d4
Merge pull request #6046 from hjelmn/v4.0.x_fix_a_memory_barrier_bug_that_is_totally_related_to_6014_but_in_the_pmix_code
pmix3x: fix potential memory barrier bug with __atomic builtin atomics
2018-11-07 10:46:35 -06:00
Geoff Paulsen
f17dcd5961
Merge pull request #6027 from jsquyres/pr/v4.0.0-text-updates
v4.0.0: text updates
2018-11-07 10:45:34 -06:00
Jeff Squyres
9a7320fdab README: Clarify that only IB->openib is deprecated
Per feedback from https://github.com/open-mpi/ompi/pull/6028, remove
"+ob1" from the sentence to emphasize that it's only IB usage through
openib that is deprecated/superseded (i.e., ob1 is definitely not
deprecated).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 6cb415982615c504b97748d8c4007a352edd7246)
2018-11-06 10:07:32 -08:00
Jeff Squyres
7cb6cbc80f README: More updates for v4.0.0
Move the UCX and MXM text up to flow better with the rest of the
text+content.  Also emphasize that MXM is deprecated.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 4ec8e6fe2250f67b8a2213f0699a9ee18c3d1a91)
2018-11-06 10:07:15 -08:00
Jeff Squyres
740567ff92 README: Add extensive information about deleted MPI-1 syms
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit e2ab41efac01c965a205cf020844d7ff5cc54de7)
2018-11-06 10:07:05 -08:00
Jeff Squyres
f149f64f7e README: Update information about UCX
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 78552e81c1b66a2a3e0e4c27e5c9994c4b6ed52f)
2018-11-06 10:07:05 -08:00
Jeff Squyres
d0efdfd9c8 MPI_Type_get_envelope: remove MPI-1 deleted names
Several names are now no longer returned by MPI_Type_get_envelope.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 65eb118e087b0bdaa9c92a12eba151eb30994590)
2018-11-06 10:07:05 -08:00
Nathan Hjelm
5efc76ef44 pmix3x: fix potential memory barrier bug with __atomic builtin atomics
See open-mpi/ompi#6014 for more information.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-11-06 10:37:14 -07:00
Nathan Hjelm
e57c3fb3c9 opal/asm: work around possible gcc compiler bug
It seems in some cases (gcc older than v6.0.0) the __atomic_thread_fence is a
no-op with __ATOMIC_ACQUIRE. This appears to be the case with x86_64, so go
ahead and use __ATOMIC_SEQ_CST for the x86_64 read memory barrier. This should
not cause any performance issues as it is equivalent to the memory barrier
in the hand-written atomics.

References #6014

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 30119ee339eea086f43e3392352899187a4a73c7)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-11-06 10:28:13 -07:00
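
The workaround described in e57c3fb3c9 can be sketched as follows. This is a minimal illustration, not the actual opal/asm code: every `sketch_` name is hypothetical, and it assumes a GCC- or Clang-compatible compiler that provides the `__atomic` builtins. The read barrier uses `__ATOMIC_SEQ_CST` on x86_64 and `__ATOMIC_ACQUIRE` elsewhere, and a small publish/consume pair shows where such a barrier would sit.

```c
/* Hypothetical sketch; names are illustrative and not taken from Open MPI. */

/* Read memory barrier: use the stronger SEQ_CST fence on x86_64, where the
 * commit message reports that an ACQUIRE fence may compile to a no-op with
 * gcc older than v6.0.0. Other architectures keep the ACQUIRE fence. */
static inline void sketch_read_barrier(void)
{
#if defined(__x86_64__)
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#else
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

static int sketch_payload;   /* data written by the producer */
static int sketch_ready;     /* flag telling the consumer the data is valid */

/* Producer: write the payload, then raise the flag after a release fence. */
static void sketch_publish(int value)
{
    sketch_payload = value;
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&sketch_ready, 1, __ATOMIC_RELAXED);
}

/* Consumer: returns 1 and stores the payload in *out once the flag is set,
 * otherwise returns 0. The read barrier orders the flag load before the
 * payload load. */
static int sketch_consume(int *out)
{
    if (__atomic_load_n(&sketch_ready, __ATOMIC_RELAXED)) {
        sketch_read_barrier();
        *out = sketch_payload;
        return 1;
    }
    return 0;
}
```

On x86_64 a sequentially consistent fence typically compiles to a full hardware fence, which matches the commit's remark that the cost is equivalent to the barrier in the hand-written atomics.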
Geoff Paulsen
4008c46e84
Merge pull request #6032 from gpaulsen/topic/v4.0.x/README_lsf
README: updating LSF version supported to 9.1.1 or later
2018-11-05 14:16:01 -06:00
Geoffrey Paulsen
f5dbecd5e7 README: updating LSF version supported to 9.1.1 or later
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
(cherry picked from commit 010059589877a4e7985a1c4daa86fd74ea840ab0)
2018-11-05 02:33:42 -06:00
Geoff Paulsen
e720a9d31d
Merge pull request #6018 from gpaulsen/topic/v4.0.x/api_removal_for_v4.0.0
mpi.h: restore some MPI-deprecated items to default builds
2018-11-02 16:17:04 -05:00
Howard Pritchard
949dbb4b25
Merge pull request #6017 from hppritcha/topic/swat_issue5810_4.0.x
btl/openib: fix a problem with ib query
2018-11-02 13:11:58 -06:00
Geoffrey Paulsen
2d3b4bb91a mpi.h: restore some MPI-deprecated items to default builds
Commit 89da9651b inadvertently #if'ed out both deprecated *and*
removed items from mpi.h.  The intent was only to #if out items that
have been *removed* from the MPI specification and leave all items
that are merely deprecated.

This commit also re-orders the deleted typedef+functions to be in the
same order as they are listed in MPI-3.1 chapter 17, just to make
verifying/checking the code easier.

Note that --enable-mpi1-compatibility can still be used to restore
prototypes for the items that have been removed from the MPI
specification (e.g., MPI_Address()).

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b03a39d359b019d2d7803d194fd03b2fcdffddce)
2018-11-02 14:07:26 -05:00
Howard Pritchard
bbfde1533b btl/openib: fix a problem with ib query
Under certain circumstances, ibv_exp_query_device was
returning an error due to uninitialized fields in the
extended attributes struct.

Fixes: #5810
Fixes: #5914

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 8126779a354b3e0c720d3e1790f7b936dd5b93b2)
2018-11-02 10:42:17 -06:00
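
The class of bug described in bbfde1533b (a query call failing because the attributes struct held uninitialized fields) can be illustrated with a stand-in. The struct, the mask, and the query function below are all hypothetical; they do not reproduce the verbs API or the actual fix. The sketch only assumes that the query rejects request bits it does not recognize.

```c
#include <string.h>

/* Hypothetical stand-in for an extended attributes struct. */
typedef struct {
    unsigned int comp_mask;  /* which optional fields the caller requests */
    unsigned int max_qp;     /* filled in by the query on success */
} sketch_attr_t;

#define SKETCH_KNOWN_MASK 0x3u  /* request bits the stand-in query understands */

/* Hypothetical query: fails if the caller passes unknown request bits,
 * which is what stack garbage in comp_mask would look like. */
static int sketch_query(sketch_attr_t *attr)
{
    if (attr->comp_mask & ~SKETCH_KNOWN_MASK) {
        return -1;
    }
    attr->max_qp = 1024;
    return 0;
}

/* Zero the whole struct before setting the requested bits, so that no
 * field carries an indeterminate value into the query. */
static int sketch_query_safely(sketch_attr_t *attr, unsigned int wanted)
{
    memset(attr, 0, sizeof(*attr));
    attr->comp_mask = wanted;
    return sketch_query(attr);
}
```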
Sergey Oblomov
b416c8afe2 OSHMEM/AMO: code beautify
- added <cr> to split API groups to simplify human processing

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6e7810208966d73e0a56f74b536aa5c56b9a8d1c)
2018-11-01 16:52:01 +02:00
Sergey Oblomov
d3f08d010c OSHMEM/AMO: added missing C11 macro datatypes
- added signed datatypes for atomic_add calls
- added unsigned datatypes for atomic put/inc/get/fetch calls
- fixed incorrect SHMEM_CTX_DEFAULT macro, added
  external declaration of oshmem_ctx_default variable

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f63d6da6d733e263385188b5470df195f092d041)
2018-11-01 16:51:56 +02:00
Sergey Oblomov
aaf15a6c17 OSHMEM/PROFILE: fixed profile build
- added missing file to profile makefile
- constants SHMEM_CTX_* are shifted into public header

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 4a3e83780c0303e7e4d0ff92d7ba85d3a2239737)
2018-11-01 16:37:49 +02:00
Howard Pritchard
e851879081
Merge pull request #5994 from rhc54/cmr40/cleanup
Remove stale defunct tools
2018-10-31 13:29:57 -06:00
Aravind Gopalakrishnan
5a74ddb34d coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

The MPI spec also stipulates that sendcount and sendtype be ignored for Alltoall
and Allgatherv operations, so this extends the check to these algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 88d781056f43934a93e16db556b340e72cdd3742)
2018-10-31 11:37:29 -07:00
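
The check described in 5a74ddb34d can be sketched without MPI itself. The sentinel, the datatype struct, and the function below are hypothetical stand-ins rather than the coll/tuned code; they only show the idea of testing for the in-place marker before touching the send datatype, and falling back to the receive side when it is present.

```c
#include <stddef.h>

/* Hypothetical stand-ins for MPI_IN_PLACE and a datatype handle. */
static char sketch_in_place_marker;
#define SKETCH_IN_PLACE ((const void *)&sketch_in_place_marker)

typedef struct {
    size_t size;  /* size in bytes of one element */
} sketch_datatype_t;

/* Compute the per-peer block size that drives algorithm selection.
 * When the send buffer is the in-place marker, the send count and send
 * datatype must be ignored (the datatype pointer may not even be valid),
 * so the receive count and datatype are used instead. */
static size_t sketch_block_size(const void *sbuf,
                                const sketch_datatype_t *sdtype, int scount,
                                const sketch_datatype_t *rdtype, int rcount)
{
    if (SKETCH_IN_PLACE == sbuf) {
        return rdtype->size * (size_t)rcount;
    }
    return sdtype->size * (size_t)scount;
}
```

In the in-place case the send datatype is never dereferenced, which is the step the commit message identifies as the source of the segfault.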
Yossi Itigin
8a329a797c SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.

Found with shmem_verifier test suite.

(picked from master 6754bf1)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-31 12:22:19 +02:00
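
The out-of-bounds access described in 8a329a797c comes down to pointer arithmetic on a one-element work array. The sketch below uses hypothetical names and is not the scoll/basic code or its fix; the only fact taken from the commit message is that the alltoall sync size is 1, so `pSync + 1` points past the end of the array.

```c
#include <stddef.h>

/* The commit message states that _SHMEM_ALLTOALL_SYNC_SIZE is 1; this
 * hypothetical constant mirrors that value for the illustration. */
#define SKETCH_ALLTOALL_SYNC_SIZE 1

/* Return a pointer to slot `offset` of the work array, or NULL when that
 * slot lies outside the SKETCH_ALLTOALL_SYNC_SIZE elements the caller owns. */
static long *sketch_sync_slot(long *pSync, size_t offset)
{
    return (offset < SKETCH_ALLTOALL_SYNC_SIZE) ? pSync + offset : NULL;
}
```

With a sync size of 1, only offset 0 is valid; a barrier handed slot 1 would read and write memory that was never initialized to the sync value.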
Geoff Paulsen
cd1e927e1b
Merge pull request #5995 from gpaulsen/topic/v4.0.x/fix_pgi
COMMON/UCX: added error code to log output
2018-10-30 15:11:37 -05:00
Ralph Castain
ba6ad9fe42 Remove stale defunct tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 05ac8fa71c0833eeeaa878b72a31503d361e145e)
2018-10-30 08:51:25 -07:00
Sergey Oblomov
0846c9d112 COMMON/UCX: added error code to log output
Also fixes a PGI compilation error with --enable-debug.

Signed-off-by: Geoff Paulsen <gpaulsen@users.noreply.github.com>
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 1099d5f02327329e0c58d9403e3e0a7f1e1d1920)
2018-10-30 09:55:25 -05:00
Ralph Castain
712ddd326f Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1bd772e8ebf66f705537b9a6e1af2b6093ef8471)
2018-10-30 07:54:35 -07:00
Geoff Paulsen
660743cd3a
Merge pull request #5988 from hppritcha/topic/libevent_configure_4.0.x
Topic/libevent configure 4.0.x
2018-10-30 07:49:46 -05:00