1
1

29315 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
5efc76ef44 pmix3x: fix potential memory barrier bug with __atomic builtin atomics
See open-mpi/ompi#6014 for more information.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-11-06 10:37:14 -07:00
Nathan Hjelm
e57c3fb3c9 opal/asm: work around possible gcc compiler bug
It seems in some cases (gcc older than v6.0.0) the __atomic_thread_fence is a
no-op with __ATOMIC_ACQUIRE. This appears to be the case with X86_64 so go
ahead and use __ATOMIC_SEQ_CST for the x86_64 read memory barrier. This should
not cause any performance issues as it is equivalent to the memory barrier
in the hand-written atomics.

References #6014

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 30119ee339eea086f43e3392352899187a4a73c7)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-11-06 10:28:13 -07:00
Geoff Paulsen
4008c46e84
Merge pull request #6032 from gpaulsen/topic/v4.0.x/README_lsf
README: updating LSF version supported to 9.1.1 or later
2018-11-05 14:16:01 -06:00
Geoffrey Paulsen
f5dbecd5e7 README: updating LSF version supported to 9.1.1 or later
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
(cherry picked from commit 010059589877a4e7985a1c4daa86fd74ea840ab0)
2018-11-05 02:33:42 -06:00
Geoff Paulsen
e720a9d31d
Merge pull request #6018 from gpaulsen/topic/v4.0.x/api_removal_for_v4.0.0
mpi.h: restore some MPI-deprecated items to default builds
2018-11-02 16:17:04 -05:00
Howard Pritchard
949dbb4b25
Merge pull request #6017 from hppritcha/topic/swat_issue5810_4.0.x
btl/openib: fix a problem with ib query
2018-11-02 13:11:58 -06:00
Geoffrey Paulsen
2d3b4bb91a mpi.h: restore some MPI-deprecated items to default builds
Commit 89da9651b inadvertantly #if'ed out both deprecated *and*
removed items from mpi.h.  The intent was only to #if out items that
have been *removed* from the MPI specification and leave all items
that are merely deprecated.

This commit also re-orders the deleted typedef+functions to be in the
same order as they are listed in MPI-3.1 chapter 17, just to make
verifying/checking the code easier.

Note that --enable-mpi1-compatibility can still be used to restore
prototypes for the items that have been removed from the MPI
specification (e.g., MPI_Address()).

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b03a39d359b019d2d7803d194fd03b2fcdffddce)
2018-11-02 14:07:26 -05:00
Howard Pritchard
bbfde1533b btl/openib: fix a problem with ib query
Under certain circumstances, ibv_exp_query_device was
returning an error due to uninitialized fields in the
extended attributes struct.

Fixes: #5810
Fixes: #5914

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 8126779a354b3e0c720d3e1790f7b936dd5b93b2)
2018-11-02 10:42:17 -06:00
Sergey Oblomov
b416c8afe2 OSHMEM/AMO: code beautify
- added <cr> to split API groups to simplify human processing

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6e7810208966d73e0a56f74b536aa5c56b9a8d1c)
2018-11-01 16:52:01 +02:00
Sergey Oblomov
d3f08d010c OSHMEM/AMO: added missing C11 macro datatypes
- added signed datatypes for atomic_add calls
- added unsigned datatypes for atomic put/inc/get/fetch calls
- fixed incorrect SHMEM_CTX_DEFAULT macro, added
  external declaration of oshmem_ctx_default variable

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f63d6da6d733e263385188b5470df195f092d041)
2018-11-01 16:51:56 +02:00
Sergey Oblomov
aaf15a6c17 OSHMEM/PROFILE: fixed profile build
- added missing file to profile makefile
- constants SHMEM_CTX_* are shifted into public header

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 4a3e83780c0303e7e4d0ff92d7ba85d3a2239737)
2018-11-01 16:37:49 +02:00
Howard Pritchard
e851879081
Merge pull request #5994 from rhc54/cmr40/cleanup
Remove stale defunct tools
2018-10-31 13:29:57 -06:00
Aravind Gopalakrishnan
5a74ddb34d coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and
Allgatherv operations. So, extending the check to these algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 88d781056f43934a93e16db556b340e72cdd3742)
2018-10-31 11:37:29 -07:00
Yossi Itigin
8a329a797c SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.

Found with shmem_verifier test suite.

(picked from master 6754bf1)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-31 12:22:19 +02:00
Geoff Paulsen
cd1e927e1b
Merge pull request #5995 from gpaulsen/topic/v4.0.x/fix_pgi
COMMON/UCX: added error code to log output
2018-10-30 15:11:37 -05:00
Ralph Castain
ba6ad9fe42 Remove stale defunct tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 05ac8fa71c0833eeeaa878b72a31503d361e145e)
2018-10-30 08:51:25 -07:00
Sergey Oblomov
0846c9d112 COMMON/UCX: added error code to log output
Also fixes a PGI compilation error with --enable-debug.

Signed-off-by: Geoff Paulsen <gpaulsen@users.noreply.github.com>
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 1099d5f02327329e0c58d9403e3e0a7f1e1d1920)
2018-10-30 09:55:25 -05:00
Ralph Castain
712ddd326f Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1bd772e8ebf66f705537b9a6e1af2b6093ef8471)
2018-10-30 07:54:35 -07:00
Geoff Paulsen
660743cd3a
Merge pull request #5988 from hppritcha/topic/libevent_configure_4.0.x
Topic/libevent configure 4.0.x
2018-10-30 07:49:46 -05:00
Geoff Paulsen
21d2597afd
Merge pull request #5974 from rhc54/cmr4/mpir
v4.0: Provide deprecation warning of MPIR debugger
2018-10-29 16:29:15 -05:00
Gilles Gouaillardet
7f36dce348 event/external: fix version requirement
Only default to the external component if its version is
greater or equal than the internal libevent (2.0.22)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit b2050392051ed3d9d842f105326f7ea2223aafc4)
2018-10-29 15:20:48 -06:00
Gilles Gouaillardet
40a1f0c214 event/external: misc configury fixes
- Always use the external component when configure'd with --with-libevent=external
 - Fix the external libevent library version detection
   by testing _EVENT_NUMERIC_VERSION and EVENT__NUMERIC_VERSION macros
 - Use the event2/event.h header (event.h is deprecated since libevent 2.0

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 35e77a286c369c1be66ee7ef9ad5ec2faef47edb)
2018-10-29 15:20:32 -06:00
Geoff Paulsen
d0968f516a
Merge pull request #5963 from amaslenn/mlnx-no-uct-v4
platform/mellanox: disable btl-uct by default — v4
2018-10-26 13:48:15 -05:00
Howard Pritchard
32e8592c5c
Merge pull request #5966 from hjelmn/v4.x_btl_uct_adjust_for_changes_in_the_openucx_uct_interface_that_break_things
btl/uct: update for UCT_CB_FLAG_SYNC removal
2018-10-26 12:39:35 -06:00
Ralph Castain
8aae7943ab Provide deprecation warning of MPIR debugger
If we detect that we are being debugged by an MPIR-based debugger, then
print a warning that OMPI's MPIR support has been deprecated and will be
removed in a subsequent release.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 2cb271716beae08b5e5e20f30a5a2fe3e5c50c5e)
2018-10-25 08:49:00 -07:00
Nathan Hjelm
270ae73611 btl/uct: update for UCT_CB_FLAG_SYNC removal
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
(cherry picked from commit 1b37328ba8e8cbd30a5dcdae8332dc879aea48d7)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-23 10:18:43 -06:00
Andrey Maslennikov
79abcc2e93 platform/mellanox: disable btl-uct by default
(cherry picked from commit 074e9cc92c79e6b9f8c492b2613eac3f27e1c522)
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2018-10-23 10:33:35 +03:00
Howard Pritchard
e8f28a5506
Merge pull request #5882 from hjelmn/btl_uct_updates_for_the_v4_branch
btl/uct: bug fixes and general improvements
2018-10-22 13:12:34 -06:00
Howard Pritchard
f9d2f3b912
Merge pull request #5941 from hppritcha/topic/remove_bfo_pml_v4.0.x
v4.0.x: remove the bfo pml
2018-10-22 09:50:05 -06:00
Howard Pritchard
837f7eb1dd
Merge pull request #5942 from hppritcha/topic/new_for_4.0.0rc5
NEWS: updates for v4.0.0rc5
2018-10-22 09:49:35 -06:00
Howard Pritchard
6c18cb179d
Merge pull request #5945 from hppritcha/topic/remove_crs_for_v4.0.0
v4.0.x: remove some dead crs components
2018-10-18 06:55:44 -06:00
Howard Pritchard
210b4c60aa remove some dead crs components
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 6564d3d217c3ebff24d0e1fd72929756dc498dfe)
2018-10-17 16:16:36 -06:00
Howard Pritchard
48e12cf766 NEWS: updates for v4.0.0rc5
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-10-17 14:36:27 -06:00
Howard Pritchard
a806d09450 remove the bfo pml
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 7d6774acf89558c05f415c96c00502429e26e502)
2018-10-17 14:00:11 -06:00
Geoff Paulsen
b8e040c704
Merge pull request #5904 from gpaulsen/topic/v4.0.0rc5
Reving to v4.0.0rc5
2018-10-17 12:59:13 -07:00
Edgar Gabriel
278ecf2205 io/ompio: add verification for data representations.
check for providing a data representation that is actually supported
by ompio.

Add also one check for a non-NULL pointer in mpi/c/file_set_view
for the data representation.

Also fixes parts of issue #5643

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:48 -05:00
Edgar Gabriel
a07c9e96b1 io/ompio: execute barrier before sync
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:35 -05:00
Edgar Gabriel
96c1a5b9dc common/ompio: check datatypes when setting file view
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:19 -05:00
Edgar Gabriel
425a71799e common/ompio: return correct error code for improper access
return MPI_ERR_ACCESS if the user tries to read from  a file
that was opened using MPI_MODE_WRONLY

return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:04 -05:00
Edgar Gabriel
c65dda6f5f io/ompio: fix seek position calculation for SEEK_CUR
This commit fixes the calculation of the position where to
seek to, in case SEEK_CUR is used.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:21:47 -05:00
Howard Pritchard
e74443a8b8
Merge pull request #5929 from hjelmn/v4.0.x_opal_free_list_really_old_race_we_should_really_fix_now
opal/free_list: fix race condition
2018-10-17 10:08:11 -06:00
Nathan Hjelm
e6f84e79de btl/uct: fix deadlock in connection code
This commit fixes a deadlock that can occur when using a TL that
supports the connect to endpoint model. The deadlock was occurring
while processing an incoming connection requests. This was done from
an active-message callback. For some unknown reason (at this time)
this callback was sometimes hanging. To avoid the issue the connection
active-message is saved for later processing.

At the same time I cleaned up the connection code to eliminate
duplicate messages when possible.

This commit also fixes some bugs in the active-message send path:

 - Correctly set all fragment fields in prepare_src.

 - Fix bug when using buffered-send. We were not reading the return
   code correctly (which is in bytes). This resulted in a message
   getting sent multiple times.

 - Don't try to progress sends from the btl_send function when in an
   active-message callback. It could lead to deep recursion and an
   eventual crash if we get a trace like
   send->progress->am_complete->ob1_callback->send->am_complete...

Closes #5820
Closes #5821

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 707d35deeb62a93ea8a3806d07e07e3a96c51d19)
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2018-10-16 19:16:11 -06:00
Howard Pritchard
cd7d70156c
Merge pull request #5899 from jsquyres/pr/v4.0.x/fix-c99-comments-in-mpih
v4.0.x: mpi.h.in: remove C99-style comments
2018-10-16 16:33:08 -06:00
Howard Pritchard
d2fb9949a5
Merge pull request #5785 from jsquyres/pr/v4.0.x/more-compiler-warnings-fixes
v4.0.x: Squash a bunch of harmless compiler warnings.
2018-10-16 16:30:21 -06:00
Howard Pritchard
2752d43f65
Merge pull request #5875 from kawashima-fj/pr/v4.0.x/javadoc-tag
v4.0.x: java: Fix javadoc build failure with OpenJDK 11
2018-10-16 16:29:34 -06:00
Howard Pritchard
7ceb508b93
Merge pull request #5889 from yosefe/topic/pml-ucx-fix-datatype-leak-v4.0.x
pml_ucx: add ompi datatype attribute to release ucp_datatype - v4.0.x
2018-10-16 16:29:01 -06:00
Nathan Hjelm
eaa98af52c opal/free_list: fix race condition
There was a race condition in opal_free_list_get. Code throughout the
Open MPI codebase was assuming that a NULL return from this function
was due to an out-of-memory condition. In some cases this can lead to
a fatal condition (MPI_Irecv and MPI_Isend in pml/ob1 for
example). Before this commit opal_free_list_get_mt looked like this:

```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
    opal_free_list_item_t *item =
        (opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);

    if (OPAL_UNLIKELY(NULL == item)) {
        opal_mutex_lock (&flist->fl_lock);
        opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
        opal_mutex_unlock (&flist->fl_lock);
        item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
    }

    return item;
}
```

The problem is in a multithreaded environment is *is* possible for the
free list to be grown successfully but the thread calling
opal_free_list_get_mt to be left without an item. The happens if
between the calls to opal_lifo_push_atomic in opal_free_list_grow_st
and the call to opal_lifo_pop_atomic other threads pop all the items
added to the free list.

This commit fixes the issue by ensuring the thread that successfully
grew the free list **always** gets a free list item.

Fixes #2921

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 5c770a7becc496f63b9f9a59151206236416f4f4)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-16 15:28:20 -06:00
Ralph Castain
05e0545581 Ensure SIGCHLD is unblocked
Thanks to @hjelmn for debugging it and providing the patch

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit efa8bcc17078c89f1c9d6aabed35c90973a469bf)
(cherry picked from commit 647a760b7e24194b37571a8245d8d39ed202e75b)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-16 15:21:18 -06:00
Howard Pritchard
e2cf1e3ec5
Merge pull request #5887 from yosefe/topic/osc-ucx-fix-finalize-hang-v4.0.x
osc_ucx: fix hang/timeout in component finalize - v4.0
2018-10-16 09:21:50 -06:00
Howard Pritchard
753087ab17
Merge pull request #5888 from hoopoepg/topic/fixed-zero-size-window-v4.0
OSC/UCX: fixed zero-size window processing - v4.0.x
2018-10-16 09:21:08 -06:00