1
1
Граф коммитов

29310 Коммитов

Автор SHA1 Сообщение Дата
Howard Pritchard
949dbb4b25
Merge pull request #6017 from hppritcha/topic/swat_issue5810_4.0.x
btl/openib: fix a problem with ib query
2018-11-02 13:11:58 -06:00
Geoffrey Paulsen
2d3b4bb91a mpi.h: restore some MPI-deprecated items to default builds
Commit 89da9651b inadvertantly #if'ed out both deprecated *and*
removed items from mpi.h.  The intent was only to #if out items that
have been *removed* from the MPI specification and leave all items
that are merely deprecated.

This commit also re-orders the deleted typedef+functions to be in the
same order as they are listed in MPI-3.1 chapter 17, just to make
verifying/checking the code easier.

Note that --enable-mpi1-compatibility can still be used to restore
prototypes for the items that have been removed from the MPI
specification (e.g., MPI_Address()).

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b03a39d359)
2018-11-02 14:07:26 -05:00
Howard Pritchard
bbfde1533b btl/openib: fix a problem with ib query
Under certain circumstances, ibv_exp_query_device was
returning an error due to uninitialized fields in the
extended attributes struct.

Fixes: #5810
Fixes: #5914

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 8126779a35)
2018-11-02 10:42:17 -06:00
Sergey Oblomov
b416c8afe2 OSHMEM/AMO: code beautify
- added <cr> to split API groups to simplify human processing

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6e78102089)
2018-11-01 16:52:01 +02:00
Sergey Oblomov
d3f08d010c OSHMEM/AMO: added missing C11 macro datatypes
- added signed datatypes for atomic_add calls
- added unsigned datatypes for atomic put/inc/get/fetch calls
- fixed incorrect SHMEM_CTX_DEFAULT macro, added
  external declaration of oshmem_ctx_default variable

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f63d6da6d7)
2018-11-01 16:51:56 +02:00
Sergey Oblomov
aaf15a6c17 OSHMEM/PROFILE: fixed profile build
- added missing file to profile makefile
- constants SHMEM_CTX_* are shifted into public header

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 4a3e83780c)
2018-11-01 16:37:49 +02:00
Howard Pritchard
e851879081
Merge pull request #5994 from rhc54/cmr40/cleanup
Remove stale defunct tools
2018-10-31 13:29:57 -06:00
Aravind Gopalakrishnan
5a74ddb34d coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and
Allgatherv operations. So, extending the check to these algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 88d781056f)
2018-10-31 11:37:29 -07:00
Yossi Itigin
8a329a797c SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.

Found with shmem_verifier test suite.

(picked from master 6754bf1)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-31 12:22:19 +02:00
Geoff Paulsen
cd1e927e1b
Merge pull request #5995 from gpaulsen/topic/v4.0.x/fix_pgi
COMMON/UCX: added error code to log output
2018-10-30 15:11:37 -05:00
Ralph Castain
ba6ad9fe42 Remove stale defunct tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 05ac8fa71c)
2018-10-30 08:51:25 -07:00
Sergey Oblomov
0846c9d112 COMMON/UCX: added error code to log output
Also fixes a PGI compilation error with --enable-debug.

Signed-off-by: Geoff Paulsen <gpaulsen@users.noreply.github.com>
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 1099d5f023)
2018-10-30 09:55:25 -05:00
Ralph Castain
712ddd326f Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1bd772e8eb)
2018-10-30 07:54:35 -07:00
Geoff Paulsen
660743cd3a
Merge pull request #5988 from hppritcha/topic/libevent_configure_4.0.x
Topic/libevent configure 4.0.x
2018-10-30 07:49:46 -05:00
Geoff Paulsen
21d2597afd
Merge pull request #5974 from rhc54/cmr4/mpir
v4.0: Provide deprecation warning of MPIR debugger
2018-10-29 16:29:15 -05:00
Gilles Gouaillardet
7f36dce348 event/external: fix version requirement
Only default to the external component if its version is
greater or equal than the internal libevent (2.0.22)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit b205039205)
2018-10-29 15:20:48 -06:00
Gilles Gouaillardet
40a1f0c214 event/external: misc configury fixes
- Always use the external component when configure'd with --with-libevent=external
 - Fix the external libevent library version detection
   by testing _EVENT_NUMERIC_VERSION and EVENT__NUMERIC_VERSION macros
 - Use the event2/event.h header (event.h is deprecated since libevent 2.0

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 35e77a286c)
2018-10-29 15:20:32 -06:00
Geoff Paulsen
d0968f516a
Merge pull request #5963 from amaslenn/mlnx-no-uct-v4
platform/mellanox: disable btl-uct by default — v4
2018-10-26 13:48:15 -05:00
Howard Pritchard
32e8592c5c
Merge pull request #5966 from hjelmn/v4.x_btl_uct_adjust_for_changes_in_the_openucx_uct_interface_that_break_things
btl/uct: update for UCT_CB_FLAG_SYNC removal
2018-10-26 12:39:35 -06:00
Ralph Castain
8aae7943ab Provide deprecation warning of MPIR debugger
If we detect that we are being debugged by an MPIR-based debugger, then
print a warning that OMPI's MPIR support has been deprecated and will be
removed in a subsequent release.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 2cb271716b)
2018-10-25 08:49:00 -07:00
Nathan Hjelm
270ae73611 btl/uct: update for UCT_CB_FLAG_SYNC removal
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
(cherry picked from commit 1b37328ba8)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-23 10:18:43 -06:00
Andrey Maslennikov
79abcc2e93 platform/mellanox: disable btl-uct by default
(cherry picked from commit 074e9cc92c)
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2018-10-23 10:33:35 +03:00
Howard Pritchard
e8f28a5506
Merge pull request #5882 from hjelmn/btl_uct_updates_for_the_v4_branch
btl/uct: bug fixes and general improvements
2018-10-22 13:12:34 -06:00
Howard Pritchard
f9d2f3b912
Merge pull request #5941 from hppritcha/topic/remove_bfo_pml_v4.0.x
v4.0.x: remove the bfo pml
2018-10-22 09:50:05 -06:00
Howard Pritchard
837f7eb1dd
Merge pull request #5942 from hppritcha/topic/new_for_4.0.0rc5
NEWS: updates for v4.0.0rc5
2018-10-22 09:49:35 -06:00
Howard Pritchard
6c18cb179d
Merge pull request #5945 from hppritcha/topic/remove_crs_for_v4.0.0
v4.0.x: remove some dead crs components
2018-10-18 06:55:44 -06:00
Howard Pritchard
210b4c60aa remove some dead crs components
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 6564d3d217)
2018-10-17 16:16:36 -06:00
Howard Pritchard
48e12cf766 NEWS: updates for v4.0.0rc5
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-10-17 14:36:27 -06:00
Howard Pritchard
a806d09450 remove the bfo pml
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 7d6774acf8)
2018-10-17 14:00:11 -06:00
Geoff Paulsen
b8e040c704
Merge pull request #5904 from gpaulsen/topic/v4.0.0rc5
Reving to v4.0.0rc5
2018-10-17 12:59:13 -07:00
Edgar Gabriel
278ecf2205 io/ompio: add verification for data representations.
check for providing a data representation that is actually supported
by ompio.

Add also one check for a non-NULL pointer in mpi/c/file_set_view
for the data representation.

Also fixes parts of issue #5643

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:48 -05:00
Edgar Gabriel
a07c9e96b1 io/ompio: execute barrier before sync
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:35 -05:00
Edgar Gabriel
96c1a5b9dc common/ompio: check datatypes when setting file view
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:19 -05:00
Edgar Gabriel
425a71799e common/ompio: return correct error code for improper access
return MPI_ERR_ACCESS if the user tries to read from  a file
that was opened using MPI_MODE_WRONLY

return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:04 -05:00
Edgar Gabriel
c65dda6f5f io/ompio: fix seek position calculation for SEEK_CUR
This commit fixes the calculation of the position where to
seek to, in case SEEK_CUR is used.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:21:47 -05:00
Howard Pritchard
e74443a8b8
Merge pull request #5929 from hjelmn/v4.0.x_opal_free_list_really_old_race_we_should_really_fix_now
opal/free_list: fix race condition
2018-10-17 10:08:11 -06:00
Nathan Hjelm
e6f84e79de btl/uct: fix deadlock in connection code
This commit fixes a deadlock that can occur when using a TL that
supports the connect to endpoint model. The deadlock was occurring
while processing an incoming connection requests. This was done from
an active-message callback. For some unknown reason (at this time)
this callback was sometimes hanging. To avoid the issue the connection
active-message is saved for later processing.

At the same time I cleaned up the connection code to eliminate
duplicate messages when possible.

This commit also fixes some bugs in the active-message send path:

 - Correctly set all fragment fields in prepare_src.

 - Fix bug when using buffered-send. We were not reading the return
   code correctly (which is in bytes). This resulted in a message
   getting sent multiple times.

 - Don't try to progress sends from the btl_send function when in an
   active-message callback. It could lead to deep recursion and an
   eventual crash if we get a trace like
   send->progress->am_complete->ob1_callback->send->am_complete...

Closes #5820
Closes #5821

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 707d35deeb)
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2018-10-16 19:16:11 -06:00
Howard Pritchard
cd7d70156c
Merge pull request #5899 from jsquyres/pr/v4.0.x/fix-c99-comments-in-mpih
v4.0.x: mpi.h.in: remove C99-style comments
2018-10-16 16:33:08 -06:00
Howard Pritchard
d2fb9949a5
Merge pull request #5785 from jsquyres/pr/v4.0.x/more-compiler-warnings-fixes
v4.0.x: Squash a bunch of harmless compiler warnings.
2018-10-16 16:30:21 -06:00
Howard Pritchard
2752d43f65
Merge pull request #5875 from kawashima-fj/pr/v4.0.x/javadoc-tag
v4.0.x: java: Fix javadoc build failure with OpenJDK 11
2018-10-16 16:29:34 -06:00
Howard Pritchard
7ceb508b93
Merge pull request #5889 from yosefe/topic/pml-ucx-fix-datatype-leak-v4.0.x
pml_ucx: add ompi datatype attribute to release ucp_datatype - v4.0.x
2018-10-16 16:29:01 -06:00
Nathan Hjelm
eaa98af52c opal/free_list: fix race condition
There was a race condition in opal_free_list_get. Code throughout the
Open MPI codebase was assuming that a NULL return from this function
was due to an out-of-memory condition. In some cases this can lead to
a fatal condition (MPI_Irecv and MPI_Isend in pml/ob1 for
example). Before this commit opal_free_list_get_mt looked like this:

```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
    opal_free_list_item_t *item =
        (opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);

    if (OPAL_UNLIKELY(NULL == item)) {
        opal_mutex_lock (&flist->fl_lock);
        opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
        opal_mutex_unlock (&flist->fl_lock);
        item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
    }

    return item;
}
```

The problem is in a multithreaded environment is *is* possible for the
free list to be grown successfully but the thread calling
opal_free_list_get_mt to be left without an item. The happens if
between the calls to opal_lifo_push_atomic in opal_free_list_grow_st
and the call to opal_lifo_pop_atomic other threads pop all the items
added to the free list.

This commit fixes the issue by ensuring the thread that successfully
grew the free list **always** gets a free list item.

Fixes #2921

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 5c770a7bec)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-16 15:28:20 -06:00
Ralph Castain
05e0545581 Ensure SIGCHLD is unblocked
Thanks to @hjelmn for debugging it and providing the patch

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit efa8bcc17078c89f1c9d6aabed35c90973a469bf)
(cherry picked from commit 647a760b7e)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-16 15:21:18 -06:00
Howard Pritchard
e2cf1e3ec5
Merge pull request #5887 from yosefe/topic/osc-ucx-fix-finalize-hang-v4.0.x
osc_ucx: fix hang/timeout in component finalize - v4.0
2018-10-16 09:21:50 -06:00
Howard Pritchard
753087ab17
Merge pull request #5888 from hoopoepg/topic/fixed-zero-size-window-v4.0
OSC/UCX: fixed zero-size window processing - v4.0.x
2018-10-16 09:21:08 -06:00
Howard Pritchard
e20284ac4d
Merge pull request #5908 from ggouaillardet/topic/v4.0.x/mpi_sizeof_misc_additions
fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
2018-10-16 09:16:00 -06:00
Gilles Gouaillardet
2b5a7ca816 fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@e4001040b4)
2018-10-12 14:10:45 +09:00
Geoffrey Paulsen
d936752c17 Reving to v4.0.0rc5
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2018-10-11 16:35:08 -05:00
Nathan Hjelm
0c4ba45af2 btl/uct: use the correct tl interface attributes
It is apparently possible for different instances of the same UCT
transport to have different limits (max short put for example). To
account for this we need to store the attributes per TL context not
per TL. This commit fixes the issue.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 6ed68da870)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-11 11:34:33 -06:00
Jeff Squyres
600967d2ed mpi.h.in: remove C99-style comments
While we require C99 to build Open MPI, we do not require C99 to build
user MPI applications.  As such, we shouldn't have C99-style comments
(i.e., "//"-style) in mpi.h.in.

Thanks to @AdamSimpson for reporting the issue.

This commit simply converts a //-style comment to a /**/-style
comment.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4b3ccabf7)
2018-10-11 11:54:30 -04:00