29187 Commits

Author SHA1 Message Date
Howard Pritchard
f9d2f3b912
Merge pull request #5941 from hppritcha/topic/remove_bfo_pml_v4.0.x
v4.0.x: remove the bfo pml
2018-10-22 09:50:05 -06:00
Howard Pritchard
837f7eb1dd
Merge pull request #5942 from hppritcha/topic/new_for_4.0.0rc5
NEWS: updates for v4.0.0rc5
2018-10-22 09:49:35 -06:00
Howard Pritchard
6c18cb179d
Merge pull request #5945 from hppritcha/topic/remove_crs_for_v4.0.0
v4.0.x: remove some dead crs components
2018-10-18 06:55:44 -06:00
Howard Pritchard
210b4c60aa remove some dead crs components
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 6564d3d217c3ebff24d0e1fd72929756dc498dfe)
2018-10-17 16:16:36 -06:00
Howard Pritchard
48e12cf766 NEWS: updates for v4.0.0rc5
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-10-17 14:36:27 -06:00
Howard Pritchard
a806d09450 remove the bfo pml
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 7d6774acf89558c05f415c96c00502429e26e502)
2018-10-17 14:00:11 -06:00
Geoff Paulsen
b8e040c704
Merge pull request #5904 from gpaulsen/topic/v4.0.0rc5
Reving to v4.0.0rc5
2018-10-17 12:59:13 -07:00
Edgar Gabriel
278ecf2205 io/ompio: add verification for data representations.
Check that the provided data representation is actually supported
by ompio.

Also add a check in mpi/c/file_set_view that the data representation
pointer is non-NULL.
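The two checks described above can be sketched as a single validation helper. This is an illustration only, not the ompio code: the `SKETCH_*` error values stand in for `MPI_SUCCESS`, `MPI_ERR_ARG` and `MPI_ERR_UNSUPPORTED_DATAREP`, and the list of accepted representations is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for MPI_SUCCESS, MPI_ERR_ARG and MPI_ERR_UNSUPPORTED_DATAREP
   so the sketch compiles without an MPI installation. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_ARG = 1, SKETCH_ERR_UNSUPPORTED_DATAREP = 2 };

/* Assumed list of accepted representations, for illustration only. */
static const char *const supported_datareps[] = { "native" };

int check_datarep(const char *datarep)
{
    /* Reject a NULL pointer before any string comparison. */
    if (NULL == datarep) {
        return SKETCH_ERR_ARG;
    }
    for (size_t i = 0; i < sizeof(supported_datareps) / sizeof(supported_datareps[0]); ++i) {
        if (0 == strcmp(datarep, supported_datareps[i])) {
            return SKETCH_SUCCESS;
        }
    }
    /* A valid string, but not a representation this component handles. */
    return SKETCH_ERR_UNSUPPORTED_DATAREP;
}
```

Keeping the NULL check separate from the membership test lets the caller distinguish a bad argument from a well-formed but unsupported representation.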

Also fixes parts of issue #5643

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:48 -05:00
Edgar Gabriel
a07c9e96b1 io/ompio: execute barrier before sync
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:35 -05:00
Edgar Gabriel
96c1a5b9dc common/ompio: check datatypes when setting file view
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.
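The size rule above amounts to a divisibility test. A minimal sketch, assuming the caller already knows both sizes in bytes; `check_view_sizes` and the `SKETCH_*` codes are hypothetical stand-ins, not the ompio function or the real MPI constants.

```c
#include <stddef.h>

/* Stand-ins for MPI_SUCCESS and MPI_ERR_ARG. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_ARG = 1 };

int check_view_sizes(size_t etype_size, size_t filetype_size)
{
    /* Guard against a zero-sized etype so the modulo below is well defined. */
    if (0 == etype_size) {
        return SKETCH_ERR_ARG;
    }
    /* The filetype must consist of a whole number of etypes. */
    if (0 != filetype_size % etype_size) {
        return SKETCH_ERR_ARG;
    }
    return SKETCH_SUCCESS;
}
```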

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:19 -05:00
Edgar Gabriel
425a71799e common/ompio: return correct error code for improper access
return MPI_ERR_ACCESS if the user tries to read from a file
that was opened using MPI_MODE_WRONLY

return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY
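The mapping above from access mode to error class can be sketched as two small predicates. The mode bits and `SKETCH_*` error values below are hypothetical stand-ins for `MPI_MODE_RDONLY`/`MPI_MODE_WRONLY` and `MPI_ERR_ACCESS`/`MPI_ERR_READ_ONLY`; their numeric values are not the real ones.

```c
/* Stand-ins for MPI_SUCCESS, MPI_ERR_ACCESS and MPI_ERR_READ_ONLY. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_ACCESS = 1, SKETCH_ERR_READ_ONLY = 2 };

/* Hypothetical access-mode bits; real code tests MPI_MODE_RDONLY / MPI_MODE_WRONLY. */
enum { SKETCH_MODE_RDONLY = 0x1, SKETCH_MODE_WRONLY = 0x2, SKETCH_MODE_RDWR = 0x4 };

/* Reading from a write-only file is an access error. */
int check_read_allowed(int amode)
{
    return (amode & SKETCH_MODE_WRONLY) ? SKETCH_ERR_ACCESS : SKETCH_SUCCESS;
}

/* Writing to a read-only file gets the more specific read-only error. */
int check_write_allowed(int amode)
{
    return (amode & SKETCH_MODE_RDONLY) ? SKETCH_ERR_READ_ONLY : SKETCH_SUCCESS;
}
```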

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:04 -05:00
Edgar Gabriel
c65dda6f5f io/ompio: fix seek position calculation for SEEK_CUR
This commit fixes the calculation of the position to seek to
when SEEK_CUR is used.
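For reference, the intended semantics of the three seek modes can be sketched as below. This is a generic illustration, not the ompio fix. It assumes offsets and positions are already expressed in etype units relative to the current file view; the translation to absolute byte offsets that a real MPI-IO implementation performs is omitted. All names are hypothetical.

```c
#include <stdint.h>

enum sketch_whence { SKETCH_SEEK_SET, SKETCH_SEEK_CUR, SKETCH_SEEK_END };

/* Compute the new position; returns 0 and stores it in *out, or -1 if the
   whence value is unknown or the result would be negative. */
int compute_seek(enum sketch_whence whence, int64_t offset,
                 int64_t current, int64_t end, int64_t *out)
{
    int64_t base;
    switch (whence) {
    case SKETCH_SEEK_SET: base = 0;       break;
    case SKETCH_SEEK_CUR: base = current; break; /* relative to the current position */
    case SKETCH_SEEK_END: base = end;     break;
    default: return -1;
    }
    int64_t pos = base + offset;
    if (pos < 0) {
        return -1;
    }
    *out = pos;
    return 0;
}
```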

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:21:47 -05:00
Howard Pritchard
e74443a8b8
Merge pull request #5929 from hjelmn/v4.0.x_opal_free_list_really_old_race_we_should_really_fix_now
opal/free_list: fix race condition
2018-10-17 10:08:11 -06:00
Nathan Hjelm
e6f84e79de btl/uct: fix deadlock in connection code
This commit fixes a deadlock that can occur when using a TL that
supports the connect to endpoint model. The deadlock was occurring
while processing incoming connection requests. This was done from
an active-message callback. For some unknown reason (at this time)
this callback was sometimes hanging. To avoid the issue the connection
active-message is saved for later processing.
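The "save for later" approach described above can be sketched as a small queue: the active-message callback only copies the request, and the normal progress loop processes it outside any callback. This is a single-threaded illustration with hypothetical names and fixed sizes; it is not the btl/uct code, which would also need locking.

```c
#include <stddef.h>
#include <string.h>

#define PENDING_MAX  16
#define CONN_MSG_MAX 64

/* Hypothetical ring buffer of saved connection requests. */
typedef struct {
    char   data[PENDING_MAX][CONN_MSG_MAX];
    size_t len[PENDING_MAX];
    size_t head, count;
} pending_conn_queue_t;

static pending_conn_queue_t pending;
static int processed_connections;

/* Active-message callback: copy the request and return immediately. */
int conn_am_callback(const void *msg, size_t len)
{
    if (pending.count == PENDING_MAX || len > CONN_MSG_MAX) {
        return -1;
    }
    size_t slot = (pending.head + pending.count) % PENDING_MAX;
    memcpy(pending.data[slot], msg, len);
    pending.len[slot] = len;
    pending.count++;
    return 0;
}

/* Placeholder for the real connection handling. */
static void process_connection_request(const char *msg, size_t len)
{
    (void) msg;
    (void) len;
    processed_connections++;
}

/* Called from the regular progress loop, outside any callback. */
int progress_pending_connections(void)
{
    int n = 0;
    while (pending.count > 0) {
        process_connection_request(pending.data[pending.head], pending.len[pending.head]);
        pending.head = (pending.head + 1) % PENDING_MAX;
        pending.count--;
        n++;
    }
    return n;
}
```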

At the same time I cleaned up the connection code to eliminate
duplicate messages when possible.

This commit also fixes some bugs in the active-message send path:

 - Correctly set all fragment fields in prepare_src.

 - Fix bug when using buffered-send. We were not reading the return
   code correctly (which is in bytes). This resulted in a message
   getting sent multiple times.

 - Don't try to progress sends from the btl_send function when in an
   active-message callback. It could lead to deep recursion and an
   eventual crash if we get a trace like
   send->progress->am_complete->ob1_callback->send->am_complete...
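The last item above describes a re-entrancy guard. A minimal single-threaded sketch follows, with hypothetical names. It assumes a simple depth counter marks "inside an active-message callback"; a real multi-threaded BTL would need per-thread or otherwise synchronized state.

```c
static int in_am_callback;  /* nesting depth of active-message callbacks */
static int progress_calls;  /* counts how often send progress was driven */

static void progress_sends(void) { progress_calls++; }

int btl_send_sketch(void)
{
    /* Only drive progress outside callbacks; inside one, a chain like
       send->progress->am_complete->callback->send could recurse deeply. */
    if (0 == in_am_callback) {
        progress_sends();
    }
    return 0;
}

void am_complete_sketch(void (*user_cb)(void))
{
    in_am_callback++;
    user_cb();          /* e.g. an upper-layer callback that sends again */
    in_am_callback--;
}

/* Example callback that issues another send from inside the callback. */
void resend_cb(void) { (void) btl_send_sketch(); }
```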

Closes #5820
Closes #5821

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 707d35deeb62a93ea8a3806d07e07e3a96c51d19)
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2018-10-16 19:16:11 -06:00
Howard Pritchard
cd7d70156c
Merge pull request #5899 from jsquyres/pr/v4.0.x/fix-c99-comments-in-mpih
v4.0.x: mpi.h.in: remove C99-style comments
2018-10-16 16:33:08 -06:00
Howard Pritchard
d2fb9949a5
Merge pull request #5785 from jsquyres/pr/v4.0.x/more-compiler-warnings-fixes
v4.0.x: Squash a bunch of harmless compiler warnings.
2018-10-16 16:30:21 -06:00
Howard Pritchard
2752d43f65
Merge pull request #5875 from kawashima-fj/pr/v4.0.x/javadoc-tag
v4.0.x: java: Fix javadoc build failure with OpenJDK 11
2018-10-16 16:29:34 -06:00
Howard Pritchard
7ceb508b93
Merge pull request #5889 from yosefe/topic/pml-ucx-fix-datatype-leak-v4.0.x
pml_ucx: add ompi datatype attribute to release ucp_datatype - v4.0.x
2018-10-16 16:29:01 -06:00
Nathan Hjelm
eaa98af52c opal/free_list: fix race condition
There was a race condition in opal_free_list_get. Code throughout the
Open MPI codebase was assuming that a NULL return from this function
was due to an out-of-memory condition. In some cases this can lead to
a fatal condition (MPI_Irecv and MPI_Isend in pml/ob1 for
example). Before this commit opal_free_list_get_mt looked like this:

```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
    opal_free_list_item_t *item =
        (opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);

    if (OPAL_UNLIKELY(NULL == item)) {
        opal_mutex_lock (&flist->fl_lock);
        opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
        opal_mutex_unlock (&flist->fl_lock);
        item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
    }

    return item;
}
```

The problem is that in a multithreaded environment it *is* possible for the
free list to be grown successfully but for the thread calling
opal_free_list_get_mt to be left without an item. This happens if,
between the calls to opal_lifo_push_atomic in opal_free_list_grow_st
and the call to opal_lifo_pop_atomic, other threads pop all the items
added to the free list.

This commit fixes the issue by ensuring the thread that successfully
grew the free list **always** gets a free list item.
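The idea of the fix can be sketched with a much simpler free list: the thread that grows the list keeps one of the new items for itself instead of publishing it and popping it back. This is not the opal_free_list code. A mutex-protected stack stands in for the atomic LIFO, all names are hypothetical, and the bookkeeping needed to free allocated chunks is omitted.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct item { struct item *next; } item_t;

typedef struct {
    pthread_mutex_t lock;   /* protects head; stands in for the atomic LIFO */
    item_t         *head;
    size_t          num_per_alloc;
} free_list_t;

static item_t *pop(free_list_t *fl)
{
    pthread_mutex_lock(&fl->lock);
    item_t *it = fl->head;
    if (it) {
        fl->head = it->next;
    }
    pthread_mutex_unlock(&fl->lock);
    return it;
}

static void push(free_list_t *fl, item_t *it)
{
    pthread_mutex_lock(&fl->lock);
    it->next = fl->head;
    fl->head = it;
    pthread_mutex_unlock(&fl->lock);
}

/* Allocate n items, publish n - 1 of them and return the first directly,
   so no other thread can take it between the push and a later pop. */
static item_t *grow_and_take_one(free_list_t *fl, size_t n)
{
    if (0 == n) {
        return NULL;
    }
    item_t *items = calloc(n, sizeof(*items));
    if (NULL == items) {
        return NULL;   /* now NULL really means out of memory */
    }
    for (size_t i = 1; i < n; ++i) {
        push(fl, &items[i]);
    }
    return &items[0];
}

item_t *free_list_get(free_list_t *fl)
{
    item_t *it = pop(fl);
    if (NULL == it) {
        it = grow_and_take_one(fl, fl->num_per_alloc);
    }
    return it;
}
```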

Fixes #2921

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 5c770a7becc496f63b9f9a59151206236416f4f4)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-16 15:28:20 -06:00
Ralph Castain
05e0545581 Ensure SIGCHLD is unblocked
Thanks to @hjelmn for debugging it and providing the patch
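Unblocking a signal uses the standard POSIX signal-mask calls. The sketch below shows the general technique only; it is not the actual patch, and in a multi-threaded process `pthread_sigmask` would be the per-thread equivalent of `sigprocmask`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Remove SIGCHLD from the calling process's blocked-signal mask so that
   child-exit notifications are delivered. Returns 0 on success. */
int ensure_sigchld_unblocked(void)
{
    sigset_t set;
    if (sigemptyset(&set) != 0 || sigaddset(&set, SIGCHLD) != 0) {
        return -1;
    }
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}
```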

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit efa8bcc17078c89f1c9d6aabed35c90973a469bf)
(cherry picked from commit 647a760b7e24194b37571a8245d8d39ed202e75b)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-16 15:21:18 -06:00
Howard Pritchard
e2cf1e3ec5
Merge pull request #5887 from yosefe/topic/osc-ucx-fix-finalize-hang-v4.0.x
osc_ucx: fix hang/timeout in component finalize - v4.0
2018-10-16 09:21:50 -06:00
Howard Pritchard
753087ab17
Merge pull request #5888 from hoopoepg/topic/fixed-zero-size-window-v4.0
OSC/UCX: fixed zero-size window processing - v4.0.x
2018-10-16 09:21:08 -06:00
Howard Pritchard
e20284ac4d
Merge pull request #5908 from ggouaillardet/topic/v4.0.x/mpi_sizeof_misc_additions
fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
2018-10-16 09:16:00 -06:00
Gilles Gouaillardet
2b5a7ca816 fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@e4001040b4)
2018-10-12 14:10:45 +09:00
Geoffrey Paulsen
d936752c17 Reving to v4.0.0rc5
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2018-10-11 16:35:08 -05:00
Nathan Hjelm
0c4ba45af2 btl/uct: use the correct tl interface attributes
It is apparently possible for different instances of the same UCT
transport to have different limits (max short put for example). To
account for this we need to store the attributes per TL context not
per TL. This commit fixes the issue.
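The change described above is essentially a data-layout decision: keep one copy of the limits per context and consult the right one. A minimal sketch with hypothetical types and a single limit; in the real BTL these values would come from the UCT interface attributes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context limits. */
typedef struct {
    size_t max_short_put;
} tl_context_attr_t;

typedef struct {
    size_t            num_contexts;
    tl_context_attr_t ctx_attr[8];  /* one entry per context, not one per TL */
} tl_sketch_t;

/* Decide using the limits of the context that will actually issue the put. */
bool can_use_short_put(const tl_sketch_t *tl, size_t ctx, size_t size)
{
    return ctx < tl->num_contexts && size <= tl->ctx_attr[ctx].max_short_put;
}
```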

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 6ed68da870c391d88575dc027a3de4826a77f57e)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-11 11:34:33 -06:00
Jeff Squyres
600967d2ed mpi.h.in: remove C99-style comments
While we require C99 to build Open MPI, we do not require C99 to build
user MPI applications.  As such, we shouldn't have C99-style comments
(i.e., "//"-style) in mpi.h.in.

Thanks to @AdamSimpson for reporting the issue.

This commit simply converts a //-style comment to a /**/-style
comment.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4b3ccabf726eaec6d39cbcff809882da55ae1e5)
2018-10-11 11:54:30 -04:00
Yossi Itigin
eabc94cab0 osc_ucx: add worker flush before osc module free
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.

(picked from master b8e1af6)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 23:02:19 +03:00
Yossi Itigin
4a97d6b9fa pml_ucx: fix return code from mca_pml_ucx_init()
(picked from master 40ac9e4)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:49 +03:00
Yossi Itigin
1bffd196ef pml_ucx: add ompi datatype attribute to release ucp_datatype
(picked from master 4763822)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:26 +03:00
Geoff Paulsen
c8ff7e3ef2
Merge pull request #5874 from rhc54/cmr40/config
v4.0.0: Fix configury for internal PMIx
2018-10-10 10:47:26 -05:00
Sergey Oblomov
274cbc3c03 OSC/UCX: fixed zero-size window processing
- added processing of zero-size MPI window

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ae6f81983fe354de812ebe2532120fb20ae24d3b)
2018-10-10 16:49:02 +03:00
Nathan Hjelm
1153082a0f btl/uct: bug fixes and general improvements
This commit updates the uct btl to change the transports parameter
into a priority list. The dc_mlx5, rc_mlx5, and ud transports are added
to the priority list. This will give better out-of-the-box performance for
multi-threaded codes because the *_mlx5 transports can avoid the mlx5
lock inside libmlx5_rdmav2.
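Treating a comma-separated parameter as a priority list means walking it in order and taking the first entry that is actually available. A minimal sketch of that selection logic; the function name and the shape of the "available" array are assumptions for illustration, not the btl/uct parameter handling.

```c
#include <stddef.h>
#include <string.h>

/* Return the index into `available` of the first transport named in the
   comma-separated `priority` list that is available, or -1 if none is. */
int select_transport(const char *priority, const char *const *available, size_t n_available)
{
    const char *p = priority;
    while (*p) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t) (comma - p) : strlen(p);
        for (size_t i = 0; i < n_available; ++i) {
            if (strlen(available[i]) == len && 0 == strncmp(p, available[i], len)) {
                return (int) i;
            }
        }
        if (NULL == comma) {
            break;
        }
        p = comma + 1;   /* move on to the next, lower-priority entry */
    }
    return -1;
}
```

With `"dc_mlx5,rc_mlx5,ud"` and only `ud` and `rc_mlx5` available, `rc_mlx5` is chosen because it appears earlier in the list.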

This commit also fixes a number of leaks and a possible deadlock when
using RDMA.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 39be6ec15c202d31423476f09e70199453d25adc)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-09 16:16:50 -06:00
Howard Pritchard
d18ea98263
Merge pull request #5843 from kawashima-fj/pr/v4.0.x/correct-f08-signatures
v4.0.x: fortran/use-mpi-f08: Correct f08 routine signatures
2018-10-09 10:22:07 -05:00
Ralph Castain
376e2e4d98 Add missing file to tarball
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-09 06:46:09 -07:00
Ralph Castain
40270bd24b Minor cleanups to the pmix/ext2x component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-09 06:42:08 -07:00
Geoff Paulsen
c2e99c3f40
Merge pull request #5867 from ggouaillardet/topic/v4.0.x/hostfile_double_free
util/hostfile: fix a double free error
2018-10-09 08:00:03 -05:00
Geoff Paulsen
9be650c7b9
Merge pull request #5862 from hjelmn/v4.0.x_vader_fix_for_real_this_time
btl/vader: fix race condition in writing header
2018-10-09 07:59:18 -05:00
KAWASHIMA Takahiro
dd1b3eac1e java: Fix javadoc build failure with OpenJDK 11
OpenJDK 11 changed the default javadoc output HTML version from HTML 4.01
to HTML 5. This causes an error when building Open MPI configured
with `--enable-mpi-java` (default: disabled). This fix is compatible
with older OpenJDK versions.

I don't know whether this problem exists with other vendors' JDKs,
but this fix should be compatible with them because the new
syntax is already used in other places in the same file.

Thanks to Siegmar Gross for the bug report.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit b491b454dc304a72c03970326880fbd01641a3d3)
2018-10-09 21:48:10 +09:00
Ralph Castain
226aee42fd Ignore --with-foo=external arguments in subdirs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 08109acf8cf1e3d5a268da0b73210910fd738cfe)
2018-10-09 03:46:54 -07:00
Jeff Squyres
71b828eb9e opal_config_subdir_args.m4: fix typo
A typo inadvertently crept into e836dbd506.  Add the extra '-' to
make it correctly search for --with-*=internal.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7675956b8fd739b150d3bdd14c265fd728786201)
2018-10-09 03:46:37 -07:00
Ralph Castain
12790e8ec6 Protect PMIx from bad configure entry
Ignore with-hwloc=internal or external as those are meaningless to pmix
(will upstream)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit c498a7e77a377ddc3a7bcc26ea072627a33cb470)
2018-10-09 03:45:58 -07:00
Ralph Castain
3e2cc6f46a Fail configure if pmix won't build
If we are using the internal PMIx component and the embedded library fails to configure, then fail configure instead of silently skipping the build and failing at execution time.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit f379ba9c8e5ce17641937c351ab46e4b4a82446c)
2018-10-09 03:45:37 -07:00
Ralph Castain
4aa11ec763 Strip --with-foo=internal from opal_subdir_args
Our components that have a --with-foo configure option won't know what
to do with a value of "internal". This scenario only occurs with hwloc
and libevent, both of which are statically contained in libopen-pal.
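The filtering described above is done in the m4 configury, not in C. The sketch below only illustrates the matching rule in C: drop any argument of the form `--with-<name>=internal` and keep everything else. Note that the prefix requires both leading dashes, which is what the later typo fix restored. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if arg has the form --with-<name>=internal with a non-empty name. */
static int is_with_internal(const char *arg)
{
    static const char prefix[] = "--with-";
    static const char suffix[] = "=internal";
    size_t len  = strlen(arg);
    size_t plen = sizeof(prefix) - 1;
    size_t slen = sizeof(suffix) - 1;
    return len > plen + slen &&
           0 == strncmp(arg, prefix, plen) &&
           0 == strcmp(arg + len - slen, suffix);
}

/* Compact args in place, dropping --with-*=internal entries; returns the new count. */
size_t strip_with_internal(const char **args, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!is_with_internal(args[i])) {
            args[out++] = args[i];
        }
    }
    return out;
}
```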

Thanks to @jsquyres for the diff

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit e836dbd506502c797f5bff2f9761510fad4858cd)
2018-10-09 03:41:48 -07:00
Gilles Gouaillardet
2e2366d193 util/hostfile: fix a double free error
As reported at https://stackoverflow.com/questions/52707242/mpirun-segmentation-fault-whenever-i-use-a-hostfile
mpirun crashes when the hostfile contains a "user@host" line.
The root cause is that username was not strdup'ed, so it was free'd twice: once by opal_argv_free() and once by free().
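The bug pattern above is aliasing a string owned by an argv-style array and then freeing it through both owners. A minimal sketch of the fix: copy the token before releasing the array. The split and free helpers are hypothetical stand-ins, not the opal_argv functions or the hostfile parser.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "user@host" into {"user", "host", NULL};
   without '@' the result is {"host", NULL, NULL}. */
static char **split_user_host(const char *line)
{
    char **argv = calloc(3, sizeof(*argv));
    if (NULL == argv) {
        return NULL;
    }
    const char *at = strchr(line, '@');
    if (at) {
        argv[0] = strndup(line, (size_t) (at - line));
        argv[1] = strdup(at + 1);
    } else {
        argv[0] = strdup(line);
    }
    return argv;
}

/* Hypothetical helper: free both string slots, then the array itself. */
static void argv_free_all(char **argv)
{
    if (NULL == argv) {
        return;
    }
    for (int i = 0; i < 2; ++i) {
        free(argv[i]);
    }
    free(argv);
}

/* Returns a heap copy of the user name (caller frees), or NULL if none. */
char *extract_username(const char *line)
{
    char **argv = split_user_host(line);
    char *username = NULL;
    if (argv && argv[0] && argv[1]) {
        /* Copy instead of aliasing argv[0]: argv_free_all() frees argv[0],
           so a later free(username) on an alias would be a double free. */
        username = strdup(argv[0]);
    }
    argv_free_all(argv);
    return username;
}
```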

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@5803385d44)
2018-10-09 13:06:55 +09:00
Geoff Paulsen
212419290e
Merge pull request #5859 from amaslenn/mlnx-no-verbs-v4
platform/mellanox: disable openib/verbs — v4.0.x
2018-10-08 14:13:12 -05:00
Nathan Hjelm
fba5eda436 btl/vader: fix race condition in writing header
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
(cherry picked from commit 8291f6722d890efd15333bf7b26f0d07952fa41e)
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2018-10-08 08:49:01 -06:00
Andrey Maslennikov
7a930039fb platform/mellanox: disable openib/verbs
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
(cherry picked from commit 7180ab144a52136f8c0ec0c63a61b1a31dfed023)
2018-10-08 15:36:56 +03:00
Geoff Paulsen
499ddedd7c
Merge pull request #5844 from kawashima-fj/pr/v4.0.x/pcollreq-f08-signatures
v4.0.x: mpiext/pcollreq: Correct f08 routine signatures
2018-10-05 13:42:35 -05:00
Geoff Paulsen
ab7cf1095d
Merge pull request #5845 from kawashima-fj/pr/v4.0.x/pcollreq-man
v4.0.x: mpiext/pcollreq: Add Fortran bindings in man
2018-10-05 13:40:03 -05:00