1
1

29292 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
6ed68da870 btl/uct: use the correct tl interface attributes
It is apparently possible for different instances of the same UCT
transport to have different limits (max short put for example). To
account for this we need to store the attributes per TL context not
per TL. This commit fixes the issue.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-11 11:33:17 -06:00
Ralph Castain
4c5588f84f
Merge pull request #5900 from rhc54/topic/dh
Fix -H operations for multi-app case
2018-10-11 10:21:38 -07:00
Ralph Castain
f5a6b7f1e9 Fix -H operations for multi-app case
Correctly aggregate slots across -H entries from each app. Take into
account any -H entry when computing nprocs when no value was given.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-11 09:30:01 -07:00
Jeff Squyres
7686b8696a
Merge pull request #5897 from jsquyres/pr/fix-c99-comments-in-mpih
mpi.h.in: remove C99-style comments
2018-10-11 11:53:39 -04:00
Yossi Itigin
b71e85b8d5 pml_ucx: fix return code from mca_pml_ucx_init() error flow
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-11 18:48:54 +03:00
Jeff Squyres
f4b3ccabf7 mpi.h.in: remove C99-style comments
While we require C99 to build Open MPI, we do not require C99 to build
user MPI applications.  As such, we shouldn't have C99-style comments
(i.e., "//"-style) in mpi.h.in.

Thanks to @AdamSimpson for reporting the issue.

This commit simply converts a //-style comment to a /**/-style
comment.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-10-11 10:58:06 -04:00
Mikhail Kurnosov
a7386c1e09 coll/libnbc: add recursive doubling algorithm for MPI_Iallgather
Implements recursive doubling algorithm for MPI_Iallgather.
The algorithm can be used only for power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-11 21:43:13 +07:00
Jeff Squyres
80348b9f80
Merge pull request #5891 from ftab/pr/readme-update
README: Remove MXM from MTL list
2018-10-11 10:21:54 -04:00
Dennis Field
6e4cd3f229 README: Remove MXM from MTL list
The MXM MTL has been deleted from Open MPI; no need to include it
in the README anymore.

Signed-off-by: Dennis Field <fury@xibase.com>
2018-10-10 20:15:07 -04:00
Yossi Itigin
5f767e1a54
Merge pull request #5890 from yosefe/topic/osc-ucx-flush-worker-instead-of-world-barrier
osc_ucx: add worker flush before osc module free
2018-10-10 23:00:57 +03:00
Yossi Itigin
b8e1af6fcb osc_ucx: add worker flush before osc module free
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:47:16 +03:00
Yossi Itigin
bcc48515e4 Revert "osc_ucx: fix hang/timeout in component finalize"
This reverts commit 438d13b4ca1e7333b789ca3fb536fda17b0feb38.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:47:13 +03:00
Yossi Itigin
27d8c8e83c
Merge pull request #5878 from yosefe/topic/pml-ucx-fix-datatype-leak
pml_ucx: add ompi datatype attribute to release ucp_datatype
2018-10-10 20:18:39 +03:00
Brian Barrett
902b57919c btl tcp: Print number of endpoints on error
While trying to debug #3035, it's not clear whether there is
an issue with the modex data or printing the address list.
Print the number of endpoints on the error, which will help
determine which case is happening to Cisco.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-10 08:29:48 -07:00
Brian Barrett
bd07cc6707 btl tcp: Improve initialization debugging
When creating TCP BTL modules, print more information about the
module's ethernet association, including the first address associated
with the device, as debug output.

Fix a flipped output string for IPv4 and IPv6 addresses in the
modex send code.

Add the addresses being published in the modex to the debugging
output in modex send, to help match failures in endpoint match.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-10 08:29:48 -07:00
Yossi Itigin
a012ee91d8
Merge pull request #5886 from yosefe/topic/osc-ucx-fix-finalize-hang
osc_ucx: fix hang/timeout in component finalize
2018-10-10 16:29:29 +03:00
Yossi Itigin
9a365555b0
Merge pull request #5879 from hoopoepg/topic/fixed-zero-size-window
OSC/UCX: fixed zero-size window processing
2018-10-10 16:28:55 +03:00
Yossi Itigin
40ac9e4771 pml_ucx: fix return code from mca_pml_ucx_init()
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 14:41:05 +03:00
Yossi Itigin
dc6809495d osc_ucx: fix hang/timeout in component finalize
Add barrier to make sure all endpoints are destroyed before destroying
the worker.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 14:38:06 +03:00
Sergey Oblomov
ae6f81983f OSC/UCX: fixed zero-size window processing
- added processing of zero-size MPI window

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-10 13:08:01 +03:00
Gilles Gouaillardet
98ad78e0bc
Merge pull request #5883 from ggouaillardet/topic/pmix_ext
pmix/ext: misc fixes
2018-10-10 14:26:38 +09:00
Gilles Gouaillardet
ca9009e3d0 pmix/ext3x: fix minor typos
refer PMIx 3x instead of previous 2x

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-10 13:18:36 +09:00
Gilles Gouaillardet
b3bd785ce0 pmix/ext3x: add missing header files into the dist tarball
Refs. open-mpi/ompi#5871

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-10 13:17:54 +09:00
Gilles Gouaillardet
67402f5f19 pmix/ext2x: add missing header files into the dist tarball
Refs. open-mpi/ompi#5871

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-10 13:17:42 +09:00
Nathan Hjelm
32682aa2c0
Merge pull request #5772 from mkurnosov/coll-ibcast-knomial
coll/libnbc: add knomial tree algorithm for MPI_Ibcast
2018-10-09 16:26:13 -06:00
Nathan Hjelm
121b4928c4
Merge pull request #5837 from hjelmn/uct_update
btl/uct: bug fixes and general improvements
2018-10-09 16:14:25 -06:00
Nathan Hjelm
39be6ec15c btl/uct: bug fixes and general improvements
This commit updates the uct btl to change the transports parameter
into a priority list. The dc_mlx5, rc_mlx5, and ud transports to the
priority list. This will give better out of the box performance for
multi-threaded codes beacuse the *_mlx5 transports can avoid the mlx5
lock inside libmlx5_rdmav2.

This commit also fixes a number of leaks and a possible deadlock when
using RDMA.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-09 15:15:45 -06:00
Jeff Squyres
bb13941b69
Merge pull request #5811 from ggouaillardet/topic/mpi_f08_c_types
fortran/use-mpi-f08: add MPI C types
2018-10-09 13:17:30 -04:00
Yossi Itigin
4763822a64 pml_ucx: add ompi datatype attribute to release ucp_datatype
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-09 17:34:34 +03:00
Mikhail Kurnosov
b0429d25df coll/libnbc: add knomial tree algorithm for MPI_Ibcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-09 20:43:04 +07:00
Mikhail Kurnosov
7bd63e79c8 coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce
An implementation of R. Rabenseifner's algorithm for MPI_Ireduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather.

Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-09 20:27:09 +07:00
KAWASHIMA Takahiro
b0e6d1fefc
Merge pull request #5870 from kawashima-fj/pr/javadoc-tag
java: Fix javadoc build failure with OpenJDK 11
2018-10-09 19:11:48 +09:00
KAWASHIMA Takahiro
b491b454dc java: Fix javadoc build failure with OpenJDK 11
OpenJDK 11 changed the default javadoc output HTML version to HTML 5
from HTML 4.01. It causes an error on building Open MPI configured
with `--enable-mpi-java` (default: disable). This fix is compatible
with older OpenJDK.

I don't know whether this problem exists with other vender's JDKs.
But this fix should be compatible with other JDKs because the new
syntax is used in other places in the same file.

Thanks to Siegmar Gross for the bug report.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-10-09 17:49:30 +09:00
Gilles Gouaillardet
0bce306194
Merge pull request #5863 from ggouaillardet/topic/misc_fixes
fix double free & missing header
2018-10-09 13:02:33 +09:00
Gilles Gouaillardet
5803385d44 util/hostfile: fix a double free error
As reported at https://stackoverflow.com/questions/52707242/mpirun-segmentation-fault-whenever-i-use-a-hostfile
mpirun crashes when the hostfile contains a "user@host" line.
The root cause is username was not strdup'ed and free'd twice by opal_argv_free() and free()

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-09 11:09:17 +09:00
Gilles Gouaillardet
1ef45b7f1d plm/tm: add a missing include file
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-09 11:08:36 +09:00
Brian Barrett
e9e4d2a4bc Handle asprintf errors with opal_asprintf wrapper
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error.  However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined.  Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Brian Barrett
c087365ead opal/util: Always support BSD behavior of asprintf
Open MPI's developers like to assume that asprintf() always sets
the ptr to NULL on error, but the standard (and Linux glibc) do
not guarantee this.  As a result, we're making opal_asprintf()
always available for developers, which will guarantee that
ptr is set to NULL on error.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Nathan Hjelm
a7964bf1ad
Merge pull request #5852 from hjelmn/vader_fix_for_real_this_time
btl/vader: fix race condition in writing header
2018-10-08 08:44:10 -06:00
Yossi Itigin
6c9a95df3e
Merge pull request #5858 from amaslenn/mlnx-no-verbs
platform/mellanox: disable openib/verbs
2018-10-08 14:08:09 +03:00
Andrey Maslennikov
7180ab144a platform/mellanox: disable openib/verbs
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2018-10-08 12:13:44 +03:00
Ralph Castain
784290944b
Merge pull request #5857 from rhc54/topic/plat
Add intel/bend platform file
2018-10-07 05:58:12 -07:00
Ralph Castain
952090854a Add intel/bend platform file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-06 20:26:26 -07:00
Ralph Castain
925edf3561
Merge pull request #5855 from rhc54/topic/orte
Replace asprintf with opal_asprintf
2018-10-06 13:34:02 -07:00
Ralph H Castain
fc81d0d519 Replace asprintf with opal_asprintf
Silence the flood of warnings from ORTE

Signed-off-by: Ralph H Castain <rhc@open-mpi.org>
2018-10-06 19:32:37 +00:00
Ralph Castain
80ee5c858d
Merge pull request #5854 from rhc54/topic/map
Fix map-by node for comm_spawn
2018-10-06 09:56:15 -07:00
Ralph H Castain
51acbf738e Fix map-by node for comm_spawn
Do not reorder the available host list as this causes the head node process assignment to differ from those computed on the other nodes

Signed-off-by: Ralph H Castain <rhc@open-mpi.org>
2018-10-06 15:58:45 +00:00
Ralph Castain
67b5057448
Merge pull request #5853 from rhc54/topic/subdir
Ignore --with-foo=external arguments in subdirs
2018-10-06 07:56:58 -07:00
Ralph Castain
08109acf8c Ignore --with-foo=external arguments in subdirs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-06 06:48:13 -07:00
Nathan Hjelm
f11fea07e3
Merge pull request #5718 from mkurnosov/coll-iexscan-recursivedoubling
coll/libnbc: add recursive doubling algorithm for MPI_Iexscan
2018-10-05 22:23:30 -06:00