1
1
Граф коммитов

28553 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
67ba8da76f ompi_mpi_init: fix race condition
There was a race condition in 35438ae9b5: if multiple threads invoked
ompi_mpi_init() simultaneously (which could happen from both MPI and
OSHMEM), the code did not catch this condition -- Bad Things would
happen.

Now use an atomic cmp/set to ensure that only one thread is able to
advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED.

Additionally, change the prototype of ompi_mpi_init() so that
oshmem_init() can safely invoke ompi_mpi_init() multiple times (as
long as MPI_FINALIZE has not started) without displaying an error.  If
multiple threads invoke oshmem_init() simultaneously, one of them will
actually do the initialization, and the rest will loop waiting for it
to complete.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-05 18:09:13 -07:00
Nathan Hjelm
64a5baaa28
Merge pull request #5193 from hjelmn/osc_sm_location
Use /dev/shm for shared memory files in osc components
2018-06-05 09:42:14 -06:00
Nathan Hjelm
948e38d260
Merge pull request #5226 from mkurnosov/base-reduce-scatter-butterfly
coll/base: add butterfly algorithm for MPI_Reduce_scatter
2018-06-05 09:38:39 -06:00
Mikhail Kurnosov
3adf96fdb8 coll/base: add butterfly algorithm for MPI_Reduce_scatter
Implements butterfly algorithm for MPI_Reduce_scatter.
The algorithm can be used both by commutative and non-commutative operations, for power-of-two and non-power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-05 15:53:13 +07:00
Nathan Hjelm
8192eb9aa6 patch/overwrite: add support for aarch64
This commit adds patcher support of aarch64. The current
implementation uses mov, movk, and br to perform the jump. This uses 5
instructions.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-04 16:04:26 -06:00
Arm
555029b323
Merge pull request #5216 from thananon/btl_ofi
btl/ofi: the component now compiles with libfabric < 1.5.
2018-06-04 13:47:00 -07:00
Thananon Patinyasakdikul
c18cf7315f btl/ofi: configury: will not build if OFI ver < 1.5.
This commit fixes problem where users have libfabric version < 1.5
installed and build failed because the new MR modebits does not exist.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-06-04 10:35:50 -07:00
Gilles Gouaillardet
fe8d9dc375
Merge pull request #5221 from ggouaillardet/topic/java_mpi1
java: do not use MPI1 deprecated subroutines
2018-06-04 12:59:08 +09:00
Gilles Gouaillardet
05b3546151 java: do not use MPI1 deprecated subroutines
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-04 10:49:52 +09:00
Ralph Castain
82a664a5ff
Merge pull request #5218 from rhc54/topic/cleanup
Fix external hwloc builds
2018-06-03 13:36:10 -07:00
Ralph Castain
7c0ec7e851 Cleanup warnings in binding code
This still leaves two unresolved warnings:

base/rmaps_base_binding.c:577:22: warning: variable ‘clvm’ set but not used [-Wunused-but-set-variable]
     unsigned clvl=0, clvm=0;
                      ^~~~
base/rmaps_base_binding.c:576:27: warning: variable ‘hwm’ set but not used [-Wunused-but-set-variable]
     hwloc_obj_type_t hwb, hwm;
                           ^~~

The problem is that these values are used in the OPAL_HWLOC_MAKE_OBJ_CACHE macro to form a variable name. Thus, the compiler doesn't recognize the values as being "used". I'm not entirely sure how to resolve it cleanly.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-03 11:47:14 -07:00
Ralph Castain
014bb3c8de Fix external hwloc builds
Remove spurious comma in header file definition. Remove unused variables

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-03 11:24:21 -07:00
Ralph Castain
3020b699f3
Merge pull request #5213 from rhc54/topic/rte
Enable the PMIx ompi/rte component
2018-06-03 10:23:40 -07:00
Jeff Squyres
a1737ca3eb
Merge pull request #5092 from jsquyres/pr/finalized-attribute-fix
mpi/finalized: don't hang if called during MPI_FINALIZE
2018-06-03 13:10:41 -04:00
Ralph Castain
55ac526a67 Enable the PMIx ompi/rte component
Get the OMPI rte/pmix component working. This was tested using PRRTE as the RM, configuring OMPI using:

* autogen --no-orte

* with external libevent, external hwloc, and external PMIx master

* configuring PMIx master with the same libevent and hwloc

* execute the application using PRRTE's "prun" launcher, which has the same cmd line as ORTE's mpirun

Note that PMIx master appears to have a bug in the event notification system that caches job termination events. Thus, the first execution runs fine, but subsequent executions cause an "abort" when the OMPI default error handler is invoked upon notification of the prior job's termination. Will work that separately.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 134cca9ac0de092d767999357573a31703f72292)
2018-06-03 07:25:12 -07:00
Ralph Castain
6df6d6a7f3 Ignore new generated file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 19d5e5ad57a1e724ed02c6cbb0b5cdf4aff2fac7)
2018-06-03 07:24:59 -07:00
Ralph Castain
4ceaed3a76 Merge https://github.com/bgoglin/ompi into topic/rte2 2018-06-03 07:24:12 -07:00
Howard Pritchard
623e36de8a
Merge pull request #5205 from thananon/btl_ofi
new btl/ofi: RDMA only btl using libfabric.
2018-06-02 13:00:50 -06:00
Thananon Patinyasakdikul
6efb8069ec new btl/ofi: RDMA only btl using libfabric.
This commit added new transport layer to be used with osc rdma module.
This BTL provides put, get, atomic and fetch atomic operations. It can
be used with multiple hardware vendors as long as they have their
provider under Libfabric and have the right capabilities.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-06-01 15:22:04 -07:00
Jeff Squyres
38ed70de6f ompi_mpi_finalize: remove some dead code
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-01 13:37:20 -07:00
Jeff Squyres
35438ae9b5 mpi/finalized: revamp INITIALIZED/FINALIZED
Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE).  In such cases, MPI_FINALIZED must return "false".

Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE.  See
https://github.com/open-mpi/ompi/issues/5084.

This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:

- not initialized
- init started
- init completed
- finalize started
- finalize past COMM_SELF destruction
- finalize completed

The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.

Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-01 13:36:29 -07:00
Edgar Gabriel
0d66e02179
Merge pull request #5211 from edgargabriel/pr/dynamic_gen2_large_offset_Fix
fcoll/dynamic_gen2: ensure some variables can hold an intermediate value
2018-06-01 08:39:36 -05:00
Edgar Gabriel
52bd606294 fcoll/dynamic_gen2: make sure that intermediate variables can hold the offset
for very large offsets, ome ariables used in the fcoll/dynamic_gen2
code base were under certain circumstances not large enough to hold
intermediate values. This issue was more detected in the vulcan component
but could happen in the dynamic_gen2 component as well.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-01 06:53:38 -05:00
Nathan Hjelm
7ac3797a98 README: remove note about ARM/power hangs
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-31 19:45:19 -06:00
Nathan Hjelm
f8dbf62879 opal/asm: change ll/sc atomics to macros
This commit fixes a hang that occurs with debug builds of Open MPI on
aarch64 and power/powerpc systems. When the ll/sc atomics are inline
functions the compiler emits load/store instructions for the function
arguments with -O0. These extra load/store arguments can cause the ll
reservation to be cancelled causing live-lock.

Note that we did attempt to fix this with always_inline but the extra
instructions are stil emitted by the compiler (gcc). There may be
another fix but this has been tested and is working well.

References #3697. Close when applied to v3.0.x and v3.1.x.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-31 19:45:19 -06:00
Nathan Hjelm
50a9508e0f
Merge pull request #5210 from ggouaillardet/topic/mpifh_compat
fortran/mpif-h: fix MPI1 compatibility Makefile
2018-05-31 19:44:14 -06:00
Gilles Gouaillardet
9f7586465d fortran/mpif-h: fix MPI1 compatibility Makefile
appends MPI1 compatible source files instead of redefining all the source files
fix a typo from open-mpi/ompi@89da9651bb

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-01 09:52:22 +09:00
Nathan Hjelm
b323655809 mpi: make C++ bindings compile when MPI-1 compat is disabled
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-31 09:44:19 -06:00
Nathan Hjelm
74563d22a0 test/ddt_lib: remove UB/LB tests
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-31 09:44:19 -06:00
Nathan Hjelm
89da9651bb ompi: disable functions removed from MPI-3.0 by default
This commit adds a new configure option: --enable-mpi1-compat. Without
this option we will no longer provide APIs, typedefs, and defines that
were removed from the standard in MPI-3.0. This option will exist for
one major release (Open MPI v4.x.x) and then the option and associated
code will be removed in Open MPI v5.x.x. Open MPI has already
internally prepared for this change. Please prepare your codes
accordingly.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-31 09:44:19 -06:00
Yossi Itigin
f5631e546d
Merge pull request #5187 from hoopoepg/topic/amo-non-blocking-ucp
MCA/UCX: atomic add/swap are moved to new UCX atomic API
2018-05-31 13:48:05 +03:00
KAWASHIMA Takahiro
04b509b2d2
Merge pull request #5207 from t-kurita/pr/java-doc-descriptions
java: Improve descriptions of `javadoc`
2018-05-31 16:12:21 +09:00
Kurita, Takehiro
11ae771b82 java: Improve descriptions of javadoc
- Improve descriptions
- Fix some typos
- Remove MPI-1 functions and replace them with MPI-2 functions

Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>
2018-05-31 15:02:35 +09:00
Jeff Squyres
01e26459d2
Merge pull request #5203 from jsquyres/pr/warnings-stomps
Minor compiler warning stomps
2018-05-30 15:57:05 -04:00
Jeff Squyres
fb0473acb5 pmix3x: compiler warning stomp
This fix was already included in pmix upstream (https://github.com/pmix/pmix/commit/fb7af8af2).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-30 10:14:37 -07:00
Jeff Squyres
dec247d96e opal/datatype: minor compiler warning stomp
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-30 10:08:19 -07:00
Jeff Squyres
25f2d02c61 fcoll/dynamic_gen2: minor compiler warning stomp
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-30 10:08:19 -07:00
Jeff Squyres
2dce549df2 ompi/debuggers: stomp a compiler warning in dlopen_test.c
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-30 10:08:14 -07:00
Sergey Oblomov
319bb376f9 MCA/UCX: branch optimization in cswap call
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-30 07:14:43 +03:00
Nathan Hjelm
e9de42544e osc/sm: add support for controlling location of backing store
This commit adds a new MCA variable to set the location of the backing
store: osc_sm_backing_directory. The default on Linux has been
changed to use /dev/shm to improve performance in cases where /tmp is
not a tmpfs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-29 21:44:01 -06:00
Nathan Hjelm
d0d59b1d7d osc/rdma: add support for controlling location of backing store
This commit adds a new MCA variable to set the location of the backing
store: osc_rdma_backing_directory. The default on Linux has been
changed to use /dev/shm to improve performance in cases where /tmp is
not a tmpfs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-05-29 21:43:33 -06:00
Sergey Oblomov
daad71f036 MCA/UCX: switch/case optimization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 22:33:33 +03:00
Jeff Squyres
2e8ab41ba5
Merge pull request #5200 from jsquyres/pr/disable-osc-pt2pt-for-thread-multiple
osc/pt2pt: disable when THREAD_MULITPLE
2018-05-29 15:33:15 -04:00
Sergey Oblomov
6be4066e23 MCA/UCX: cswap call if updated to non-blocking API
- minor fixes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 20:07:51 +03:00
Sergey Oblomov
b668e19cd1 Merge remote-tracking branch 'wg/master' into topic/amo-non-blocking-ucp
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 19:08:48 +03:00
Howard Pritchard
5b7c866f59 osc/pt2pt: disable when THREAD_MULTIPLE.
Per discussion at
https://github.com/open-mpi/ompi/issues/2614#issuecomment-392815654,
do not allow for selection of the OSC PT2PT when creating an MPI RMA
window when THREAD_MULTIPLE is active.  Print a helpful message and
return a not-supported error.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(cherry picked from commit d0ffd660841623c02d1dfa3151e7f7afd3327698)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-05-29 08:59:53 -07:00
bosilca
4ebed21b6d
Merge pull request #4670 from ggouaillardet/topic/opal_bitmap
opal/bitmap: fix opal_bitmap_set_bit()
2018-05-29 10:50:21 -04:00
Yossi Itigin
976cd5e307
Merge pull request #5186 from hoopoepg/topic/ucx-amo-error-msg
MCA/UCX: fixed error messages for incorrect msg size
2018-05-29 15:54:02 +03:00
Sergey Oblomov
0c3ed93ef0 MCA/UCX: added opal progress call to wait request routine
- added opal_progress call to wait function to avoid
  possible [dead]lock issues
- wait call is declared as inline
- minor fixes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 11:34:40 +03:00
bosilca
9adc38c963
Merge pull request #5197 from mkurnosov/reduce-scatter-block-butterfly
coll: reduce_scatter_block: add butterfly algorithm
2018-05-27 18:12:27 -04:00