1
1
Граф коммитов

10459 Коммитов

Автор SHA1 Сообщение Дата
Artem Polyakov
6678ac0f55 osc/ucx: Fix possible win creation/destruction race condition
To avoid fully initializing the osc/ucx component for MPI application
that are not using One-Sided functionality, the initialization happens
at the first MPI window creation.

This commit ensures atomicity of global state modifications.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-06-20 09:05:03 -07:00
Artem Polyakov
0857742624 osc/ucx: Fix worker pool finalization
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-06-20 09:05:03 -07:00
Nathan Hjelm
560886f095
Merge pull request #6746 from devreal/osc_winalloc_err
OSC rdma win allocate: propagate errors to avoid deadlocks
2019-06-18 17:57:53 -07:00
Harald Klimach
e222a04ae5 Suggestion to fix division by zero in file view.
In common_ompi_aggregators calc_cost routine:
do not cast the real division to an int intermediately.
This patch removes the obsolete int variable c and assigns
the result of the P_a/P_x division directly to n_as.

With the intermediate int c variable, n_as gets 0 if P_a < P_x,
resulting in a division by 0 when computing n_s.

Signed-off-by: Harald Klimach <harald.klimach@uni-siegen.de>
2019-06-13 18:47:32 +02:00
Jeff Squyres
7c3aeb3061
Merge pull request #6686 from alex-anenkov/coll-iallreduce-recursivedoubling
coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce
2019-06-10 10:09:51 -04:00
Yossi Itigin
a46e5da3ca
Merge pull request #6744 from brminich/topic/all2all_linear_sync_fix
COLL/BASE: Fix linear sync all2all
2019-06-09 21:23:38 +03:00
Joseph Schuchart
8f27cc26d9 OSC rdma win allocate: synchronize error codes across shared memory group
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-06-07 11:03:21 +02:00
KAWASHIMA Takahiro
85c3311b7d
Merge pull request #6726 from yanagibashi/pr/add-f08-procedure-names
mpiext/pcollreq: Add `_f08` to procedure names
2019-06-07 09:10:58 +09:00
Mikhail Brinskii
79006f4e5a COLL/BASE: Fix linear sync all2all
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-06-06 19:22:42 +03:00
Yossi Itigin
8535dd570b
Merge pull request #6732 from dmitrygladkov/topic/pml/ucx_init
PML/UCX: Don't destroy UCP worker if it wasn't created
2019-06-06 10:41:33 +03:00
KAWASHIMA Takahiro
2b856573b2
Merge pull request #6699 from t-kurita/pr/java-alltoallw-arrays
java: Fix compilation error in allToAllw using Java arrays
2019-06-04 11:33:17 +09:00
Dmitry Gladkov
c864ca51d2 PML/UCX: Don't destroy UCP worker if it wasn't created
Signed-off-by: Dmitry Gladkov <dmitrygla@mellanox.com>
2019-06-03 10:49:36 +03:00
Tsubasa Yanagibashi
3148b0cfaa mpiext/pcollreq: Add _f08 to procedure names
The procedure names don't contain "_f08" of Fortran 2008 bindings of
Persistent Collective Operations(mpiext/pcollreq/use-mpi-f08).
This fix adds "_f08" to the procedure names of pcollreq/use-mpi-f08,
same as other Fortran 2008 routines in `ompi/mpi/fortran/use-mpi-f08/mod`.

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2019-05-31 15:22:42 +09:00
George Bosilca
a0fce4eac2
Fix the man pages for some of the MPI_T_* functions.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-29 00:23:35 -04:00
George Bosilca
eed770ce5c
Fix the SPC initialization.
Use the PVAR ctx to save the SPC index, so that no lookup nor
restriction on the SPC vars position is imposed.
Make sure the PVAR are always registered.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-29 00:23:18 -04:00
George Bosilca
7dab8c002b
Fixed SPC/MPI_T initialization error.
Signed-off-by: Yong Qin <yongq@mellanox.com>
2019-05-28 15:10:32 -04:00
Tomislav Janjusic
6ea920e225 Coll/hcoll: adding scatterv interface
Signed-off-by: Valentin Petrov valentinp@mellanox.com
2019-05-27 12:27:43 +03:00
Edgar Gabriel
8eda9f2ecd common/ompio: fix coverty warnings
this commmit fixes coverty warnings CID 1445198 and CID 1445197
For a reason that is a bit unclear to me, coverty only complained about the read
files, but the write operations had the same issue, so I fixed that within the
same commit as well.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-23 13:40:39 -05:00
Kurita, Takehiro
7ece564978 java: Fix compilation error in allToAllw using Java arrays
Java bindings in Open MPI support Java arrays and direct buffers
as buffers. All non-blocking methods must use direct buffers and
only blocking methods can choose between Java arrays and
direct buffers.
Though Comm.allToAllw() is a blocking method, Java applications
using Java arrays as buffers get compilation errors.
This fix enables using Java arrays in Comm.allToAllw().

Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>
2019-05-22 10:00:16 +09:00
Edgar Gabriel
27b2ec71a7 common/ompio: add support for read operations and collective I/O
external32 data representation is now support by ompio for everything
but non-blocking collective I/O operations. The support can further be improved
in a second step to limit the temporary buffer size (at least for blocking operations),
but it does work now for many scenarios.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 17:56:16 -05:00
Edgar Gabriel
ab56e6f0db common/ompio: make individual read operations work.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 17:22:33 -05:00
Edgar Gabriel
f6b3a0af52 common/ompio: individual write of external32 works
both blocking and non-blocking. collective write and read operations not yet.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 16:26:14 -05:00
Edgar Gabriel
d955753cb8 common/ompio: abstraction for different convertor types
introduce separate convertors for memory vs. file representation. Adjust the interfaces for decode_datatype to provide the convertor to be used for that.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 13:35:38 -05:00
Edgar Gabriel
35be18b266 common/ompio: rename ompio_cuda* to ompio_buffer*
the infrastructure put in place to manage cuda buffers is actually
a lot more generic than just for cuda buffers. Specifically, we ca
reuse much of the code to implement the external32 data representation.
This commit converts the code from common_ompio_cuda* to
common_ompio_buffer*. There are just very few places where we actually need to keep the OPAL_CUDA_SUPPORT ifdef in place.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 12:50:04 -05:00
Edgar Gabriel
a96efb7620 common/ompio: add comm_ompio_read_all/write_all functions
in preparation for adding support for the external32 data
representation.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 12:49:36 -05:00
Valentin Petrov
f19f6f432a Coll/hcoll: don't init opal memhooks unless explicitely requested by user
If user sets HCOLL_EXTERNAL_UCM_EVENTS=1 then we try init opal
    memory framework and register a mem release cb. Otherwise, rely on ucx.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-05-20 11:17:44 +03:00
Yossi Itigin
9d1994b906 OSC/UCX: Fix deadlock with atomic lock
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-19 20:10:09 +03:00
Alex Anenkov
77d466edf3 coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce
Signed-off-by: Alex Anenkov <anenkov.ru@gmail.com>
2019-05-19 18:39:11 +07:00
Sergey Oblomov
a3578d9ece PML/UCX: disable PML UCX if MT is requested but not supported
- in case if multithreading requested but not supported
  disable PML UCX

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-17 11:25:23 +03:00
bosilca
6089608858
Merge pull request #6647 from bosilca/fix/length_0
Fix/length 0
2019-05-14 17:59:15 -04:00
Jeff Squyres
9442989e2c
Merge pull request #6382 from jsquyres/pr/ofi-mtl-gitignore
mtl/ofi: add a .gitignore
2019-05-13 12:00:41 -04:00
George Bosilca
42119254c7 Fix incorrect behavior with length == 0
Fixes #6575.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 19:53:34 -04:00
George Bosilca
d141bf7912 Update the datatype dump to match the actual types.
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 18:03:57 -04:00
Nathan Hjelm
4345308dfd osc/rdma: fix CAS 32-bit network atomic compatibility check
When checking for btl compatibility with 32-bit CAS osc/rdma was
checking the incorrect flag field.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
2019-05-10 07:27:53 -06:00
KAWASHIMA Takahiro
dabad084b5
Merge pull request #6621 from bosilca/topic/persistent_req_leak
Fix the leak of fragments for persistent sends (issue #6565)
2019-05-03 15:21:42 +09:00
George Bosilca
a16cf0e4dd
Fix the leak of fragments for persistent sends.
The rdma_frag attached to the send request was not correctly released
upon request completion, leaking until MPI_Finalize. A quick solution
would have been to add RDMA_FRAG_RETURN at different locations on the
send request completion, but it would have unnecessarily made the
sendreq completion path more complex. Instead, I added the length to
the RDMA fragment so that it can be completed during the remote ack.

Be more explicit on the comment.

The rdma_frag can only be freed once when the peer forced a protocol
change (from RDMA GET to send/recv). Otherwise the fragment will be
returned once all data pertaining to it has been trasnferred.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-02 09:40:11 -04:00
Jeff Squyres
ac54d771ec mtl/ofi: add a .gitignore
Ignore generated files.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-01 14:00:00 -07:00
Yossi Itigin
5d2200a7d6
Merge pull request #6605 from brminich/topic/shmem_all2all_put
SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
2019-05-01 12:00:21 +03:00
bosilca
399b7133ab
Merge pull request #6556 from EmmanuelBRELLE/PR_fix_local_handle_in_PUT_message
pml/ob1: fixed local handle sent during PUT control message
2019-04-27 13:51:22 -04:00
Mikhail Brinskii
2ef5bd8b36 SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-26 14:47:58 +03:00
Mark Allen
d85cac8f1a fixing an unsafe usage of integer disps[] (romio321 gpfs)
There are a couple MPI_Alltoallv calls in ad_gpfs_aggrs.c where the
send/recv data comes from places like req[r].lens, and the send
buffer and send displacements for example were being calculated as
    sbuf = pick one of the reqs: req[bottom].lens
    sdisps[r] = req[r].lens - req[bottom].lens
which might be okay if the .lens was data inside of req[] so they'd
all be close to each other. But each .lens field is just a pointer
that's malloced, so those addresses can be all over the place, so the
integer-sized sdisps[] isn't safe.

I changed it to have a new extra array sbuf and rbuf for those two
Alltoallv calls, and copied the data into the sbuf from the same
locations it used to be setting up the sdisps[] at, and after the
Alltoallv I copy the data out of the new rbuf into the same
locations it used to be setting up the rdisps[] at.

For what it's worth I was able to get this to fail -np 2 on a GPFS
filesystem with hints romio_cb_write enable. I didn't whittle the
test down to something small, but it was failing in an
MPI_File_write_all call.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-04-23 16:01:55 -04:00
Jeff Squyres
9a9d106296
Merge pull request #6555 from EmmanuelBRELLE/PR-pmlob1_fix_rc_for_putfrag_when_get_failed
pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
2019-04-22 17:19:12 -04:00
Gilles Gouaillardet
251477c518
Merge pull request #6431 from ggouaillardet/topic/mpiext_nolib
mpiext/shortfloat: do not create empty libraries
2019-04-22 11:23:19 +09:00
Edgar Gabriel
c80a842036
Merge pull request #6602 from edgargabriel/topic/io_array_refactor
common/ompio: refactor the build_io_array function
2019-04-18 13:44:48 -05:00
Gilles Gouaillardet
e1098dae4b mpiext/shortfloat: do not build an empty library
the shortfloat extension is only made of header files,
and hence do not require a library to be built.

Refs. open-mpi/ompi#6205

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-04-18 13:42:18 -04:00
Gilles Gouaillardet
e70780b762 configury: allow mpi extensions with no libraries
Do not require an archive when the OMPI_MPIEXT_<ext>_HAVE_OBJECT
macro is defined to 0.
See `ompi/mpiext/example/configure.m4`.

Allow some extensions to be built on OS X since the creation of
archives with no files is not permitted.

Refs. open-mpi/ompi#6205

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-04-18 13:42:01 -04:00
Gilles Gouaillardet
232055fc7a fortran/use-mpi-f08: fix intent of the internal ompi_*_f bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-04-18 13:29:19 +09:00
Edgar Gabriel
d43427fc76 common/ompio: refactor the build_io_array function
abstract out the io_array structure to be used in common_ompio_build_io_array function.
This is preparation for a future component that would like to use the same function,
but not modify the io_array stored on the file handle itself.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-04-17 14:42:33 -05:00
Valentin Petrov
30970bdfdf OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
Addtional bugfix: origin_addr -> result_addr for no_op, replace_op
    and sum_op fetch destination.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-17 10:30:21 +03:00
bosilca
8cf7a7e87d
Merge pull request #6538 from bosilca/topic/issue6522
Prevent a segfault when accessing a rank outside a communicator.
2019-04-09 18:08:49 -04:00